Benchmarks are a great way to understand how your WordPress site will perform under different scenarios. Since each site and hosting environment is unique, they can play an important role when choosing a host that suits your needs.
When it comes to hosting, there’s a ton of performance data out there. Even across multiple benchmarks against the same host, numbers can drift as testing methods and periods change. It can quickly become overwhelming for even the most seasoned WordPress professionals.
Of course, all of this data is meaningless if you don't know what you're looking at. Within the same results comparison, you might see multiple WordPress hosting companies claiming that they "won." In reality, it's a lot more nuanced than that. By reading and understanding the raw results, you can draw your own conclusions about what matters most.
In this post, we will look at how WordPress hosting benchmarks work, what to look out for, and which factors can have the most significant impact on your site’s hosting experience.
Sifting For Gold Through Biased Benchmarks
First things first – not all benchmarks are equal. In fact, most are subpar, biased, or miss the mark entirely.
In a perfect world, everyone would run performance tests across multiple hosting platforms, then compare the results independently. But let’s be honest – ain’t nobody got time for that.
If you can’t dedicate countless hours to running your own tests, your next best solution is to figure out which benchmarks you can rely on. It’s going to take lots of digging and comparison to get the best data possible, but it’s worth the effort.
Never Trust A Host’s Own Benchmarks
We all know that just because it's on the internet doesn't mean it's true. The same goes for researching a host's capabilities. Some sources may be paid promotions, some may be biased affiliates trying to cash in on your clicks, and some may be data provided by the hosts themselves.
Be especially cautious of any host's own claims (yes, all hosts, even us). While it's reasonable to assume they're being truthful, they might be omitting things or performing their tests under ideal conditions.
You’ve probably seen this in practice elsewhere with products you use, and you should treat the information in the same way. That new phone you bought might boast 48 hours of battery life, but that’s under ideal circumstances – practical use will always vary.
Unbiased, Repeatable, Transparent Benchmarks Performed by Third Parties
The key to finding reliable benchmarks is to look for test results produced by unbiased third parties. These benchmarks often document their methodology in full, allowing you to replicate the same scenarios and expect similar results.
By using unbiased sources who provide in-depth information about how the tests were performed, you can use their findings as a bird’s-eye view to narrow down your options. From there, you can verify that their results are accurate, then perform additional testing to see how the results vary against your site’s code.
One resource that does this well in the WordPress space is Review Signal’s WordPress Hosting Performance Benchmarks. Although they’re not perfect (no benchmark tests ever are), their dedication to transparency ensures that the data is valid and can be easily peer-reviewed.
Bonus tip: Comparing your own site against benchmarking results can be super helpful for finding bottlenecks. If a particular type of test differs from the benchmark results, you might have some optimization to do.
Alright, enough of the top-level philosophy. Now that you know what to look for in a quality benchmark, it’s time to get down and dirty in the details.
Uptime
Before we jump too far into the weeds, let’s take a brief look at the simplest metric to understand: uptime.
Most hosts strive for “five nines,” or 99.999% uptime, as the gold standard. When looking at uptime, as long as a host is close to this, you’re good to go.
Something worth noting is that uptime isn’t always easy to test from a third-party perspective. The duration and frequency of the checks will affect the results. For example, uptime could be tested every five minutes for three months – what happens between those checks? And what about older legacy servers that weren’t included in the tests?
It’s impossible to know precisely what a host’s uptime is from an outside benchmark, but if uptime is critical for your site, we recommend supplementing your research with real-world testimonials, redundancy/failover procedures, and SLAs.
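To put "five nines" in perspective, uptime percentages translate directly into allowed downtime. Here's a quick sketch – the numbers are simple arithmetic, not tied to any particular host:

```python
def max_downtime(uptime_pct, period_minutes=365 * 24 * 60):
    """Minutes of downtime allowed per year at a given uptime percentage."""
    return period_minutes * (1 - uptime_pct / 100)

print(f"99.9%   -> {max_downtime(99.9):.1f} minutes/year (~8.8 hours)")
print(f"99.999% -> {max_downtime(99.999):.2f} minutes/year (~5 minutes)")
```

A "five nines" host can be down only about five minutes per year in total, which is exactly why third-party checks running every five minutes can miss short outages entirely.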
Performance testing is where benchmark results start to become much more meaningful. While uptime statistics show how often a host is actually “up,” performance tests reveal how quickly your site can process dynamic content.
A drastic oversimplification of what happens when a WordPress site loads a page looks something like this:
- The visitor requests a page from the HTTP server.
- The HTTP server tells a PHP worker, “Hey, run this script, then give me the HTML back.”
- The PHP worker does its work until it requires something from the database. When it does, it asks the database for what it needs, pausing until the database responds.
- The PHP worker goes back and forth for a while between processing code and waiting on the database until the script completes. Once done, it hands the result off to the HTTP server.
- The HTTP server takes the final result and passes it back to the site visitor.
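The steps above can be sketched as a deliberately simplified Python program – each function here is a hypothetical stand-in for a whole subsystem:

```python
def query_database(sql):
    # The PHP worker blocks here until the database responds
    return {"post_title": "Hello World"}

def run_php_script(request):
    # PHP alternates between running code and waiting on the database
    post = query_database("SELECT * FROM wp_posts WHERE ID = 1")
    return f"<h1>{post['post_title']}</h1>"

def http_server(request):
    # The HTTP server hands the request to PHP and relays the result back
    html = run_php_script(request)
    return {"status": 200, "body": html}

response = http_server({"path": "/"})
print(response["body"])
```

If `query_database` is slow, `run_php_script` sits idle waiting on it – which is why database performance matters so much, as we'll see later.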
This all happens within the blink of an eye – so fast, in fact, that any single step is difficult to measure on its own. Benchmarking works by taking these routine operations to an extreme so the results become measurable.
These results can vary dramatically based on the tests being run, but your site’s overall performance is going to depend on several factors that we’ll go over next.
PHP performance benchmarks are often done by performing CPU-intensive tasks a set number of times, then measuring how long they take to complete. Since a single task is almost always quick to finish, running it once will be too fast to measure. But when run a few thousand times, results are compounded and easier to analyze.
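As a rough illustration of the technique, here's a minimal timing loop in Python – the hashing task is a made-up stand-in for whatever CPU-bound work a real benchmark suite would run:

```python
import hashlib
import time

def cpu_task():
    # Stand-in CPU-bound work, similar in spirit to common PHP benchmarks
    hashlib.sha256(b"benchmark" * 100).hexdigest()

iterations = 10_000
start = time.perf_counter()
for _ in range(iterations):
    cpu_task()
elapsed = time.perf_counter() - start

print(f"{iterations} iterations in {elapsed:.3f}s")
```

A single call would finish in microseconds and be lost in measurement noise; ten thousand calls compound into a duration you can actually compare across machines.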
Unless the host is heavily oversold, meaning there are more sites on the server than the hardware can handle, raw PHP benchmarks tend to be relatively similar across hosts. If you’re comparing 100 different WordPress hosts, you likely won’t notice any practical difference between #1 and #25 based on PHP benchmarks alone.
Although raw PHP performance results are often quite close, it’s important to know that these results are purely based on a single process. As we’ll discuss later, these results become much more tangible when load tests are performed to simulate multiple users.
Database performance, however, is frequently the Achilles heel of WordPress sites. Remember what we mentioned earlier – PHP worker threads are blocked while they wait for the database to complete a query and respond with results. If your database is slow, you’re going to have a bad time.
Like PHP benchmarking, you probably won’t see a huge practical difference among the top performers, but look for outliers. If a particular host benchmarks significantly slower than the others, consider it a warning sign that you might want to avoid them. As long as a host is within the reasonable range of the others, it’s a solid contender.
Other Performance Factors
Although PHP and database performance are the primary factors to keep in mind from a hardware perspective, other factors may still come into play, such as:
- Network performance – how quickly information can be sent to and from the remote server.
- Gateway/HTTP server performance – how quickly the HTTP server can relay the request back and forth between your WordPress site and your user.
- Disk performance – how quickly scripts, styles, images, and other files can be read, written, and served.
Load Testing Metrics
Load testing is where the real difference between hosts becomes apparent. While raw PHP and database tests provide some insight, load testing shows how your site as a whole will perform under traffic.
Think of load testing as taking a race car out on the track. Rather than measuring the performance of any single part in isolation, the entire car is put through its paces.
Load testing results can get a little confusing at times, so let’s break it down into what these different metrics mean and how they apply to a practical scenario.
Requests and Errors
This is the total number of requests completed during the test, across all virtual users.
The total number of requests sent is largely determined by other factors, like average response time. During a load test, each virtual user performs a set of actions – when one job finishes, the next begins. Thus, the faster each request is handled, the more requests occur.
Requests during a load test also happen simultaneously, resulting in longer response times than if a single user performed the same action repeatedly. Although each virtual user only makes one request at a time, the whole pool of users is working at once to complete their jobs.
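The relationship between concurrent users, response time, and completed requests is just arithmetic (a form of Little's Law). A hypothetical back-of-the-envelope example:

```python
def expected_throughput(virtual_users, avg_response_s):
    """Rough requests/second when each VU issues one request at a time."""
    return virtual_users / avg_response_s

# 50 VUs, each request taking ~0.2s on average -> ~250 requests/second
print(expected_throughput(50, 0.2))
```

Halve the response time and throughput doubles, which is why faster request handling directly produces more total requests within the same test window.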
Under heavy load, servers will either take longer to complete requests or begin to throw errors. The error rate simply shows how many errors occurred. A few errors are no big deal, and depending on when they occur, can even be beneficial (such as throwing an immediate error instead of loading for 30 seconds). However, you’ll want to be sure that errors occur for the right reasons, since intermittent errors without cause can be a sign of instability.
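The error rate itself is simple arithmetic – the figures below are made up for illustration:

```python
def error_rate(total_requests, failed_requests):
    """Percentage of requests that returned an error during the test."""
    return failed_requests / total_requests * 100

# 150 errors out of 120,000 requests is a ~0.125% error rate
print(f"{error_rate(120_000, 150):.3f}%")
```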
Average Response Time
The average response time is exactly like it sounds – the average time it takes for the server to respond to a request.
While PHP and database performance contribute to these numbers, response times during a load test paint a more complete picture of what real traffic would look like.
P95 Response Times
P95 (95th percentile) times are similar to average response times, but instead of averaging everything, they mark the point below which 95% of responses fall. Essentially, 95% of the requests completed faster than this number.
P95 is one of the best numbers to look at. Average response times can get a little weird – sometimes, an abnormality happens when you’re sending hundreds of thousands of requests to a server. P95 metrics help to filter out those abnormalities and provide a fairly accurate account of what a worst-case scenario looks like.
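Here's a sketch of how a P95 figure filters out those abnormalities, using the nearest-rank percentile method and made-up response times:

```python
import math
from statistics import mean

def p95(samples):
    """95th percentile via the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

# 19 normal responses plus one 2.4-second outlier (milliseconds)
times_ms = [100] * 10 + [120] * 5 + [150] * 3 + [200, 2400]

print(f"average: {mean(times_ms):.1f} ms")  # skewed upward by the outlier
print(f"p95:     {p95(times_ms)} ms")       # outlier falls in the top 5%
```

The single slow response drags the average up past 230 ms, while the P95 figure of 200 ms better reflects what nearly all users actually experienced.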
VUs (Virtual Users)
When load tests are performed, they are done by queuing up virtual users, often displayed as “VUs.” Each of these virtual users has its own set of tasks to perform, configured by the test suite. For example, a VU may be tasked with accessing the home page, clicking on a product, adding the product to the cart, then proceeding to checkout. The goal of these tasks is to simulate real user interaction.
During these load tests, the number of virtual users can vary – sometimes it’s static, sometimes it ramps up and down. When evaluating hosting benchmarks, pay close attention to the number of VUs – even if a host scores well elsewhere, if it can’t handle as many simultaneous users, you’re much more likely to run into scalability issues as you grow.
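A toy version of this setup, using Python threads to stand in for concurrent VUs (real load testing tools such as k6 do far more; this only illustrates the structure):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def virtual_user(user_id, journey):
    # Each VU runs its scripted journey one step at a time,
    # recording how long every step takes
    timings = []
    for step in journey:
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return timings

def fake_step():
    time.sleep(0.001)  # stand-in for a real HTTP request

# home -> product -> add to cart -> checkout
journey = [fake_step] * 4

# 10 VUs working simultaneously, each on its own journey
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(virtual_user, range(10), [journey] * 10))
```

Each thread makes only one request at a time, but the pool works in parallel – exactly the pattern that pushes response times up as VU counts grow.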
Anyone can host a WordPress site – usually reasonably fast, too. But the big differentiator is what happens when the traffic starts rolling in.
That said, if you just need a little site that displays a bit of information about your local brick and mortar store, you’re probably not going to have a lot of simultaneous traffic. In that case, get the most out of what you can, and balance your priorities across the other metrics that matter most to you.
Not All Request Types Are the Same
Remember what we said earlier about transparent, reproducible tests? This is where test transparency becomes especially important.
This is because different types of pages have different hurdles. If a page is being loaded from cache, the server is often serving it up from memory. It’s going to be fast anywhere. In contrast, an un-cached archive post, shopping cart, or search results page could be painfully slow in that same environment.
Thanks to modern CDNs, most hosts should see similar performance results across static asset tests. Any variations between hosts are likely minor. A few milliseconds will be imperceptible and so insignificant that it’s even likely to fluctuate that same amount across multiple test runs on the same host.
That said, if you notice any outliers, either run for the hills or prepare to use a third-party CDN to serve your static assets. Furthermore, if static content is especially slow, it could be a sign of deeper issues beyond what you’ve uncovered.
Serving a page from a server-side cache is largely the same as serving up static content, but a little slower. More performant hosts will usually serve up cached pages within a few milliseconds of each other, making any differences negligible.
Tip: While not a dealbreaker for most, something worth noting is that some hosts leverage full-page edge caching. This allows cached pages to be served from an edge node that’s closest to the site visitor, making them much faster than without. We won’t go too far into this right now, but if you’re serving a lot of international traffic, edge caching is worth exploring.
When reviewing cached content performance results, as long as a host is within the same general range as others, you’re probably fine. Any major outliers can be crossed off your list entirely. If cached content performs poorly, dynamic content is likely to be much worse.
Dynamic content is where you’re going to start seeing a lot more variation in quality. Many hosts boast about speed, but those claims are often only for cached content. As soon as you start hitting them with pages that can’t be easily cached, such as WooCommerce stores, they start to choke.
It’s well known that numerous hosts, shared hosting providers in particular, oversell their servers. As they add more customers than their hardware can handle, everyone’s experience slows to a crawl. A site might be perfectly fine when serving pages from cache, since those pages take almost zero resources to handle, but as soon as any real processing is required, everything becomes painfully slow.
Remember what we mentioned earlier – WordPress site performance typically comes down to PHP and database benchmarks. The better these scores are, the better your dynamic pages are likely to perform.
Something important to note is that these scores can vary between single tests and load tests. Just because you’re getting good results when testing a page once doesn’t mean it’ll perform the same way under load. In fact, results will often be drastically different as you approach load capacity.
Test, Test, Test
Investing in quality WordPress hosting is a lot like buying a car – you can do all the research in the world, but until you hop in for a test drive, you can’t be too sure about it.
We encourage you to run your own tests. Not only just when shopping around, but regularly. Understanding how your WordPress site and host perform together is crucial to identifying potential pitfalls before they happen. Performance can change for any number of reasons – having the data in place can help you stay ahead of it.
Want More? Have Thoughts to Share? Let Us Know
We know that we couldn’t have possibly covered everything across the wide array of benchmarking metrics. We’d love to hear your thoughts. If you’re interested in continuing the conversation, reach out on social media.
To get notified of new posts, including a few upcoming articles about benchmarking, be sure to subscribe to our newsletter using the form on this page or by visiting the link above. Don’t worry – we’re not interested in spamming you. Notifying you of new content is good enough for us.