This is part one in a four-part series that will walk you through everything you need to think about when performance testing a new site.
Before taking the world’s first flight, the Wright brothers built a wind tunnel in which they tested hundreds of different wing designs. I’d argue that building a production website should go through similarly rigorous testing. If you’re launching a production website with a significant amount of anticipated traffic, testing the site in an environment that simulates the dynamic nature of the world in which it will eventually operate is an absolute necessity.
Performance and load testing help us identify points of contention and targets for optimization, and give us a realistic view of what the system can handle in terms of load.
As a general performance metric, the standing rule of thumb is that no page should take longer than 3 seconds to load under a load of 10,000 concurrent users. There is some wiggle room here, as dynamically loaded content might take slightly longer, but generally the main HTML body and images should load in this timeframe.
Websites tend to be a mixture of simply constructed pages with highly static content and pages with highly dynamic content and large amounts of binary content.
Simply constructed pages with little or no dynamic content have long cycles in which the page does not change. This is the primary use case for Web Content Management systems (Web CMS). The page changes only when new content is pushed from the author server to the publish servers. That makes such content an excellent target for caching, both locally and externally.
Highly dynamic pages, on the other hand, change much more frequently and must be regenerated more often. These present a much less attractive profile for caching, at least at the page level.
This series covers a number of suggested techniques for performance testing systems that have a mixture of the above page types.
Experience suggests that performance testing should be performed at all stages of development. The types of performance testing will change as the system moves through the escalation process towards production, but these tests will inform the business owner and development staff of bottlenecks in performance well in advance of a public debut.
I believe strongly that performance tests should be performed early and often, as an integral part of the development process. They should also be part of the system smoke test, to determine whether changes to the code base have adverse effects.
The good news is that stress tests are easily performed with open source tools. There is a learning curve to getting these tools up and running, and the creation of a data test bed is extra work, but system performance testing simply has to be part of any large-scale system rollout. As a general rule I don’t want to be surprised by system performance when we let in our customers.
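As a starting point, a stress test does not have to involve a heavyweight framework at all. The following is a minimal sketch, using only the Python standard library, of the core loop every such tool performs: many concurrent virtual users issuing timed requests, reduced to a few headline numbers. The target URL and user count are placeholder assumptions; substitute your own staging endpoint and the concurrency your business requires.

```python
# Minimal concurrent load-test sketch using only the standard library.
# TARGET_URL and VIRTUAL_USERS are illustrative assumptions.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://staging.example.com/"  # hypothetical test endpoint
VIRTUAL_USERS = 50                          # scale toward your real target

def timed_request(url):
    """Issue one GET and return (status, elapsed_seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()  # drain the body so we time the full response
        return resp.status, time.perf_counter() - start

def summarize(latencies):
    """Reduce a list of response times to simple headline numbers."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p95": ordered[int(0.95 * (len(ordered) - 1))],
        "max": ordered[-1],
    }

def run_load_test(url=TARGET_URL, users=VIRTUAL_USERS):
    """Fire one request per virtual user concurrently and summarize."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed_request, [url] * users))
    return summarize([elapsed for _, elapsed in results])
```

Dedicated tools add scheduling, ramp-up, and reporting on top of this loop, but the measurement at the core is no more complicated than the above.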
Testing enables us to accurately predict performance, tells us what milestones will require new infrastructure resources, and enables us to explain how any changes to our system affect performance. To launch a site without this sort of extensive testing is to strap exquisitely designed wings on one’s back and wonder why they aren’t working.
Types of performance tests
We recommend a number of different types of performance testing, deployed at different times in the development cycle. In this post, I am going to cover in detail two types of performance tests that share some commonalities but have specific differences. While both use virtualized users hitting the server and mimicking predefined use patterns, each is used to test a specific type of system behavior. I have named the tests after the primary metric being measured:
- Time-to-first-byte performance tests, sometimes called “Server Response Tests”.
- Time-to-last-byte performance tests, sometimes called “Render Response Tests”.
Both of these tests measure the time it takes for information to come back from the server. However, first byte testing measures the time it takes to get the first byte of information back from the Application Server, giving us a sense of how long the application server will take to begin to render a page, while last byte testing gives us the total time required to fully render a page.
Testing Commonalities and Setup
Aside from the observed metrics, the two performance testing regimes have almost identical setups and concerns:
- We want to test the systems by simulating concurrent usage. This means we will need to simulate multiple unique requests from separate threads. Both test regimes will be set up with virtual user sessions.
- We want to simulate several different, unique users on the system. On systems with high degrees of dynamic content based on user profile, we don’t want to introduce phantom problems in the test by allowing the same user to be active in multiple sessions.
- We want to test valid use scenarios. It makes no sense, logically, to test only one code path, or a set of unrealistic use scenarios. Web metrics of current functionality can be extremely useful here.
- We will want to test a mix of data consumption (browsing) and data creation (add/modify/delete).
The environment for both sets of tests would look similar to the following:
Figure 1: Simplified virtual user map
The number of virtual users needs to be scalable to the maximum number of concurrent users required by the business. As the application server becomes more tuned we will start to max out the I/O and CPU on the test client machines, so we need to be able to increase the number of virtual users by adding additional test clients. Being able to manage these servers centrally becomes a concern as the test suite becomes more sophisticated.
The traffic generated by our virtual users should vary along two different dimensions. First, on systems where the primary access is not anonymous, we will want to have a large, defined set of users. The user sample should be large enough that we see little or no overlap of users in the concurrent tests; we don’t want high volumes of the same user acting at the same time. The second dimension is a randomized set of tasks that match observed user behavior.
It is useful to create/modify users that share a common password. This simplifies the login process.
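The two points above can be sketched as a small helper: provision a pool of distinct test accounts sharing one password, then hand each concurrent session its own account so no user is ever active in two sessions at once. The account naming scheme and password here are assumptions for illustration; on a real system these accounts would be provisioned through your actual user store.

```python
# Sketch of a virtual-user pool. Account names and the shared
# password are hypothetical placeholders.
def build_user_pool(size, prefix="loadtest_user", password="Test1234!"):
    """Create `size` distinct test accounts sharing one password."""
    return [{"username": f"{prefix}_{i:05d}", "password": password}
            for i in range(size)]

def assign_users(pool, sessions):
    """Give each concurrent session its own account -- never reuse a
    user across simultaneous sessions, to avoid phantom contention."""
    if sessions > len(pool):
        raise ValueError("user pool too small for the requested sessions")
    return {session_id: user
            for session_id, user in zip(range(sessions), pool)}
```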
Web metrics can be useful in describing the tasks users are most likely to perform and what paths they take to perform them.
One useful rule of thumb is that, generally, systems see a 10 to 1 split between users who are browsing data and users that are creating content. Performance tests should reflect this ratio.
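One way to bake that ratio into a test is to draw each virtual user's next action from a weighted task list. The sketch below does exactly that; the task names are hypothetical placeholders for your real user journeys, with consumption tasks weighted ten-to-one over creation tasks.

```python
# Randomized task schedule honoring the ~10:1 browse-to-create split.
# Task names are hypothetical placeholders.
import random

TASK_WEIGHTS = {
    "browse_homepage": 5,   # consumption tasks:
    "browse_article": 4,    #   10 parts in total
    "search": 1,
    "create_comment": 1,    # creation tasks: 1 part
}

def build_schedule(n_tasks, weights=TASK_WEIGHTS, seed=None):
    """Draw a randomized task list matching the observed ratios."""
    rng = random.Random(seed)
    tasks = list(weights)
    return rng.choices(tasks,
                       weights=[weights[t] for t in tasks],
                       k=n_tasks)
```

Seeding the generator makes a test run reproducible, which is valuable when comparing results across code changes.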
Time-to-first-byte performance testing
Time-to-first-byte performance tests have two primary goals:
- To gather the timing delta between the response and the request.
- To load test the backend server.
A simplified diagram of an N-tiered architecture would look like the following:
The time delta between the Request and the Response is the metric: it measures how long the application server takes to render the page. On simple pages, I would expect this delta to be very short, as the page is simply extracted and returned to the requester. On more complicated pages I would expect it to be longer, as the logic to create the page becomes more computationally expensive.
As mentioned above, pages that are relatively static would be forward cached in the Dispatcher or the external cache, thus decreasing the amount of time to pull the page.
The important distinction here is that the test measures only the time it takes for any response from the server (the first byte) to be observed. The tests should, however, distinguish between legitimate return values and error statuses. We don’t want to be measuring 404 response times.
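A first-byte probe and that status filter can be sketched with the standard library's `http.client`. The host and path are assumptions, and "valid" here is defined as any 2xx or 3xx status, so 404s and 5xxs are excluded from the timing data.

```python
# Time-to-first-byte probe sketch. Host/path are assumptions;
# error statuses (e.g. 404) are excluded from timing results.
import http.client
import time

def is_valid_status(status):
    """Count only success/redirect responses; skip errors like 404."""
    return 200 <= status < 400

def time_to_first_byte(host, path="/", port=80):
    """Return (ttfb_seconds, status), or (None, status) on an error
    status so bad responses never pollute the timing data."""
    start = time.perf_counter()
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read(1)  # first byte of the body has now arrived
        ttfb = time.perf_counter() - start
        if is_valid_status(resp.status):
            return ttfb, resp.status
        return None, resp.status
    finally:
        conn.close()
```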
A side effect of the first byte test is that, because virtual users are not waiting for pages to render, they can issue more requests in a shorter amount of time. This decreased wait time exercises the server harder, which is useful if you want to load test it.
Load testing the server allows the tester to determine where bottlenecks are happening on the system. As the number of concurrent users grows, competition for resources also grows. Frequently we see the application server show signs of stress. This information is useful in the development process.
Load testing can be useful in a number of ways. As above, it is useful in finding points of contention in the application/database server. It is also a useful as part of a regression smoke test. A load test performed after code changes can tell the developers if the new code has increased or decreased the load on the system.
Time-to-last-byte performance testing
Time-to-last-byte performance testing has three primary goals:
- To determine the time necessary to present the finished page to the end user.
- To determine if the caching strategy is sufficient to deal with expected load.
- To determine if there are loading bottlenecks in different geographical regions.
In time-to-first-byte testing, the aim is to measure the time the backend takes to build the page response. With time-to-last-byte testing, our aim is to find out how long the page takes to render completely.
After the initial page load, the browser starts requesting additional assets necessary to render the page. Only after all these “subsequent asset requests” are finished is the page finally, fully finished loading.
In a snapshot of network activity for the load of a page, we see the following results:
The load of the initial page is the first bar in the chart. However, we see subsequent calls after the initial HTML has been delivered. True, for some of these requests the assets are being pulled from the browser cache, but other assets will not be, and only when they are acquired will the page be completely loaded.
Some of these assets will be services consumed from other vendors, others will be binary content served up from the DAM or one of the forward caches.
The time spent fully rendering the page is significantly longer than the time to the first byte of the initial request.
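A rough way to approximate time-to-last-byte without a browser is to fetch the HTML, extract the assets it references, and fetch those as well. The sketch below does this with the standard library; note that a real browser also executes scripts and applies CSS, so this is a lower bound, and any URL passed in is an assumption of your own test environment.

```python
# Rough time-to-last-byte approximation: fetch HTML, then fetch the
# assets it references. A real browser does more work, so treat the
# result as a lower bound on full render time.
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetCollector(HTMLParser):
    """Collect src/href asset references from an HTML document."""
    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])

def extract_assets(html, base_url):
    """Return absolute URLs for every asset referenced by the page."""
    parser = AssetCollector()
    parser.feed(html)
    return [urljoin(base_url, a) for a in parser.assets]

def time_to_last_byte(url):
    """Time the initial page plus all of its referenced assets."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for asset in extract_assets(html, url):
        with urllib.request.urlopen(asset, timeout=30) as resp:
            resp.read()
    return time.perf_counter() - start
```

Fetching assets sequentially, as above, overstates what a browser does (browsers parallelize), which is another reason to treat the number as indicative rather than exact.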
What we are expecting to find here are the requests that take significant amounts of time to complete. There can be several reasons for this latency:
- The caching strategy is not working. This would indicate that the caching rules are not well constructed and that the page needs to go back to ‘origin’ too frequently for subsequent requests.
- The page is being requested from a client in an ill-served geographical location. In this case the external cache rules have not been well constructed.
- The page assets are being cached, but the TTL for the resources is not long enough.
- Assets have not been adequately prepared for production. In this case code assets have not been compressed with gzip or minified to reduce page load time.
As seen above, there are tools that can help the developer see the time taken to render the page. Tools like YSlow and the Network tab in Chrome’s developer tools will give this information on a case-by-case basis. Performance testing adds volume and concurrency to this.
A note of caution is important here. When you are testing systems, it is important that you understand the nature of your caches. In most cases systems will labor on startup because the caches are not “warmed up”. You do want to test the behavior of your system under realistic load, and making sure the caches are warm will make sure you are not testing outlier behavior.
Check out the rest of the series: