Why breaking your very important public digital service will help you sleep at night

You’re responsible for a really important public digital service. How do you know it’ll survive that one day when everyone tries to use it at once? Load test it. To death.

Joe Baker
Convivio

--

Est. 8916kg on the wings of RNZAF №6 Squadron’s Consolidated PBY Catalina Flying Boat at Halavo Bay, Solomon Islands, 1944. Copyright © Jenny Scott, Adelaide Archivist.

So, you’re responsible for running a really important public digital service. The code is robust, the user experience is superb, the service is great. But there’s still a nagging thought that haunts you and keeps you up at night sometimes.

How will it react on that one day when everyone wants to use your service at the same time? The day Ofsted reports come out, say, or the run-up to the tax self-assessment deadline, or the day the findings of a public enquiry are published. Will it stay up? Will it keep serving people?

To know that, you need to have tested your service for the load you expect it will have to handle.

You need to load test.

Help, I don’t know how resilient my digital service is!

The essential question you need to be able to answer is: How do I know whether my web application will survive a surge in users? In other words, how will it behave when it’s under stress?

The point of asking those questions is to find out the answers to several others so you and your team can then work out whether that’s good enough or whether you need to do something about it.

  • Which parts of the system are fragile and likely to strain when they’re put under stress?
  • What is most likely to break down when it is given a heavy load, and why?
  • What will the user’s experience be like when the system is under stress?

You find out the answers by simulating a lot of people using your service simultaneously and watching what happens to the site in those conditions. Your observations will indicate where the system is vulnerable, and then you can decide how to respond. That’s the essence of load testing.

What is load testing?

30 men on the roof of a DKW “Front Reichsklasse” Type F7, demonstrating the strength of the DKW’s wooden coachwork. Copyright © AUDI AG

Classic forms of testing — unit tests, regression tests and the like — usually test specific functionality and units of code rather than the system as a whole.

Load testing instead looks at your service as a whole system and tests how it reacts when everything is working together under heavy load.

Often, of course, load testing will examine a sub-system within the service as a whole, such as a multi-step application form that will experience a heavy load just before the deadline.

A distinction should be made at this point between load testing and stress testing. They’re related, but different. Both are valuable, but understanding the differences will help you work out which approach is most appropriate for you.

Load testing vs stress testing

Load testing

Load testing looks at a service’s performance under load, usually at the peak load experienced (if you have historical usage data to inform your testing) or the anticipated load (if you’re launching a new service).

Load testing helps you to know how your service will behave, and what your users will experience, when it’s handling peak load.

Stress testing

Stress testing goes beyond load testing: it keeps adding load until the system breaks, deliberately applying unrealistic or unlikely load scenarios.

Stress testing adds load that deliberately induces failures. Those failures allow you to understand the breaking points and thereby analyse the risk involved at those weak points. As a consequence, you and your team may then choose to adapt your systems — the codebase, the infrastructure, or other elements of the system — to harden the weaknesses or make them break more gracefully.

Stress testing allows you to prepare for the unexpected.

How should I do load testing?

The important thing with load testing is to measure. You can load test your service just to see whether it withstands the load, and that’s fine, but it doesn’t tell you much: just whether the system did or didn’t survive, a simple yes or no. It’s far more useful to measure what’s actually happening while the test runs.

So when you’re doing your load testing, what things need to be measured? What is significant?

There’s certainly lots of things you can measure, but really there are only three things that matter from your service users’ perspective:

1) Requests per second

This is just a simple count of the number of requests for things (usually pages) that your service can sustain.

This usually comes from the tool you are using for your load testing: you configure it to deliver a given load in terms of requests per second. For instance, you set it to simulate 5,000 users all signing in to your service over the course of 5 minutes. The signing-in process will probably take a series of sequential requests per user. For a web-based service, this may be: load the homepage; go to the login form page; fill in the credentials and submit them, landing on the account page; then log out again. That’s 4 page requests per user (each web page has other requests as well, of course, for images, styling, scripts, etc). If we naïvely assume, for the purpose of this illustration, that the requests are evenly spaced over the 5 minutes, that’s: 20,000 page requests in total; 4,000 per minute; or 67 per second.
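That back-of-the-envelope arithmetic is easy to script. Here’s a minimal sketch in Python, using the numbers from the example above:

```python
# Rough request-rate arithmetic for the sign-in scenario above.
users = 5_000              # simulated users
requests_per_user = 4      # page requests in the sign-in journey
duration_seconds = 5 * 60  # the 5-minute test window

total_requests = users * requests_per_user              # 20,000
requests_per_second = total_requests / duration_seconds

print(f"{total_requests} requests, ~{requests_per_second:.0f} per second")
# -> 20000 requests, ~67 per second
```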

Your testing tool should track the actual request rate as it progresses through the test and give you a report at the end.

2) Response time

The second important thing to measure is the response time of your service (sometimes called ‘latency’, though the two terms mean slightly different things). It’s all very well handling 67 requests per second, but if each request takes ten seconds for a response your service users won’t be very happy.

You’ll want to decide what response time to a request is appropriate for user satisfaction with your service. One second is common; some services may choose more or less according to their context. Multiply that by 4 to find your ‘tolerating’ threshold; beyond that, users are frustrated. So, for example:

  • 1 sec or less = satisfied;
  • 1–4 sec = tolerating;
  • More than 4 sec = frustrated.

Then, track the average response time for users during the load test to understand the users’ experience when your service is under load. It’s often helpful to examine the 95th and 99th percentile response times too, since an average can hide a long tail of slow requests.
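To make that concrete, here’s a small Python sketch of how you might bucket a set of measured response times against those thresholds and pull out the percentiles. The timings here are invented for illustration:

```python
import statistics

# Hypothetical per-request response times in seconds, as collected by a test tool.
response_times = [0.4, 0.8, 1.2, 0.9, 3.5, 0.6, 5.2, 1.1, 0.7, 2.0]

SATISFIED, TOLERATING = 1.0, 4.0  # the thresholds from the list above

satisfied = sum(t <= SATISFIED for t in response_times)
tolerating = sum(SATISFIED < t <= TOLERATING for t in response_times)
frustrated = sum(t > TOLERATING for t in response_times)

# statistics.quantiles with n=100 returns the 1st..99th percentile cut points.
percentiles = statistics.quantiles(response_times, n=100)
p95, p99 = percentiles[94], percentiles[98]

print(f"satisfied={satisfied} tolerating={tolerating} frustrated={frustrated}")
print(f"mean={statistics.mean(response_times):.2f}s p95={p95:.2f}s p99={p99:.2f}s")
```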

3) Errors

Inevitably, some of the requests to your service will result in errors. These will happen for various reasons, but you’ll need to measure them (their rate or frequency, say), match them with your system logs to see what may have caused the errors, and then work out what to do to fix them.

Measure your error rate. Find out what causes them. Fix the errors if they’ll cause problems for your service users.
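If your load testing tool exposes the raw results, working out the error rate is straightforward. A minimal sketch, assuming each result carries an HTTP status code and the URL requested (the data here is invented):

```python
# Hypothetical (status_code, url) results from a load test run.
results = [(200, "/"), (200, "/login"), (500, "/login"),
           (200, "/account"), (503, "/account"), (200, "/logout")]

errors = [(code, url) for code, url in results if code >= 400]
error_rate = len(errors) / len(results)

print(f"error rate: {error_rate:.1%}")  # -> error rate: 33.3%
for code, url in errors:
    print(f"  {code} on {url}  <- check the system logs around this request")
```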

Monitor all the things

What’s important here is then to monitor how your system behaves when it’s put under stress. Monitoring tools inspect the behaviour of the components of your digital system: the processors; memory; databases; code; caching tools; and more, whatever you decide should be monitored.

There are lots of tools that you can use to scrutinise how each of the elements in your system is functioning, and you should use them all the time — and keep an eye on what they tell you! They will come into their own, though, when you run load tests as they’ll be able to show you which parts of your system are the weak points.

For example, you may discover that it’s the database connections that are slow, or the processing of the data, or memory being exhausted, or the networking between systems, or all manner of other things.

The most useful monitoring tools will be able to look inside each process.

Tools like this will tell you which part runs most slowly, or most frequently — a particular database query; a specific function; a repeated query that does the same thing each time and should be cached, not repeated; etc — and reveal where the bottlenecks or weak points are.
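As one illustration, a Python service could use the standard library profiler to see where the time goes inside a single request; whatever your stack, there will be an equivalent (APM tools, slow-query logs and so on). The helper functions here are hypothetical stand-ins:

```python
import cProfile
import pstats

def fetch_from_database():
    # Hypothetical stand-in for a slow database query.
    return sum(i * i for i in range(200_000))

def render_page(data):
    # Hypothetical stand-in for template rendering.
    return f"<p>{data}</p>"

def handle_request():
    # Stand-in for one request's worth of work.
    return render_page(fetch_from_database())

profiler = cProfile.Profile()
profiler.runcall(handle_request)
# Sort by cumulative time so the slowest call paths surface at the top.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```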

In other words:

  • Watch what happens when you add load to your system.
  • Be specific. Look at the elements of the system, not just the big picture, to find which particular things struggle when load is added.
  • Take action on that knowledge to clear those bottlenecks.

Good, detailed monitoring is the secret sauce for making your system better.

Load testing a complex workflow?

It’s fairly easy to load test a single element of your system. That’s commonly done with the homepage of a website: one request per user; one response; do it lots of times. But what if your users have to do a sequence of things, like filling in several steps in a multi-page form, where each step depends on the previous step being completed successfully?

That’s harder to do but it’s also, arguably, more important to test: users will take a sustained amount of time using your service; users towards the end of the workflow will be affected by the load from users just beginning, and vice versa; users will be frustrated if problems arise part-way through the workflow.

Load testing here is all the more important, then.

  • Test where users have to do things, especially a series of dependent steps: log in; fill in a form; wait for a response; fill in another form; etc.
  • Test where lots of users will be doing the same things at the same time, as in the sketch below.
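As an illustration, here’s a minimal sketch of that kind of journey written for Locust, a popular open-source load testing tool. The paths and form fields are hypothetical:

```python
from locust import HttpUser, task, between

class ApplicationFormUser(HttpUser):
    """Simulates one user working through a dependent, multi-step form."""
    wait_time = between(1, 5)  # think time between steps, in seconds

    @task
    def complete_application(self):
        # Each step depends on the previous one having succeeded.
        self.client.get("/")                      # homepage
        self.client.get("/apply/step-1")          # first form page
        self.client.post("/apply/step-1", data={"name": "Test User"})
        self.client.post("/apply/step-2", data={"postcode": "SW1A 1AA"})
        self.client.get("/apply/complete")        # confirmation page
```

Run it with something like `locust -f journey.py --headless --users 5000 --spawn-rate 50` and Locust will ramp up simulated users and report request rates, response times and failure counts for each step.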

How good is good enough?

So, you’ve done your load testing and you’ve made a bunch of improvements to your digital service. And you’ve load tested the improvements, found more weaknesses and fixed those as well. How much more should you do? How good is good enough?

Tricky question.

The only way to answer it is to work out how many people will be using your service at the same time. If you’ve got historical data from previous behaviour — in some digital analytics tools, say — to go from, great. If not, you’ll have to make an estimate from evidence — not just a stab-in-the-dark guess, but a calculated estimate based on good grounds:

  • You may be able to do this because you are replacing a paper process with a digital one.
  • You may be merging several digital services into one unified system and can amalgamate metrics from each.
  • There may be similar public sector digital services with usage data you can use as a guide.
  • You may have good surveys or other user research to use.

However you go about estimating, though, try to find substantive evidence for your estimates.

The first step is to translate that into a value for requests per second. Knowing you’ll have 50,000 or 200,000 or whatever number of people using your service simultaneously is one thing. Understanding what that means in requests per second is another. If those users will be following a sequence of steps that includes loading pages, submitting forms and more, then you’ll need to count up the requests in the sequence, multiply by the number of users, and divide by the time taken to follow the sequence. For example, 50,000 users each making 6 requests over a 10-minute window is 300,000 requests in 600 seconds, or 500 requests per second.

Next, you’ll need to define a satisfactory response time. Again, a mere stab-in-the-dark value isn’t really useful. If you can define your value from data or other evidence, great. 1 second is common, but varying from that is reasonable if you have evidence to lead you that way. The story goes that the Barclaycard team who developed contactless card payments determined that a 0.5s response time was their target to ensure public acceptance; 0.6s would feel sluggish; 0.7s would feel way too slow. You, too, need to make an evidence-based decision on what latency is acceptable for your service users.

The important step is to make sure you can comfortably exceed that rate at the required response time. It is not good enough to just achieve the required response time at your target request rate: your targets may underestimate the actual usage, especially if they’re not based on historical data, or if improvements in your service lead to higher take-up.

To do this, use some multiplier of your target request rate: 125%, 150% or 200%, say. A 500 requests-per-second target tested at 150%, for instance, means demonstrating the required response time at 750 requests per second. What you choose is up to you, but make sure you’re comfortable and confident in your rationale.

Got that all done? Great. The next step is the best one.

Sleep well at night, knowing your public digital service is resilient and will withstand the stresses it’ll be put under.

What next?

We’re planning to follow this post with a few others on the same subject.

We’ll be writing soon about how to communicate all this stuff to the stakeholders of your public digital service.

We’re also planning to write about some of the more technical aspects of load testing, with an overview of the tools we use and about building a platform for running load tests automatically as part of a system testing infrastructure.

Photo credits:

Jenny Scott, via Flickr
AUDI AG
Dr. Rohit Choudhary Raj, via Wikimedia Commons
Peter Krimbacher, Moebius1, via Wikimedia Commons

Convivio helps organisations to work better for people.

We do this mainly by helping them transform their services using digital tools, but also by spreading new ways of working.

Read our blog: blog.weareconvivio.com
Follow us on twitter: @weareconvivio
Get in touch: hello@weareconvivio.com
Visit our website: weareconvivio.com

--

Writer, PhD in religion and narrative from Bristol University. Chief Research Officer at Convivio, the collaboration company.