Developing cloud applications introduces new challenges and complexities compared to developing on-premises software, but it also opens up new possibilities. Deploying to the cloud mitigates some of the DevOps challenges that come with on-premises or even self-hosted SaaS products, thanks to the strong tooling available for automating operations. In some cases, fully realizing those possibilities requires us to reconsider long-standing ideas about development best practices.
In this blog, we'll take a look at testing in production, how it can help us deliver higher quality software faster, and how we can handle the challenges that come with it.
Testing in production is an uncomfortable idea for a lot of people, and that's a perfectly understandable position. We want production systems to behave as intended, but we test things to find out whether or not they do. Testing in production sounds like putting something into production while its behavior is still unknown—exactly what we don't want! The truth is that we need to re-examine our ideas about why we test, what we expect from a production system, and even what the term “production” actually means.
Why do we test software?
This one seems fairly obvious. We test software because we want to know whether it does what we intend it to do. However, it's important to acknowledge that testing does not guarantee that software is perfect. Catching issues in testing tends to make them quicker and easier to fix than finding them after they're delivered, not to mention avoiding the negative impact they might have had on users. The real reason we test is that catching more issues earlier helps us deliver higher quality software faster.
What is production?
One way to think about production is any deployment of software that's intended for real use—as opposed to a deployment whose only purpose is testing. We have higher expectations about correctness, availability, performance, etc. from a production deployment than from a testing deployment because the traffic and data it handles are actually meaningful.
Another way to think about production is as the traffic itself rather than the deployment. Our expectations about correctness, availability, performance, etc. are really expectations about how the traffic will be handled rather than about the deployment as a whole. Thinking about it this way lets us ask a new question: can one deployment handle production and non-production traffic at the same time?
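To make that concrete, here's a minimal sketch of what that could look like. The header names, tenant scheme, and process function are all hypothetical; the point is that test traffic flows through exactly the same code path as real traffic, with only its data isolated:

```python
def process(body: dict) -> dict:
    """Stand-in for the real business logic."""
    return {"echo": body}


def handle_request(headers: dict, body: dict) -> dict:
    # Test traffic runs through the same code path as real traffic; the
    # only difference is that its writes land in an isolated tenant
    # instead of touching real customer data.
    is_test = headers.get("X-Test-Traffic", "false").lower() == "true"
    tenant = "synthetic-tests" if is_test else headers.get("X-Tenant-Id", "unknown")
    return {"tenant": tenant, "result": process(body)}


print(handle_request({"X-Test-Traffic": "true"}, {"order": 42}))
# {'tenant': 'synthetic-tests', 'result': {'echo': {'order': 42}}}
```

Because both kinds of traffic exercise the same deployment, a passing test tells us something true about production itself.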
In fact, defining production this way is something we often already do for third-party systems. When we write an application that integrates with the Google Maps API, we don't deploy a copy of Google into a staging environment; we send our test traffic to the same live API that real users depend on. The only real change is thinking about our own systems as integrations between distinct components, too.
Why should we test in production?
The reason testing in production sounds scary is that, in production, we want certainty, and in testing, we're acknowledging uncertainty. The problem with this train of thought is that we can never be certain, and assuming that pre-production testing guarantees there will be no issues tends to leave us unprepared for the issues that will invariably arise.
What we're really aiming to do is minimize the impact of issues in production. One obvious way to reduce impact is to avoid issues altogether (that's why we test), but we should also reduce the impact of the issues that do occur by adjusting the way we handle them. Once we're doing both, we need to consider how much of each one we should do.
We already know that we can't prevent every issue, but we should also consider that there are diminishing returns. The first 99% of issues might be easy to catch, the next 0.9% are harder, and the 0.09% after that are harder still. Catching 99% of the issues instead of 99.9% seems like a bad trade, but if we redirect the saved effort into mitigating impact, the overall outcome might be better. Would you rather have one hour-long outage every month, or two minutes of slightly increased latency every day? Both add up to roughly an hour a month, but the severity couldn't be more different.
Replicating production is very hard
The efficacy of testing depends on the test environment's ability to reflect the behavior of production: the less it resembles production, the less effective the tests will be.
Common reasons a test environment differs from production include the cost of operation and the complexity of the deployment. A test environment might run with fewer replicas, less RAM or CPU, a single-instance local database instead of remote redundant servers, etc. Some categories of test won't be affected by these differences; for others, we can predict and account for them. We may even use more production-like environments to run a small subset of the tests, but sometimes differences are simply going to be missed.
A more complex production environment requires a more complex testing environment to provide meaningful feedback, and the bigger the gap between environments, the harder it becomes to understand and account for the differences.
Differences in the way we build and deploy applications for the cloud add a lot of complexity to the production environment. For example, microservice architectures promote Agile Software Development by enabling small teams to work more autonomously on smaller domains than one large team working on a monolith, but they come with their own complexities, too: each microservice has to replicate work that would only be done once in a monolith.
If you're shipping an on-premises application that integrates with Active Directory, talks to a database, and supports thousands of concurrent users, it's fairly easy to create a production-like environment for testing. Since the environment is easy to replicate, and delivering upgrades and installers has an inherent cost, the effort to create it is well justified. On the other hand, imagine you're shipping a multi-tenant SaaS application that integrates with multiple IdPs and is made up of dozens of microservices, each with its own database. Bringing the whole system into lockstep for testing before release would slow delivery to a halt, and the complexity of the system all but guarantees something will be missed.
Fixing production is less hard
We've already seen that a combination of prevention and treatment can provide a better result than prevention alone, and we've now seen how cloud applications can complicate prevention. We should absolutely still attempt to avoid deploying bad software, but we also need to embrace the fact that prevention alone isn't good enough and start worrying more about how we can mitigate the impact when there is an issue in production.
One approach to building that resilience is a strategy called progressive delivery, which essentially means introducing a change gradually instead of all at once.
With the progressive delivery model, a new feature is made available to only a subset of customers: those who have opted into experimental features, those in a particular geographical region, or perhaps only internal users for testing purposes. From a technical perspective, this is achieved by putting a new version of the software into production alongside the existing version and directing traffic to one or the other based on whatever rules you've defined. Over time, more and more traffic is directed toward the new version until it's handling everything. If at any point the results are undesirable, all traffic can simply be directed back to the stable version.
This would be very difficult to accomplish with on-premises software, but it's much more practical in a SaaS context because the vendor controls the deployment. Deploying with Kubernetes makes it even easier because technologies like Istio do most of the work for us.
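As a rough illustration of the routing idea, here's a minimal sketch in Python. In practice, a service mesh like Istio expresses this declaratively as weighted routes at the network layer, so treat the names and numbers here as hypothetical:

```python
import hashlib

CANARY_SHARE = 0.05  # start by sending ~5% of traffic to the new version

def choose_version(user_id: str) -> str:
    # Hash the user ID into a stable bucket so each user is pinned to one
    # version; raising CANARY_SHARE only ever adds users to the canary.
    bucket = hashlib.sha256(user_id.encode()).digest()[0] / 255.0
    return "canary" if bucket <= CANARY_SHARE else "stable"

routed = [choose_version(f"user-{i}") for i in range(10_000)]
print(f"canary share: {routed.count('canary') / len(routed):.1%}")
```

Hashing on a stable identifier is a common design choice: ramping the share up (or back down to zero) changes who sees the canary without bouncing any individual user back and forth between versions.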
What are the benefits of the progressive delivery approach?
Besides side-stepping the burden of creating complex testing environments, progressive delivery provides a number of benefits over testing in isolated environments:
- We reduce the difference between testing and production to near-zero, which gives us higher quality results.
- Monitoring new versions as they're rolled out allows us to automate promotion and rollback based on tests and metrics (see the sketch after this list).
- Our tests automatically reflect real users, which fills in gaps we might have in our test cases.
- Gaining confidence in the resilience of production allows us to iterate faster.
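To illustrate how promotion and rollback can be automated, here's a hedged sketch. The error_rate and set_canary_weight callbacks are hypothetical stand-ins for a metrics backend and a mesh's traffic API; tools like Flagger and Argo Rollouts implement this pattern for real:

```python
import time

ERROR_BUDGET = 0.01        # tolerate up to 1% failed requests
STEPS = [5, 25, 50, 100]   # percentage of traffic at each stage

def progressive_rollout(error_rate, set_canary_weight, soak_seconds=300):
    """Promote the canary step by step; roll back the moment it degrades."""
    for weight in STEPS:
        set_canary_weight(weight)
        time.sleep(soak_seconds)     # let real traffic exercise the canary
        if error_rate() > ERROR_BUDGET:
            set_canary_weight(0)     # instant rollback to the stable version
            return False
    return True                      # the canary now serves all traffic
```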
The last point deserves some more attention. It's typical to experience a conflict between velocity and confidence. Testing and releasing lots of features together produces fewer releases and slows down the average delivery time, but more frequent releases mean shorter test cycles. The crux of the problem is that shipping half the features doesn't mean we only need to do half the testing (regression tests are an acknowledgement that we can't trust ourselves to do that). Larger systems take disproportionately longer to test. We want to iterate quickly, but our own processes sometimes incentivize slowing down.
Testing in production helps us address the conflict at its core by reducing the amount of pre-production testing that's required to get a high-quality result instead of trying to squeeze more and more testing in before each release.
Simple, scary, and transformational
The idea of testing before releasing software makes perfect sense, but how you get the most value out of testing depends on factors such as an application’s resilience and the effort it takes to deliver updates. Cloud technologies and practices are game-changing when it comes to providing high availability, continuous delivery, and cost-effective scaling; but in order to fully realize the benefits they have to offer, we have to embrace the significance of their impact and be willing to step outside of our comfort zones.
Learn more about BeyondTrust here, or click here to find out how you can join our team.

Thomas Showers, Senior Software Engineer
Thomas Showers is a Senior Software Engineer on Identity Security Insights. Originally joining BeyondTrust as our first intern, Thomas is a boomerang that has worked with or on the Core, Endpoint Privilege Management Unix & Linux (PMUL), and BeyondInsight teams. He is a self-professed math geek, an accomplished guitarist, and a friend to the animals.