Because there are better ways of releasing software than “move fast and break things.” That might work in the fast-and-loose start-up days, but once you’ve found your market, your users expect a certain level of reliability. Continuous deployment speeds developers up while baking in safety and sustainability.
Teams practicing continuous deployment reap all manner of benefits:
On top of the system checks, the CI/CD pipeline executes automated tests that determine if the build can go to production. Stringent tests give teams confidence that an application is production-ready.
When you make testing the center of your approach, you can increase your speed sustainably; if you focus only on the quickest way to release to production, you’ll hit a wall because you’ll ship bugs that need constant fixing.
Every team and company evolves a DevOps process that fits their working style, tech stack, and budget, but generally, we see a Crawl → Walk → Run evolution. No matter where your team is on that journey, this guide will help you reach the next level and the one after that.
Feel free to start at the top or skip ahead to the description of your pipeline.
Modern software teams use a CI/CD pipeline to manage an application build’s progress. In the continuous deployment model, the pipeline guides the build all the way to production through a series of automated checks and gates. In contrast, continuous integration and continuous delivery have similar checks, but the team manually advances the build through each.
Regardless of the model, the CI/CD pipeline consists of various system checks that run incrementally. Generally speaking, the pipeline doesn’t advance to the next check until the current one succeeds.
The CI/CD pipeline has four main phases:
The pipeline checks out the repository with the merged branch from the SCM (source-code manager, like GitHub or GitLab). Then, the build tool pulls all relevant dependencies and, if necessary, compiles the application. Once the pipeline builds the application, it executes unit and component integration tests. Test failures cause the pipeline to stop.
When all the tests have passed, the pipeline creates an application package (e.g., an archive file or a container). When successful, the pipeline labels any artifacts and pushes them to an artifact repository or archive, such as Artifactory or ECR.
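To make the phase concrete, here’s a minimal sketch of the build phase as a script a CI runner might execute. It assumes a Node.js project packaged as a container; the commands and registry URL are illustrative, not prescriptive.

```typescript
// Hypothetical build-phase runner: each step must succeed before the next one runs.
import { execSync } from "node:child_process";

const sha = process.env.GIT_SHA ?? "local";

const steps = [
  "npm ci",                                             // pull all relevant dependencies
  "npm run build",                                      // compile the application
  "npm test",                                           // unit and component integration tests
  `docker build -t registry.example.com/app:${sha} .`,  // package the build as a labeled artifact
  `docker push registry.example.com/app:${sha}`,        // push the artifact to the registry
];

for (const step of steps) {
  execSync(step, { stdio: "inherit" }); // throws on failure, which stops the pipeline right here
}
```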
Teams that have this in place are practicing continuous integration.
When the application is successfully packaged and archived, the pipeline pulls the artifacts, deploys them to the target environments, and attempts to start the application.
This phase is arguably the most crucial stage in the continuous deployment pipeline. For continuous deployment to work, the application released to users must be stable and defect-free. The lower-level testing done up until this point has validated that the new code is functional, but only end-to-end testing can validate that the new code plays nice with what’s already live.
Teams that can automatically deploy and immediately run E2E tests in a pre-production environment but stop short of deploying to production can say they practice continuous delivery.
Once this last testing phase has passed, teams have a choice about how to deploy the application to production. The critical distinction is whether this is done automatically or manually.
Teams with such an automated progression can say they practice continuous deployment because there’s no human intervention after the code is committed.
Continuous deployment doesn’t define the specific deployment approach — blue/green, canary, dark, shadow, recreate, A/B, etc. — and each team will decide which approach is best for them.
Each commit to your application’s repository triggers a build of the software. With the build complete, the pipeline runs unit tests and component integration tests that provide feedback in a few minutes.
Teams just starting with continuous integration typically execute unit and integration tests on the build server. Many cloud-based SCMs, such as GitHub, have add-ons teams can use (e.g., GitHub Actions) to accomplish this task.
Standalone CI cloud services like CircleCI can work for teams whose SCMs don’t include CI add-ons.
Packaging brings its own set of complexities, and teams typically tailor their approach to their tech stack.
Cloud services like those mentioned above almost certainly have APIs that teams can use to package their application.
And, of course, containers are all the rage, and for good reason. They require a little setup but make it much easier for teams to run their applications.
→ Read more: Continuous Integration (CI) Explained
→ Read more: Continuous Integration
Congratulations! You’ve reached continuous integration 🎉
You should now have the following elements in your pipeline:
You’re ready to ensure all new product changes work as intended with all your live dependencies before they hit production.
Once the pipeline has run the unit tests, built the package, and uploaded it to the artifact repository, it downloads that application package onto the target instance in the pre-production environment and starts the application.
Bottom line: teams should test in an environment that is as production-like as possible.
Usually, teams start in a staging environment that is stable enough for them to run their end-to-end tests regularly without false positives. From there, teams step up to deploying to ephemeral preview environments.
In the interest of building testing into the DNA of your DevOps process, we suggest starting slowly. We’ve encountered teams that want to go straight to production for testing, and while we admire their enthusiasm, we encourage teams to work up to testing in production safely.
Testing in an isolated staging/pre-production environment is an excellent way to start practicing continuous deployment. Teams can make staging environments stable enough to run end-to-end tests against deployed target applications; for example, teams might consider limiting automatic deployment to applications that meet specific coverage requirements, or deploying to a dark pool where they can test the application before it starts receiving traffic.
The main drawback of testing on staging environments, apart from their potential instability, is that there are always bound to be some differences between staging and production. Some of those differences are expected and put in place specifically for testing purposes.
The staging environment is (probably) under-provisioned
Which might mean that you have to run fewer tests simultaneously. Throttling your test suite will slow down your deployment and impact velocity across the organization.
It’s almost certainly under-attended
If staging uses older versions of software than production, or there are outages with some of the dependencies, it will affect how the tests perform and could mask defects that will later appear in production.
Data may be different
And not always a good reflection of the data in production.
Access permission levels could affect what tests pass
Your DevOps people probably follow the principle of least privilege, which means (in this case) that the relatively free-for-all staging environment will be more permissive than production, which is locked down for conducting business.
That said, teams should strive to minimize those differences where feasible, lest the staging environment become too uncontrolled and impede the team’s progress toward continuous deployment.
From a testing perspective, preview environments are the best of both staging and production. They’re isolated from users, like staging, but they can be provisioned with the same configurations and resources as production, and they’re ephemeral (created and destroyed as needed), which saves money in the long run. That said, there are some challenges with setting up preview environments.
Time and resource needs
Preview environments require significant time and resources to create and maintain.
You’re racing the clock
To be cost-effective, preview environments must be torn down as soon as testing finishes or given a time-to-live (TTL) that tears the environment down after a specified amount of time, typically a few hours. Any TTL implementation should be tested to confirm it removes everything associated with the instance to avoid cost overruns.
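As a rough illustration, here’s a sketch of what a TTL sweep might look like. The `listEnvironments` and `destroyEnvironment` functions are hypothetical stand-ins for whatever the team uses to provision preview environments.

```typescript
// Hypothetical TTL sweep for preview environments; run it on a schedule.
interface PreviewEnv {
  id: string;
  createdAt: Date;
}

const TTL_MS = 4 * 60 * 60 * 1000; // e.g., four hours

async function sweepExpiredEnvironments(
  listEnvironments: () => Promise<PreviewEnv[]>,
  destroyEnvironment: (id: string) => Promise<void>,
): Promise<void> {
  const now = Date.now();
  for (const env of await listEnvironments()) {
    if (now - env.createdAt.getTime() > TTL_MS) {
      // Must remove compute, storage, DNS, and any seeded data to avoid cost overruns.
      await destroyEnvironment(env.id);
    }
  }
}
```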
The tooling for preview environments is still young
Setting up your application to integrate with the environment may require additional work. You may need to build systems for configuration management, service discovery, secrets management, test-data management, and dependency management for preview environments to work automatically.
→ Read more: Deployment automation
You should now have the following elements in your pipeline:
Now that you have an environment to test in, you’re ready to start gaining confidence that your application is production-ready.
Once the pipeline deploys the application on the target environment, it executes a set of meaningful, reliable, and robust end-to-end regression tests. The pipeline stops just before deploying the application to production.
You want 80% of your application workflows covered by end-to-end tests.
Those tests should take 30 - 60 minutes to complete.
Use full test parallelization with QA Wolf to run your tests quickly.
Do not use a sharded (or per-node) approach because it’s not fast enough for continuous delivery without the team skimping on coverage. If the team skimps on coverage, bugs are bound to escape to production.
Start with simple smoke tests; you can reuse these later when you start testing in production. Build up to more extensive acceptance tests that exercise critical functionality. Keep up with test maintenance while expanding the suite to cover new workflows.
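For reference, a smoke test can be as small as the sketch below. It assumes a Playwright suite; the URL and selectors are illustrative.

```typescript
import { test, expect } from "@playwright/test";

// Smoke test: only confirms the application is up and its most critical page renders.
test("smoke: home page loads and login is reachable", async ({ page }) => {
  await page.goto("https://staging.example.com/");
  await expect(page).toHaveTitle(/Example/);             // the app responded with the right page
  await page.getByRole("link", { name: "Log in" }).click();
  await expect(page.getByLabel("Email")).toBeVisible();  // the login form rendered
});
```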
Once the team has mapped out all their workflows and automated at least 80% of those flows with a meaningful, reliable, and robust set of easily maintained tests that don’t yield false positives, they’ve probably noticed that executing those tests takes too long. From there, they can move on to full test parallelization. Once the team has all the tests running in parallel in 30 - 60 minutes, they are ready for automatic deployment to production.
→ Read more: What QA Wolf means by “test coverage”
→ Read more: Parallel testing: what it is, why it’s hard, and why you should do it
You will never eliminate test maintenance — even (especially) with self-healing AI — but you can take steps to reduce the maintenance burden.
→ Read more: So you want to build end-to-end tests like a QA Wolf
Even the best-built tests have the potential to flake. Environment issues, network goofs, and inexplicable gremlins are part of testing. Automatically re-running failing tests will significantly reduce the human investigation work your team needs to do and help your test automators focus on actual bugs and test maintenance.
Some teams use flake detection to rerun failing tests automatically. Calculations for determining what counts as a flake can get pretty nuanced, but suffice it to say that a genuinely flaky test should pass within a couple of retries, without human intervention.
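If the suite runs on Playwright, automatic reruns are a one-line configuration change; the sketch below is a minimal example, and the retry count and reporter choices are up to the team.

```typescript
// playwright.config.ts: rerun failing tests before reporting a genuine failure.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,   // automatic reruns absorb most environment flakes in CI
  use: { trace: "on-first-retry" },  // keep a trace so humans can triage tests that needed a retry
  reporter: [["list"], ["html"]],
});
```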
→ Read more: Flaky coverage is fake coverage
Arrange, Act, Assert is a pattern for writing narrowly focused tests that are easy to understand, write, and debug. Maintenance is more sustainable with smaller tests. AAA provides a standard structure for tests, allowing teams to identify gaps in coverage and isolate bugs quickly. If everyone uses it, it can become a lingua franca, enabling team members to communicate more effectively about test structure.
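Here’s a minimal sketch of the pattern in a Playwright test; the workflow, URL, and selectors are illustrative.

```typescript
import { test, expect } from "@playwright/test";

test("a signed-in user can rename a project", async ({ page }) => {
  // Arrange: get the application into a known state.
  await page.goto("https://staging.example.com/projects/demo");

  // Act: perform exactly one behavior under test.
  await page.getByRole("button", { name: "Rename" }).click();
  await page.getByLabel("Project name").fill("Quarterly report");
  await page.getByRole("button", { name: "Save" }).click();

  // Assert: verify the observable outcome of that behavior.
  await expect(page.getByRole("heading", { name: "Quarterly report" })).toBeVisible();
});
```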
→ Read more: Arrange, Act, Assert: An introduction to AAA
→ Read more: AAA is the framework for the meticulous tester
For teams who want to build their own fully parallel testing system, there are three primary areas of concern: design, infrastructure, and test concurrency.
There is no one way to design and build the infrastructure teams need to run their tests fully in parallel. But, tracking down information on how to build it can be frustrating because there aren’t many organizations doing it. We’re happy to share a few things about our system, and you can check that out below.
→ Read more: How do we run all those tests concurrently?
If the team wants to run all of their tests concurrently, they need to write those tests in a particular way. Tests must be isolated so they don’t share data or state with other concurrent tests. They also need to be autonomous, not dependent on other tests. Lastly, they should be built so that they return the same result every time the pipeline executes them; in other words, they need to be idempotent.
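One common way to get isolation and idempotence is to have each test create uniquely named data through the application itself and clean it up afterward. A rough sketch, with an illustrative URL and selectors:

```typescript
import { test, expect } from "@playwright/test";
import { randomUUID } from "node:crypto";

test("creating a project", async ({ page }) => {
  // Unique per run: no collisions with other copies of this test running in parallel.
  const name = `e2e-project-${randomUUID()}`;

  await page.goto("https://staging.example.com/projects");
  await page.getByRole("button", { name: "New project" }).click();
  await page.getByLabel("Project name").fill(name);
  await page.getByRole("button", { name: "Create" }).click();
  await expect(page.getByRole("link", { name })).toBeVisible();

  // Clean up so reruns start from the same state (idempotence).
  await page.getByRole("link", { name }).click();
  await page.getByRole("button", { name: "Delete project" }).click();
});
```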
Keep in mind that the environment you’re testing in has to be able to handle the volume. When the team runs all their tests in parallel, the system needs to handle the load; otherwise, the team will end up with test failures and probably some other problems.
For more on that, go back to testing in a staging environment.
Congratulations! You’ve reached continuous delivery 🎉
Your team should now have the following elements in the pipeline:
Teams that make it here are ready for pain-free releases and increased deployment frequency.
Once the pipeline has executed the end-to-end tests and they’ve passed, it deploys the same application artifact onto the production environment. From there, it automatically executes another set of tests. If those are successful, it performs actions to direct some traffic to the new application version.
Teams should execute regular health checks every minute or so in production to ensure the applications are running smoothly.
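A health check can be very small. The sketch below assumes the application exposes a `/health` endpoint; the URL, latency threshold, and alerting hook are illustrative.

```typescript
// Minimal production health check; run it from a scheduler every minute or so.
async function checkHealth(): Promise<void> {
  const started = Date.now();
  const res = await fetch("https://app.example.com/health");
  const latencyMs = Date.now() - started;

  if (!res.ok || latencyMs > 2000) {
    // Wire this up to the team's alerting (page the on-call, trigger a rollback, etc.).
    console.error(`Health check failed: status=${res.status} latency=${latencyMs}ms`);
    process.exitCode = 1;
  }
}

checkHealth().catch((err) => {
  console.error("Health check could not reach the application:", err);
  process.exitCode = 1;
});
```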
Before releasing a feature directly to production, teams should start with automated sanity tests. As the team improves, they build up to automated acceptance tests.
The production environment is tied to business systems, like accounting or shipping, so your test suite needs to be able to run transactions without generating a lot of noise for other parts of the business.
Avoid business systems with “harmless” smoke tests
Running tests that don’t impact back-end business systems conveniently avoids creating noise in other parts of the business. Still, those workflows are often the most critical to revenue and the user experience — not the kinds of things you want to leave untested.
Work around business systems with blue/green deployment
In a blue/green deployment pipeline, there are two identical production pools: one receiving internal traffic and one receiving external traffic. Public traffic is switched over to the fully tested pool when testing is complete. This strategy also allows for fast recovery, since the pools can be swapped back easily until a new release candidate arrives.
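Conceptually, the cut-over step looks something like the sketch below; `Router` is a hypothetical abstraction over the team’s load balancer or service mesh.

```typescript
interface Router {
  activePool(): Promise<"blue" | "green">;
  pointTrafficAt(pool: "blue" | "green"): Promise<void>;
}

// Switch public traffic to the pool that has already been deployed and fully tested
// with internal traffic. The old pool stays warm so traffic can be swapped back
// instantly if defects surface.
async function cutOver(router: Router): Promise<void> {
  const current = await router.activePool();
  const candidate = current === "blue" ? "green" : "blue";
  await router.pointTrafficAt(candidate);
}
```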
Chaperone business systems with synthetic transactions
Synthetic (fake) transactions are like continuously-executing tests. The team can monitor log outputs for these transactions as new releases are deployed and watch for defects.
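A synthetic transaction is just a scripted, clearly labeled version of a real workflow. The sketch below tags a fake order so downstream business systems can recognize and filter it; the endpoint, header name, and payload are all assumptions.

```typescript
// Continuously exercise a revenue-critical workflow with fake, clearly tagged orders.
async function runSyntheticPurchase(): Promise<void> {
  const res = await fetch("https://app.example.com/api/orders", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Synthetic-Transaction": "true", // lets accounting and shipping filter these out
    },
    body: JSON.stringify({ sku: "synthetic-test-sku", quantity: 1 }),
  });

  if (!res.ok) {
    console.error(`Synthetic purchase failed with status ${res.status}`);
  }
}

// Execute continuously; watch the logs for failures as new releases roll out.
setInterval(() => {
  runSyntheticPurchase().catch((err) => console.error("Synthetic purchase errored:", err));
}, 60_000);
```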
Hide business systems with feature flagging
Feature flagging allows teams to push features into production in a disabled state. Teams can turn on flagged features for a limited set of customers, such as test users. Feature flagging can enable canary releases, ramped releases, and A/B testing.
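In code, the gate is a simple conditional. The sketch below uses a hypothetical `FlagClient` standing in for whatever flagging service the team uses; the flag key is illustrative.

```typescript
// A feature shipped dark: the code is in production but only runs for users the team enables.
interface FlagClient {
  isEnabled(flagKey: string, context: { userId: string }): Promise<boolean>;
}

export async function checkoutHandler(userId: string, flags: FlagClient): Promise<string> {
  const useNewCheckout = await flags.isEnabled("new-checkout-flow", { userId });
  return useNewCheckout ? "render new checkout flow" : "render legacy checkout flow";
}
```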
→ Read more: Testing in Production: The safe way
Congratulations! You’ve reached continuous deployment 🎉
At this point, your pipeline should have the following elements:
It might seem that releasing to production after the end-to-end test phase would be the end of the team’s journey. But teams that continuously improve keep going by shifting left.
Once a pull request is created for a new feature, a series of tests and checks are executed against the feature branch. If those checks pass, the pipeline merges the feature, triggering the CI/CD pipeline.
Teams should implement PR validation. To do so, they create an additional pull-request pipeline, including testing and static analysis.
Testing results should be published to the pull request so they’re available to reviewers.
Any failures in this pipeline should block the pull request to prevent merging it until the developer resolves the failures.
To run end-to-end tests in the pull-request pipeline, teams should create a build artifact from the feature branch and deploy it to an ephemeral preview environment, since those environments are isolated and stable.
Testing before the merge requires a reasonably sophisticated system, which might include an orchestrator, a configuration-discovery mechanism, a secrets-management mechanism, and so on.
The advantage of pre-merge testing is that it allows developers to audition changes and increase their confidence before closing their eyes and pushing the button that kicks off the chain reaction and sends their code change straight to production.
Your team can declare victory when you have:
QA Wolf provides a complete solution for teams who want to get to continuous deployment with comprehensive end-to-end testing. When you let us handle the creation, execution, and maintenance of your end-to-end tests, you’ll free up cycles on your team that will allow them to start delivering higher-quality releases more quickly.