Building and maintaining the infrastructure to run thousands of tests in parallel is hard enough as it is, but once you've perfected it, you're still only halfway to being ready for parallel execution. Your tests also have to be constructed so they don't fail sporadically, take too long, or cause problems that leak into other systems.
Parallel end-to-end testing was a pipe dream until a few years ago, so there's a severe shortage of centralized documentation on how to write tests properly for parallelization. End-to-end testers can partly help themselves by following the unit-test playbook, but because unit tests validate behavior and end-to-end tests primarily validate state transformation, there are some key structural differences between the two.
The technical adjustments testers must make to implement parallelism are straightforward, but they take time and consideration. We've made it work: we run over two million tests per month, so we know the methods we're about to describe hold up. They're part of the reason we can offer a zero-flake guarantee.
First, let's look at the properties parallelizable tests share with good unit tests.
An autonomous test reports the same result whether it runs alone or in any order with other tests. If all your tests run in parallel, there is no order, unless some of your tests depend on other tests. Good E2E tests are autonomous because chaining tests together can create false positives (failures that aren't bugs) and lengthen the test run.
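Here's a minimal sketch of autonomy in a Playwright spec. The routes, labels, and the idea of creating a project through the API are hypothetical stand-ins for whatever your app exposes (and the relative URLs assume a configured baseURL):

```ts
import { test, expect } from "@playwright/test";

test("user can archive a project", async ({ page, request }) => {
  // Create the precondition inside the test instead of relying on an
  // earlier "create project" test having run first.
  const response = await request.post("/api/projects", {
    data: { name: `archive-me-${Date.now()}` },
  });
  const { id } = await response.json();

  await page.goto(`/projects/${id}`);
  await page.getByRole("button", { name: "Archive" }).click();
  await expect(page.getByText("Project archived")).toBeVisible();
});
```

Because the test builds its own project, it reports the same result whether it runs first, last, or entirely on its own.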
An isolated test runs without affecting (and is unaffected by) other tests. Isolated tests don’t share state with or cause side effects for other tests or applications in the ecosystem.
Idempotence means that a test can be run multiple times without affecting the outcome. Pushing the Lobby button in an elevator multiple times repeats the instruction, but it won't make the elevator move any faster. Idempotent tests always return the same result because they set up their own preconditions and clean up after themselves.
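Isolation and idempotence often come down to the same two habits: give each test data that nothing else will touch, and put things back the way you found them. A hedged sketch, again with hypothetical endpoints and a configured baseURL:

```ts
import { test, expect } from "@playwright/test";

// Unique per run, so concurrently running tests never compete for the same coupon.
const code = `WELCOME-${Date.now()}`;
let couponId: string;

test.beforeEach(async ({ request }) => {
  // Set up the precondition through the API.
  const response = await request.post("/api/coupons", {
    data: { code, discountPercent: 10 },
  });
  couponId = (await response.json()).id;
});

test.afterEach(async ({ request }) => {
  // Remove what the test created so repeated runs start from the same state.
  await request.delete(`/api/coupons/${couponId}`);
});

test("coupon applies a 10% discount at checkout", async ({ page }) => {
  await page.goto("/checkout");
  await page.getByLabel("Coupon code").fill(code);
  await page.getByRole("button", { name: "Apply" }).click();
  await expect(page.getByText("10% off")).toBeVisible();
});
```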
Good tests validate one thing only. If a test fails before its main assertion is complete, that is a false positive, even if it was caused by a defect elsewhere in the system. A separate test should cover that defect. Furthermore, testers should limit the number of behaviors they exercise in a test to reduce the likelihood of modifying a resource needed by another concurrently running test.
To take advantage of test parallelization, test automators can borrow from another source of inspiration when writing their tests: application developers.
Application development teams typically enforce standards by reviewing code on pull requests and by configuring static analysis and acting on the results. These practices are not as standard on test automation teams as they should be, especially when the test code lives in a separate repo. Default settings for static analyzers are typically optimized for application code, so test teams need to tweak rules (such as rules that flag uncaught exceptions) for their test projects.
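As one illustration, an ESLint flat config might relax a couple of application-oriented rules for spec files only; the glob and the specific rules here are just examples, not a recommended ruleset:

```js
// eslint.config.mjs — a sketch; adjust the globs and rules to your project
export default [
  {
    files: ["tests/**/*.spec.ts"],
    rules: {
      // Test data is full of literal values, so this rule mostly adds noise here.
      "no-magic-numbers": "off",
      // Long Arrange blocks are normal in end-to-end specs.
      "max-lines-per-function": "off",
    },
  },
];
```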
Like application code, tests need to be legible, not just so that anyone on the team can understand what a failing test is trying to do but also because testing teams are particularly susceptible to staffing changes.
Keeping tests atomic is an excellent way to ease the burden of onboarding new teammates. Atomic tests are easy to understand because they focus on a specific piece of functionality or scenario. Teams that use the AAA (Arrange, Act, Assert) pattern, as we do, find that it makes their tests self-documenting. It also gives their tests atomicity and uniformity and provides a structure that testers can follow to easily debug, maintain, and take ownership of any test.
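Here's a rough sketch of that structure in a Playwright spec; the app under test, its routes, and its labels are hypothetical:

```ts
import { test, expect } from "@playwright/test";

test("new task appears in the task list", async ({ page, request }) => {
  // Arrange: create a user through the API and sign in.
  const email = `aaa-${Date.now()}@example.test`;
  await request.post("/api/users", { data: { email, password: "s3cret!" } });
  await page.goto("/login");
  await page.getByLabel("Email").fill(email);
  await page.getByLabel("Password").fill("s3cret!");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Act: exercise exactly one behavior.
  await page.getByRole("button", { name: "New task" }).click();
  await page.getByLabel("Title").fill("Write parallelizable tests");
  await page.getByRole("button", { name: "Save" }).click();

  // Assert: one main assertion that the state changed as expected.
  await expect(page.getByText("Write parallelizable tests")).toBeVisible();
});
```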
One common problem is that test automators rarely consider idempotence when writing tests. That's because, in the old days, they could safely ignore it: most of the time, test data doesn't cause problems in downstream systems, and when it does, it becomes a big problem that's hard to track down. But when you move to full parallelism, your test runs take less time, so you run them more frequently, and what was once an occasional problem starts showing up far more often.
Making sure your tests clean up after themselves is usually a minor adjustment (and, in fact, you frequently want to clean up before the test runs as well). This cleanup is part of what makes tests idempotent and reliable enough to run in parallel. Our testers do it, and we enforce it in code reviews.
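A sketch of the "clean up before" habit, assuming a hypothetical endpoint for deleting accounts: the email is fixed for this one test (so it never collides with other tests), and any leftovers from a crashed earlier run are removed before the test recreates the account.

```ts
import { test, expect } from "@playwright/test";

const EMAIL = "signup.spec@example.test"; // unique to this test, stable across runs

test.beforeEach(async ({ request }) => {
  // Delete the account if a previous run left it behind.
  await request.delete(`/api/users?email=${encodeURIComponent(EMAIL)}`);
});

test("visitor can sign up", async ({ page }) => {
  await page.goto("/signup");
  await page.getByLabel("Email").fill(EMAIL);
  await page.getByLabel("Password").fill("s3cret!");
  await page.getByRole("button", { name: "Create account" }).click();
  await expect(page.getByText("Welcome")).toBeVisible();
});
```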
Threading is another common concern when tests share state with other tests. JavaScript end-to-end testing frameworks such as Playwright avoid many of the threading issues that have historically hampered automators using tools like the Java implementation of Selenium's WebDriver.
The advent of containers has also made test parallelism easier. For example, by running each test in a separate container, teams isolate tests at the process level, so those tests rarely fail due to internal resource constraints (such as running out of memory). That means fewer false positives and less failure investigation and maintenance. As it turns out, that's the way QA Wolf's system works.
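To make the idea concrete (this is an illustration, not a description of QA Wolf's actual implementation), a small Node script could launch one container per spec file using the official Playwright image; the paths, image tag, and all-at-once concurrency are simplifications:

```ts
// run-specs-in-containers.ts — a sketch, not a production runner
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { readdirSync } from "node:fs";

const run = promisify(execFile);

async function main() {
  const specs = readdirSync("tests").filter((file) => file.endsWith(".spec.ts"));

  // A real runner would cap concurrency and collect reports; here every spec
  // simply starts at once in its own container.
  await Promise.all(
    specs.map((spec) =>
      run("docker", [
        "run", "--rm",
        "-v", `${process.cwd()}:/work`,
        "-w", "/work",
        "mcr.microsoft.com/playwright:v1.44.0-jammy", // pin to your Playwright version
        "npx", "playwright", "test", `tests/${spec}`, "--workers=1",
      ])
    )
  );
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```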
Managing test data for parallelized tests adds another layer of complexity to an already challenging task. In the context of parallelization, some standard and seemingly harmless test-data management strategies become anti-patterns. One such pattern, database refreshes combined with ad-hoc SQL, can cause unintended outcomes like test-data poaching: a race condition in which one test creates data and another test modifies it concurrently, so the original test can no longer access or change it when the time comes. And that's just one of many ways test data can cause problems.
One challenge for end-to-end automators is limiting the side effects of the data their tests create, whether during setup or as a result of exercising behavior. Teams can use a few strategies to make their test-data creation idempotent. While test data is the area where QA Wolf's testers admittedly exercise the least control, we train our team to use unique data for each test to reduce side effects such as test-data poaching.
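A tiny helper along these lines (the field names are arbitrary) is often all it takes:

```ts
import { randomUUID } from "node:crypto";

// Every test that needs a user mints its own, so no two concurrently running
// tests ever read or write the same record.
export function uniqueUser() {
  const id = randomUUID();
  return {
    email: `qa+${id}@example.test`,
    username: `test-user-${id}`,
    password: `pw-${id}`,
  };
}
```

Paired with the cleanup habits above, uniquely generated data lets tests create whatever records they need without ever stepping on each other.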
None of our suggestions above are particularly challenging from a technical perspective. The hard part is the cultural shift: getting the team to buy into the need for parallelism and faster tests. Old habits die hard. Generally speaking, the challenges of writing parallelizable end-to-end tests are the same as those of writing good unit tests, with some added complexity.
If you want your test team to start delivering faster, forget about making minor optimizations. The most significant impact you can make on your test team’s speed is to help them get to fully parallel test execution. Even if your architecture is ready for full parallelization, your tests must be parallelizable. And your team needs to be bought into the process.
At QA Wolf, we are already there, and we’d love to help your organization increase the speed and reliability of your end-to-end testing.