How QA Wolf executes every test case in parallel

Noah Sussman
John Gluck
October 11, 2023

Nine pots can't hard-boil an egg in one minute. Some parts of a software project genuinely have to follow a particular order, with specific tasks that must happen sequentially. Test execution isn't one of them.

Tests can be executed at any time, in any order, regardless of context, as long as they are written so they don't affect the state of other concurrently running tests. That makes test execution a perfect candidate for full parallelization. And yet very few software teams can execute all of their tests concurrently, in full parallel, because full test parallelization is difficult to achieve. It requires that you build your tests to run atomically. It also requires infrastructure that can rapidly scale up to as many nodes as there are tests to execute at that moment, then rapidly scale back down to control cost, all while staying ready to handle the next incoming request.

QA Wolf has designed such a system, and we offer it as part of our solution. When you sign up with us, your entire test execution will be completed in the time it takes to run your longest-running test, whether you have 200 tests or 20,000. We’re proud of our work, and we’d like to take this opportunity to explain to you how our solution works and why it's the best on the market.

We’ll start at the beginning of the test run.

Triggering test runs

QA Wolf lets you trigger test runs in one of two ways: via a webhook or on a timer. For the webhook method, your QA Lead will help you configure the hook in your source control management system (e.g., GitHub, GitLab).

If your SCMS doesn’t support webhooks and you don’t want to use a timer, no problem. You can trigger your test runs locally on demand or from a cron job (see the sketch below). Whichever method you select, our documentation contains complete instructions on how to set it up.
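As an illustration only, here is roughly what an on-demand or cron-driven trigger could look like as a small Node/TypeScript script. The endpoint URL, header names, and payload fields are hypothetical placeholders rather than QA Wolf's actual API; your QA Lead supplies the real details.

```ts
// trigger-run.ts — a minimal sketch of kicking off a test run from a deploy hook or cron job.
// QAWOLF_TRIGGER_URL, QAWOLF_API_KEY, and the payload shape are assumptions for illustration.
const TRIGGER_URL = process.env.QAWOLF_TRIGGER_URL!; // hypothetical trigger endpoint
const API_KEY = process.env.QAWOLF_API_KEY!;         // hypothetical credential

async function triggerRun(deploymentUrl: string): Promise<void> {
  const res = await fetch(TRIGGER_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ deploymentUrl, branch: process.env.GIT_BRANCH }),
  });
  if (!res.ok) throw new Error(`Trigger failed with status ${res.status}`);
  console.log("Test run triggered");
}

triggerRun("https://staging.example.com").catch((err) => {
  console.error(err);
  process.exit(1);
});
```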

Dynamic test configuration

Each customer test is stored separately in our database. When a test run is triggered, our system creates a new test request record that includes a build number: our unique identifier for the run request. Our system saves the results of the run under that build number, along with artifacts such as system logs, videos, and other relevant metadata.

From there, we generate what we call a "run data file" that contains the pertinent information about the tests in the run request: the code for each test targeted for the run, any additional helper code associated with those tests, and any parsed configuration or other custom settings the customer has entered in our editor. Our system then transmits this dynamically constructed file to our runner, which uses it to assign existing cloud compute resources for running the tests.
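To make that concrete, here is a rough sketch of the kind of structure such a run data file might have. The field names and shapes are illustrative assumptions, not our actual internal schema.

```ts
// A rough sketch of the information a "run data file" carries (illustrative only).
interface RunDataFile {
  buildNumber: string;                  // unique identifier for this run request
  triggeredBy: "webhook" | "timer" | "manual";
  environment: {
    baseUrl: string;                    // the environment under test
    variables: Record<string, string>;  // parsed custom settings from the editor
  };
  tests: Array<{
    id: string;
    name: string;
    code: string;                       // the Playwright test code itself
    helpers: string[];                  // shared helper code the test depends on
  }>;
}
```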

Resource provisioning

A chart showing our cluster with two nodes, each with two pods, each with a container

Our Kubernetes clusters keep warm nodes available for provisioning. Nodes in the cluster use pre-built container images prepared specifically for robust performance when running end-to-end tests. Those pre-built containers have everything needed to execute a test in our chosen framework, Playwright.

The number of warm nodes in the Kubernetes cluster scales up or down with anticipated test run demand. Sometimes demand exceeds the number of warm nodes available across all clusters. In that case, our system immediately spins up new nodes so that every test in the request can execute concurrently in its own container, with multiple containers on each node. At the same time, it grows the warm node pool to absorb any subsequent requests.
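The scaling decision boils down to simple arithmetic. The sketch below uses assumed numbers — the container-per-node density and function name are illustrative, not our production scheduler — to show the shape of that calculation.

```ts
// A back-of-the-envelope sketch of the scaling decision: one container per test,
// several containers per node. CONTAINERS_PER_NODE is an assumed value.
const CONTAINERS_PER_NODE = 4;

function nodesToProvision(testCount: number, warmNodes: number): number {
  const nodesNeeded = Math.ceil(testCount / CONTAINERS_PER_NODE);
  // If demand exceeds the warm pool, spin up the difference; otherwise nothing new is needed.
  return Math.max(0, nodesNeeded - warmNodes);
}

// 500 tests with 20 warm nodes => 125 nodes needed => 105 additional nodes to spin up.
console.log(nodesToProvision(500, 20));
```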

Once we’ve identified all the nodes needed to execute every test in the run in parallel, we reserve them. Each isolated container receives the code for its particular test, which we mount onto the container’s file system. When that’s complete on all the containers, we’re ready to start executing tests.

Test execution

We run all tests in headed browsers, not headless mode, so that we can capture and save a video of each test for debugging purposes. Yes, this is more resource-intensive, but the value it adds for debugging is worth the extra processing, and as you can see, our system was designed to handle it. Each test has a history, so we can review videos of previous runs in our editor. All of this test-specific data is persisted as each test finishes, along with any associated artifacts (logs, HAR files) and metadata. Additionally, should a test fail, we run it again up to twice more in case the failure was caused by an anomalous issue in the application environment, such as a restart.
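In standard Playwright terms, the behavior described above — headed browsers, video capture, two retries — maps onto ordinary configuration options like the following. This is plain Playwright configuration shown for illustration, not our internal setup.

```ts
// playwright.config.ts — a minimal sketch of settings that mirror the behavior described above.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: 2,        // re-run a failed test up to two more times
  use: {
    headless: false, // run in a real, headed browser
    video: "on",     // record a video for every test
  },
  reporter: [["json", { outputFile: "results.json" }]], // machine-readable results
});
```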

There’s a reason we go through the hassle of spinning up a separate container per test: our experience has shown that under-provisioned resources are a primary cause of test failure. It makes sense when you think about it. Test execution is processor-bound, but that processing is mostly spent polling the state of the system under test (SUT). The slower the SUT, the longer the test has to poll, increasing the demand on the processor. We can’t control our customers’ applications, but we can isolate each test’s resource consumption so it can’t impact other tests. So that’s what we do.
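You can see that polling in any ordinary Playwright test: web-first assertions keep re-checking the page until the condition holds or the timeout expires, so a sluggish application keeps that loop, and the CPU, busy longer. The page and selectors below are made up for illustration.

```ts
import { test, expect } from "@playwright/test";

test("order appears after checkout", async ({ page }) => {
  await page.goto("https://staging.example.com/checkout"); // illustrative URL
  await page.getByRole("button", { name: "Place order" }).click();
  // Re-checks the element until the text appears or 30 seconds elapse —
  // the slower the SUT responds, the longer this polling runs.
  await expect(page.getByTestId("order-status")).toHaveText("Confirmed", {
    timeout: 30_000,
  });
});
```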

Result aggregation and reporting

After our system has executed and, if necessary, retried all provisioned tests, it consolidates the individual test results into a unified table in our database. On the infrastructure side, we shut down the containers and their corresponding resources. Kubernetes helps us ensure our system leaves no unnecessary resources running, which helps us keep down our operational costs and pass those savings on to our customers.
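Conceptually, the consolidation step looks something like the sketch below. The types and summary fields are illustrative assumptions rather than our actual database schema.

```ts
// A simplified sketch of consolidating per-test outcomes into one run-level record.
type TestOutcome = {
  testId: string;
  status: "passed" | "failed";
  attempts: number;   // 1 on a clean pass, up to 3 with retries
  videoUrl?: string;
  durationMs: number;
};

function summarizeRun(buildNumber: string, outcomes: TestOutcome[]) {
  return {
    buildNumber,
    total: outcomes.length,
    passed: outcomes.filter((o) => o.status === "passed").length,
    failed: outcomes.filter((o) => o.status === "failed").length,
    // Wall-clock time is bounded by the slowest test, since everything ran in parallel.
    wallClockMs: Math.max(...outcomes.map((o) => o.durationMs)),
  };
}
```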

Notification

Our customer dashboard

We notify the customer when a test run ends and the results are published. Our notifications are highly configurable, and we support email, messaging platforms like MS Teams, or even updates to dashboard systems such as Grafana or Tableau. And, of course, customers can always view the results of their test runs on their QA Wolf Dashboard. 
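For example, a run-complete notification to a chat tool can be as small as a POST to an incoming webhook with a short summary, roughly like this sketch. The webhook URL and summary shape are placeholders; our actual integrations are configured per customer.

```ts
// A hedged sketch of posting a run summary to a Teams/Slack-style incoming webhook.
async function notifyRunComplete(
  webhookUrl: string,
  summary: { buildNumber: string; passed: number; failed: number }
): Promise<void> {
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `QA Wolf run ${summary.buildNumber}: ${summary.passed} passed, ${summary.failed} failed`,
    }),
  });
}
```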

Full, cloud-native test parallelization

Our solution is 100% cloud-native, which means full parallelization costs us roughly the same as a sharded (per-node) approach would, while clearly offering more value to customers. Other companies that offer parallelized testing services may be stuck with the disadvantage of early adoption: costly virtual machines. They pass that cost on to their customers.

But the math here is super simple. If a service charges you by the parallel node (read: shard), your tests still run sequentially, n at a time, where n is the number of nodes you pay for. If you have 200 tests that take 5–10 minutes each and you pay for 10 parallel nodes, that's 20 sequential batches of 10 tests, or 100–200 minutes: roughly two to three hours. With full parallelization, your test run takes only as long as your longest test: 10 minutes.
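Here is the same arithmetic spelled out, using the example figures from above:

```ts
const tests = 200;
const nodes = 10;                 // paid-for parallel shards
const minPerTest = [5, 10];       // each test takes 5–10 minutes

// Sharded: tests run 10 at a time => 20 sequential batches.
const batches = Math.ceil(tests / nodes);                   // 20
const shardedMinutes = minPerTest.map((m) => batches * m);  // [100, 200] => roughly 2–3 hours

// Fully parallel: wall-clock time is just the longest single test.
const fullyParallelMinutes = Math.max(...minPerTest);       // 10 minutes
console.log({ batches, shardedMinutes, fullyParallelMinutes });
```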

As we said, our product offering includes full parallelization using the system described above. When you sign with QA Wolf, it's part of the package: your entire test suite runs in a matter of minutes, not hours. We know you'll love full parallelization because we see it adding value for our customers every day.
