QA Wolf builds and manages tens of thousands of end-to-end tests for our clients. We’ve learned that a lot of the accepted wisdom about automated testing just isn’t true and want to provide our perspective on building, running, and maintaining large test suites.
Teams often get overwhelmed by automated testing because the initial investment is significant. If it takes 2–3 hours to write a test, a mid-sized suite of 300 tests takes roughly 750 hours to build, which is four people working full-time for almost five weeks. And that’s after careful planning to enable the tests to run in parallel (more on that below). But that’s a very small investment compared to the long-term costs.
Building a test is more or less a one-time investment, but triaging every failure, filing bug reports, and updating the tests when the UI changes is a never-ending job. Any time a test fails, someone needs to investigate it and take the appropriate action: re-run the test, file a bug report, or fix the test code.
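The re-run step is the easiest of the three to automate. For instance, if your suite runs on Playwright, a couple of configuration lines handle the retry and preserve the evidence for whoever triages the failure. A minimal sketch, not a drop-in config:

```ts
// playwright.config.ts: a minimal sketch, assuming a Playwright-based suite.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Re-run a failing test up to twice before reporting it as a real failure.
  retries: 2,
  // Keep a trace from the retry so whoever triages can see what actually happened.
  use: { trace: 'on-first-retry' },
});
```

Retries only absorb the transient failures, though; the genuine ones still need a human to file the bug or fix the test code.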
Test maintenance is not a project that you get ahead of or ever finish. You’re always behind and forever treading water. And if you don’t stay on top of the maintenance work, the tests block deployments or start getting bypassed so that a build can be released. Without constant maintenance, your investment in building the tests goes down the drain. Within six weeks, 50% of your tests will be broken. But that’s QA and one must imagine Sisyphus happy.
Flaky tests are tests that sometimes pass and sometimes fail without any apparent reason. They create a lot of uncertainty in the QA process, and teams with lots of flaky tests start to distrust the results. Teams tend to address flakiness at the test level, because tackling it at the suite level can mean time-consuming refactoring that touches every test in the suite.
In our experience, flakes have more to do with conflicts between multiple tests in the same suite, external dependencies like third-party systems, and environment resources, and these issues cause flakes throughout the entire suite, not in individual tests. Identifying and addressing the underlying causes in the suite’s architecture and dependencies frequently leads to a more stable and reliable testing environment.
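For example, a third-party dependency that occasionally responds slowly (or not at all) will produce flakes in every test that touches it. One suite-level fix is to stub that dependency out wherever it isn’t the thing under test. A sketch using Playwright’s request interception; the endpoint and response body are made up for illustration:

```ts
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Stub a hypothetical third-party endpoint so its latency and outages
  // can't fail tests that aren't actually about that integration.
  await page.route('**/api.thirdparty.example/**', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ status: 'ok' }),
    })
  );
});
```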
This is a partial truth. Yes, testing environments need the resources to handle the volume, and the runner infrastructure needs the computing power for all those browsers. Still, if the test suite isn’t architected to run in parallel, or contains tests that depend on one another, then all the AWS budget in the world isn’t going to help you.
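For context, here is roughly what the parallelism side looks like in a Playwright config (the worker count is arbitrary). The point is that these are suite-architecture decisions, not hardware ones:

```ts
// playwright.config.ts: the parallelism knobs, sketched with an arbitrary worker count.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true, // run tests within each file in parallel, not just across files
  workers: 8,          // more workers need more CPU, but can't untangle dependent tests
});
```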
Running hundreds or thousands of automated tests in parallel takes planning to prevent test collisions, where two or more tests use the same data or feature at the same time. On top of that, each test needs to create and destroy the data it uses, so the testing environment stays clean and ready for future runs.
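Here’s a sketch of what that looks like in practice: the test creates a uniquely named record through the application’s API, works on it, and deletes it afterward. The `createProject` and `deleteProject` helpers are hypothetical, and the selectors are illustrative:

```ts
import { test, expect } from '@playwright/test';
// createProject and deleteProject are hypothetical helpers that call the app's API.
import { createProject, deleteProject } from './helpers/projects';

test('renames a project', async ({ page }) => {
  // Every run creates its own uniquely named record, so parallel tests can't collide.
  const project = await createProject({ name: `rename-test-${Date.now()}` });

  try {
    await page.goto(`/projects/${project.id}`); // assumes a baseURL is set in the config
    await page.getByRole('button', { name: 'Rename' }).click();
    await page.getByLabel('Project name').fill('Renamed project');
    await page.getByRole('button', { name: 'Save' }).click();
    await expect(page.getByText('Renamed project')).toBeVisible();
  } finally {
    // Destroy what we created so the environment stays clean for the next run.
    await deleteProject(project.id);
  }
});
```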
It’s easy to understand why the Page Object Model was developed. Having a repository of reusable code lets a QA engineer make a change once and have it propagate across an entire suite. While that may work for a small suite of smoke tests, the reality is that highly abstracted code becomes a maintenance nightmare.
Code structures like the Page Object Model are alluring but can force developers to exchange simplicity for reusability.
Anyone who has worked on tests that use the Page Object Model knows that each implementation has its own learning curve: you have to understand what each of those page objects does and how it’s used in context. Furthermore, any change made to a single Page Object has to be tested in every suite that uses it, so maintenance time increases as you add more Page Objects to your test project. While there’s some overhead in duplicating a few frequently used locators, in the long run we’ve found that being able to isolate broken code to a handful of tests at a time outweighs any benefits of reusable code.
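To make the trade-off concrete, here’s roughly what a self-contained test looks like, sketched in Playwright with made-up selectors. The locators live in the test itself, so when the login form changes, the breakage is confined to a small, obvious set of tests:

```ts
import { test, expect } from '@playwright/test';

// No LoginPage object: the test keeps its own locators. If the form changes,
// this test and its siblings break, and the fix stays local to a few files.
test('user can sign in', async ({ page }) => {
  await page.goto('/login'); // assumes a baseURL is set in the config
  await page.getByLabel('Email').fill('user@example.com'); // hypothetical test account
  await page.getByLabel('Password').fill('correct-horse-battery');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```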
It’s common for testing teams to adhere to certain standards and practices, like only using Gherkin or the same language as the application tech stack. The reasons make sense: code base consistency, easier integration, skills transfer, etc. However, being so dogmatic about the process can lead to gaps in coverage when test cases can’t be automated by the tools being used.
The goal of the QA team should be comprehensive test coverage — 80% of workflows or more, tested daily or upon deployment. The tools that teams use should be a means to that end, not the end in themselves. If there’s a test case that the current toolset can’t support, teams have to be willing to bring new tools in to get the job done. The end-user certainly doesn’t care how many tools are used for testing — they only care that the testing got done and they can use your product without any issues.
More than anything, we’ve learned that if you’re going to manage end-to-end testing for more than 100 companies, you need to take a realistic approach and focus on what matters: high test coverage, reliable test suites, and accurate results.
Keeping your test suite simple and sweet makes it easier to maintain in the long run. Avoid complex architectures that you’ll spend more time debugging than running. And be flexible with your tooling to accomplish the real goal: releasing well-tested features that your users want.