So you want to build end-to-end tests like a QA Wolf

Rebecca Stone
April 27, 2023

Start with narrowly-focused tests

By narrowly focused, we mean tests that validate just a single action, rather than complex, compound tests that check several things as they run. Yes, you’ll end up with more individual tests, but think about all the benefits:

  • Fewer false positives. End-to-end tests are flaky by nature. So many things that aren’t bugs can cause a test to fail: Network hiccups, changes in the test environment, and problems with third-party APIs will all throw false positives. When you design tests to be short and quick with minimal dependencies, fewer things go wrong. You get fewer false positives to investigate.
  • Faster run times. Narrowly focused tests finish faster, which means that if you’re concerned about bugs in one specific area of the application, you can run just those tests and get the results you need. There’s a reason Netflix has a “skip intro” button; you just want to get to the good stuff.
  • Faster bug spotting. When a complex, compound test fails, someone has to go in and figure out where it failed before you can determine if it was a bug or a flake. There’s just more to wade through. We write narrowly-focused tests so that when one fails, we know exactly where things went wrong from the test name alone.
  • Simpler maintenance. Products change all the time, and those changes don’t always come with a warning. When a change breaks a test, small, focused tests make the damage easier to contain. You don’t lose coverage across entire flows. It’s also faster to update small tests and remove the ones you don’t need anymore.
  • Easier long-term support. Just as products change, so do team members. When tests are kept small, it’s easier for a new person to come in and figure out what was going on.
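
Here’s what that looks like in practice. Below is a minimal sketch of a narrowly focused test, written with Playwright; the URL, field labels, and credentials are hypothetical. It validates one action, logging in, and nothing else:

```typescript
import { test, expect } from '@playwright/test';

// A narrowly focused test: it checks login and only login.
// The URL, field labels, and credentials are illustrative.
test('user can log in', async ({ page }) => {
  await page.goto('https://app.example.com/login');
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Log in' }).click();

  // One action, one assertion: the user reached the dashboard.
  await expect(page).toHaveURL(/\/dashboard/);
});
```

If post editing also needs coverage, that becomes its own test rather than a second act in this one.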

Prevent test collisions

When running tests in parallel, like we do at QA Wolf, it’s possible that one test could change a piece of data or site configuration while another test is running, causing one or more tests to fail and creating a lot of noise in your results. There are a couple of ways to avoid collisions, but you have to decide what trade-offs work best for the environment you’re testing in and your testing goals.

Technique 1: Use multiple testing accounts

The benefit of having multiple test accounts is that they can each interact with the same systems at the same time. The downside is that the system creates and manipulates a lot more data. Test environments usually aren’t as robust as production, so you may need to prepare your testing environments to handle the additional data and traffic.
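
If your suite runs on Playwright, for example, one way to pull this off is a worker-scoped fixture that hands each parallel worker its own pre-provisioned account. A minimal sketch; the account pool and emails are hypothetical:

```typescript
import { test as base } from '@playwright/test';

type Account = { email: string; password: string };

// Hypothetical pool of accounts provisioned ahead of time in the test env.
const ACCOUNTS: Account[] = [
  { email: 'qa+0@example.com', password: 'pw-0' },
  { email: 'qa+1@example.com', password: 'pw-1' },
  { email: 'qa+2@example.com', password: 'pw-2' },
  { email: 'qa+3@example.com', password: 'pw-3' },
];

export const test = base.extend<{}, { account: Account }>({
  // Worker-scoped: resolved once per worker, so two workers never
  // log in to (or mutate data under) the same account.
  account: [
    async ({}, use, workerInfo) => {
      await use(ACCOUNTS[workerInfo.parallelIndex % ACCOUNTS.length]);
    },
    { scope: 'worker' },
  ],
});
```

Tests then import this test object and read the account fixture to log in, and every worker stays in its own lane.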

You can minimize the impact of data bloat with self-cleaning tests, which we’ll get to below.

Technique 2: Pace out your test runs

If your test environment is just too wobbly to run hundreds of tests at the same time, simply pace them out so they run in batches. Alternatively, you could have one test trigger the next. The test suite will take longer to run, but for most teams, 80%+ test coverage is more valuable than truly continuous deployment.
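
In Playwright, for instance, you can cap parallelism in the config and split the suite into batches with the built-in --shard flag; the numbers here are arbitrary:

```typescript
// playwright.config.ts: throttle how hard the suite hits a
// fragile test environment.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: 4,  // never more than 4 tests running at once
  retries: 2,  // retry flakes instead of failing the whole batch
});
```

From there, separate CI jobs (or staggered schedules) can each run one batch, e.g. npx playwright test --shard=1/4 through --shard=4/4.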

Run clean-up steps to minimize junk data in the test environment

Automated tests can generate a lot of data. If you let that data accumulate, it will quickly overwhelm the test environments. That’s why QA Wolf writes self-cleaning tests that reset the environment so every run starts fresh: each test first deletes any data that a previous run might have left behind, and when it finishes, it deletes anything it created. Just like hiking in a national park, we prefer to leave the testing environment cleaner than we found it.

Technique 1: Have each run create and delete the necessary data

Let’s look at an example: A social media company wants to test its post-editing functionality. A QA Wolf test would create a new post first rather than hunt for an existing one. The test becomes more reliable because it always knows exactly what data it’s working with. But if we didn’t delete that post at the end, endless duplicates would clog up the database.

While it’s easier to use the UI for all this, you may need to have your tests access an API or admin panel. Just make sure that if this is a mobile app test, you refresh any cached data before the main part of the test starts running.
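
Here’s a sketch of that post-editing test in Playwright. The /api/posts endpoints, routes, and selectors are hypothetical stand-ins for whatever your app actually exposes, and it assumes a baseURL is set in the config:

```typescript
import { test, expect } from '@playwright/test';

test('user can edit a post', async ({ page, request }) => {
  // Arrange: create the post through a (hypothetical) API so the
  // test always starts from a known state.
  const created = await request.post('/api/posts', {
    data: { body: 'original text' },
  });
  const { id } = (await created.json()) as { id: string };

  try {
    // Act: edit the post through the UI, which is what we're testing.
    await page.goto(`/posts/${id}/edit`);
    await page.getByRole('textbox').fill('edited text');
    await page.getByRole('button', { name: 'Save' }).click();

    // Assert: the edit is visible to the user.
    await expect(page.getByText('edited text')).toBeVisible();
  } finally {
    // Clean up: delete the post even if an assertion above failed.
    await request.delete(`/api/posts/${id}`);
  }
});
```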

Technique 2: Clean up before you test

Sometimes a test fails before it can clean up the data it created; that’s just the nature of automated testing. To avoid collisions the next time around, each of our tests starts by looking for (and deleting, if necessary) data that shouldn’t be there. That creates a clean slate on which to run, yielding a consistent, reliable result each time.
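
In Playwright terms, that sweep can live in a beforeEach hook. A minimal sketch, assuming a hypothetical tag that only automated tests apply to the data they create:

```typescript
import { test } from '@playwright/test';

test.beforeEach(async ({ request }) => {
  // Find anything a previous failed run left behind, identified by a
  // tag that only our tests use (endpoint and tag are hypothetical).
  const res = await request.get('/api/posts?tag=e2e-edit-post');
  const leftovers = (await res.json()) as Array<{ id: string }>;

  // Delete the strays so this run starts from a clean slate.
  for (const post of leftovers) {
    await request.delete(`/api/posts/${post.id}`);
  }
});
```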

Use stable selectors

The one thing you can count on in software testing is that the product will change and your tests will break. To prevent minor changes from wreaking havoc on your test suite and reduce the amount of triaging and maintenance, stable selectors should be part of any team’s definition of done.

Your best selectors are unique IDs

Choose the most reliable selector the platform exposes. These attributes exist to make elements testable, and they’re less likely to change when the UI evolves.

Platform   Best Attribute            Fallbacks
Web        id                        data-test-id, aria-label
iOS        accessibilityIdentifier   name, label
Android    resource-id               content-desc
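
In Playwright, for instance, the built-in locators map straight onto these attributes. By default getByTestId looks for data-testid, so if your app uses data-test-id, point the config at it with use: { testIdAttribute: 'data-test-id' }. The values below are illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('stable, attribute-based locators', async ({ page }) => {
  await page.goto('https://app.example.com/cart'); // hypothetical URL
  await page.getByTestId('checkout-button').click();            // data-test-id
  await expect(page.getByLabel('Email address')).toBeVisible(); // aria-label
});
```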

Use pattern matching and attribute combinations as a backup

Sometimes unique IDs aren’t available. Even then, a QA engineer should be wary of unstable attributes because of how brittle they make tests. The only real option is to build selectors using pattern matching and combinations of stable attributes, adding values like class, type, or role where they’re available and reliable.

For each platform, here’s when to use each strategy and how to do it well:

Web (CSS)
  • Pattern matching: when an id or data-test-id has a dynamic suffix or prefix, use CSS attribute selectors such as [data-test-id^="submit"] or [id$="form"] (see the MDN docs on attribute selectors).
  • Attribute combining: when no single attribute is unique, combine data-test-id, aria-label, type, or role, e.g. button[data-test-id="submit"][aria-label="Send"]. Avoid class, :nth-child, and visual structure.

Android (Appium)
  • Pattern matching: when the resource-id includes predictable dynamic content, use resourceIdMatches(), e.g. resourceIdMatches("^submit")
  • Attribute combining: when no single field uniquely identifies the element, chain .resourceId() and .className(), e.g. new UiSelector().resourceId(...).className("submitButton"). Tip: keep it readable; 2 to 3 filters max.

iOS (Appium)
  • Pattern matching: when accessibilityIdentifier, name, or label is partially dynamic, use predicate strings, e.g. accessibilityIdentifier BEGINSWITH "submit"
  • Attribute combining: when no single field is stable or specific enough, use accessibilityIdentifier first if it’s available; otherwise combine type, name, and label with AND, e.g. type == "XCUIElementTypeButton" AND name BEGINSWITH "submit"
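
As a quick illustration, here’s how the web patterns above look inside a Playwright test; the attribute values are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

test('attribute patterns and combinations', async ({ page }) => {
  await page.goto('https://app.example.com/form'); // hypothetical URL

  // Pattern matching: tolerates a generated suffix like "submit-3f9c".
  await expect(page.locator('[data-test-id^="submit"]')).toBeVisible();

  // Attribute combining: neither attribute is unique on its own.
  await page.locator('button[data-test-id="submit"][aria-label="Send"]').click();
});
```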

Use text as a last resort when you’re combining attributes to form a selector. Text tends to change a lot, which breaks tests, and it’s one of the least unique attributes you’ll find in the DOM.

Rely on layout or position at your own risk

Avoid selectors that depend on the layout or view hierarchy. They break often, especially in component-based UIs, because the structure changes even when functionality doesn’t.

Structure-based selectors are fragile because they tie your test to the shape of the UI, not to the elements the user actually interacts with. Add a new sibling, refactor a layout, or reorder components, and suddenly your test fails for no good reason.

  • Web: .first(), .last(), :nth-child(), or overly-specific chains like .card > button
  • All platforms: XPath like //XCUIElementTypeButton[2] or //android.widget.Button[1]
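
To see the difference in practice, compare a structure-bound locator with one tied to the element itself; the selectors are illustrative:

```typescript
import { test } from '@playwright/test';

test('prefer element identity over layout', async ({ page }) => {
  await page.goto('https://app.example.com/store'); // hypothetical URL

  // Fragile: breaks as soon as a sibling is added or the card is restructured.
  await page.locator('.card > button').first().click();

  // Stable: survives layout refactors because it targets the element itself.
  await page.getByTestId('buy-button').click();
});
```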

XPath is the worst offender: it’s verbose, hard to maintain, and slow to execute. Use it only when you’re testing a third-party component you can’t change, or when there are absolutely no other reliable attributes available. Be sure to open a tech-debt ticket to clean it up later (and polish your resume, because you don’t want to work anywhere that forces you to do that).

Now you’re ready to write tests like a QA Wolf!

Building a comprehensive test suite that runs quickly and reliably requires work and planning. Follow our advice, and you can have a suite with thousands of tests running every time a developer deploys.

Or make life easier on yourself: Have QA Wolf handle your testing needs! We’ll build out comprehensive coverage in less than 4 months and handle all the maintenance for you. You won’t even have to think about flaky tests — we investigate each failure, and only pass the verified bugs into your issue tracker.

Schedule a demo below, and we’ll talk about your needs.

