Version 4.5 just released

QA Wolf AI

Increases velocity of Playwright test automation by 5x

Watch QA Wolf AI generate a Playwright test for Figma’s prototyper in just 6 minutes, compared with 29 minutes using the Playwright recorder.
QA Wolf AI is 4.8X faster than Playwright
6 min vs. 30 min
QA Wolf AI is 3X faster than vanilla QA Wolf
6 min vs. 19 min

How QAW AI stacks up

QA Wolf without AI: 19 min
QA Wolf AI: 6 min
Playwright recorder: 30 min

Meet the agents

QA Wolf’s multi-agent system assigns specific tasks to specialized agents. This lets the system draw on multiple context-heavy inputs, including the product-tour video and its transcript, DOM snapshots, and browser logs, to outline and code a test. Working together, the agents solve problems faster and with fewer mistakes than a single-agent system ever could.
The Orchestrator
One agent to rule them all: the Orchestrator controls the flow of information between the other agents.
The Outliner
The Outliner watches a product tour, captures the client’s testing goals from the audio, and develops comprehensive test plans and AAA (Arrange-Act-Assert) outlines.
The Code Writer
The Code Writer generates open-source Playwright code. It’s trained on 700+ gym scenarios derived from 40M test runs and can automate any test case that Playwright supports.
The Verifier
The Verifier runs the test code to make sure it works as intended.
150+ other agents
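QA Wolf’s internal agent interfaces aren’t public, so the following is only a minimal sketch of how an orchestrator might route work between an outliner, a code writer, and a verifier, feeding verifier errors back to the writer until the test passes. Every type and function here (`AaaOutline`, `AgentInputs`, `orchestrate`, the stub agents) is hypothetical, and the stubs stand in for what would really be model-backed agents.

```typescript
// Hypothetical types; QA Wolf's real agent interfaces are not public.
interface AaaOutline {
  name: string;
  arrange: string[];
  act: string[];
  assert: string[];
}

interface AgentInputs {
  transcript: string;
  domSnapshots: string[];
  browserLogs: string[];
}

interface VerificationResult {
  passed: boolean;
  errors: string[];
}

type Outliner = (inputs: AgentInputs) => AaaOutline;
type CodeWriter = (outline: AaaOutline, feedback: string[]) => string;
type Verifier = (code: string) => VerificationResult;

// The orchestrator passes each agent's output to the next agent and
// routes verifier errors back to the code writer for another attempt.
function orchestrate(
  inputs: AgentInputs,
  outliner: Outliner,
  writer: CodeWriter,
  verifier: Verifier,
  maxAttempts = 3,
): { code: string; attempts: number; passed: boolean } {
  const outline = outliner(inputs);
  let feedback: string[] = [];
  let code = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code = writer(outline, feedback);
    const result = verifier(code);
    if (result.passed) {
      return { code, attempts: attempt, passed: true };
    }
    feedback = result.errors;
  }
  return { code, attempts: maxAttempts, passed: false };
}

// Stub agents for illustration only.
const demoOutliner: Outliner = () => ({
  name: "Connect two frames",
  arrange: ["Log in", "Open a design file"],
  act: ["Switch to prototype mode", "Drag a connection between frames"],
  assert: ["A connection arrow is visible"],
});

// First draft forgets the assertion; after feedback it adds one.
const demoWriter: CodeWriter = (outline, feedback) =>
  feedback.length === 0
    ? `// ${outline.name}\n// TODO`
    : `// ${outline.name}\nawait expect(arrow).toBeVisible();`;

// Passes only when the generated code contains an assertion.
const demoVerifier: Verifier = (code) =>
  code.includes("expect(")
    ? { passed: true, errors: [] }
    : { passed: false, errors: ["no assertion found"] };

const run = orchestrate(
  { transcript: "", domSnapshots: [], browserLogs: [] },
  demoOutliner,
  demoWriter,
  demoVerifier,
);
console.log(run.passed, run.attempts); // true 2
```

The retry loop is the point of the design: a verifier that actually runs the code gives the writer concrete errors to fix, which a single-pass generator never sees.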

AI agents + human reviewers = 100% accuracy

As the agents create and update E2E tests, human reviewers check their work. The partnership with QA Wolf AI lets our engineers do 5x more work in the same amount of time — new tests are created in minutes and existing tests are updated almost instantaneously.
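As a rough illustration of that agent-then-human flow, here is a minimal sketch of a review gate in which agent-generated tests wait as pending changes, and only changes a human reviewer approves reach the suite that actually runs. `ReviewQueue`, `ProposedChange`, and the status values are hypothetical names, not QA Wolf’s actual system.

```typescript
// Hypothetical model of the agent-then-human review flow.
type ReviewStatus = "pending" | "approved" | "rejected";

interface ProposedChange {
  testName: string;
  code: string;
  status: ReviewStatus;
  reviewer?: string;
}

class ReviewQueue {
  private changes: ProposedChange[] = [];

  // Agents submit new or updated tests; each starts out pending review.
  submit(testName: string, code: string): ProposedChange {
    const change: ProposedChange = { testName, code, status: "pending" };
    this.changes.push(change);
    return change;
  }

  // A human reviewer approves or rejects the agent's work.
  review(change: ProposedChange, reviewer: string, approve: boolean): void {
    change.status = approve ? "approved" : "rejected";
    change.reviewer = reviewer;
  }

  // Only approved changes are included in the suite that actually runs.
  approvedSuite(): Map<string, string> {
    const suite = new Map<string, string>();
    for (const c of this.changes) {
      if (c.status === "approved") suite.set(c.testName, c.code);
    }
    return suite;
  }

  pending(): ProposedChange[] {
    return this.changes.filter((c) => c.status === "pending");
  }
}

const queue = new ReviewQueue();
const login = queue.submit("login works", "/* generated test */");
queue.submit("checkout works", "/* generated test */");
queue.review(login, "reviewer-1", true);
console.log(Array.from(queue.approvedSuite().keys()).join(", ")); // login works
console.log(queue.pending().length); // 1
```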

700 eval criteria

QA Wolf AI agents are evaluated each night on their ability to handle 700 unique UI scenarios that have been identified from QA Wolf’s history of 50,000,000 test runs.

The evals, known collectively as the training gym, are themselves assessed for how well they reflect real-world conditions.
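To make the nightly scoring concrete, the sketch below summarizes one night’s results into a pass rate and a per-category failure count, then flags a regression against the previous night. The `EvalResult` shape, the category labels, and the 1% tolerance are assumptions for illustration; the real gym runs 700 scenarios, and this example uses four.

```typescript
// Hypothetical shape of one nightly eval result.
interface EvalResult {
  scenarioId: string;
  category: string; // illustrative labels, e.g. "iframe", "drag-and-drop"
  passed: boolean;
}

interface EvalSummary {
  total: number;
  passed: number;
  passRate: number;
  failuresByCategory: Record<string, number>;
}

function summarize(results: EvalResult[]): EvalSummary {
  const failuresByCategory: Record<string, number> = {};
  let passed = 0;
  for (const r of results) {
    if (r.passed) {
      passed++;
    } else {
      failuresByCategory[r.category] = (failuresByCategory[r.category] ?? 0) + 1;
    }
  }
  const total = results.length;
  return {
    total,
    passed,
    passRate: total === 0 ? 0 : passed / total,
    failuresByCategory,
  };
}

// Assumed rule: flag a regression when tonight's pass rate falls more than
// `tolerance` below last night's.
function isRegression(tonight: EvalSummary, lastNight: EvalSummary, tolerance = 0.01): boolean {
  return tonight.passRate < lastNight.passRate - tolerance;
}

const tonight = summarize([
  { scenarioId: "s1", category: "iframe", passed: true },
  { scenarioId: "s2", category: "iframe", passed: false },
  { scenarioId: "s3", category: "drag-and-drop", passed: true },
  { scenarioId: "s4", category: "file-upload", passed: true },
]);
console.log(tonight.passRate); // 0.75
```

Grouping failures by category, rather than reporting a single score, shows which kind of UI scenario got worse overnight.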


Under the hood

Explainable decision-making

Every decision the agents make is logged in plain English and auditable by QA engineers and customers, so there’s transparency and accountability.
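One way to picture a plain-English, auditable decision trail is a log in which each entry records which agent acted, what it decided, why, and which context sources it relied on. This is a minimal sketch; `DecisionLog`, `DecisionEntry`, and the rendered format are hypothetical and do not reflect QA Wolf’s actual log schema.

```typescript
// Hypothetical structure for an auditable, plain-English decision log.
interface DecisionEntry {
  timestamp: string; // ISO 8601
  agent: string;
  decision: string; // what the agent chose to do
  reason: string; // why, in plain English
  evidence: string[]; // context sources that informed the decision
}

class DecisionLog {
  private entries: DecisionEntry[] = [];

  record(agent: string, decision: string, reason: string, evidence: string[] = []): void {
    this.entries.push({
      timestamp: new Date().toISOString(),
      agent,
      decision,
      reason,
      evidence,
    });
  }

  // Render each entry as a readable sentence for QA engineers or customers.
  render(): string[] {
    return this.entries.map(
      (e) =>
        `[${e.agent}] ${e.decision} because ${e.reason}` +
        (e.evidence.length > 0 ? ` (based on: ${e.evidence.join(", ")})` : ""),
    );
  }

  // Filter the trail to a single agent when auditing its behavior.
  byAgent(agent: string): DecisionEntry[] {
    return this.entries.filter((e) => e.agent === agent);
  }
}

const log = new DecisionLog();
log.record(
  "Code Writer",
  "used a role-based locator for the Save button",
  "the button's CSS class changes between builds",
  ["DOM snapshot"],
);
log.record("Verifier", "marked the test as passing", "all assertions succeeded");
console.log(log.render()[0]);
// [Code Writer] used a role-based locator for the Save button because the button's CSS class changes between builds (based on: DOM snapshot)
```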

Arbitrary JavaScript development

Test code generated by QA Wolf AI uses the variables, loops, helper functions, and conditionals needed to automate complex workflows.
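To show what that kind of code can look like, here is a sketch of a test that uses a helper function, a loop, a variable, and a conditional. It is not QA Wolf output. So that it runs without a browser, it targets a hypothetical `MinimalPage` interface with four methods that Playwright’s `Page` also provides (`goto`, `fill`, `click`, `textContent`), and a hypothetical in-memory `FakeTodoPage` stands in for a real app. The URL and selectors are made up. Real generated tests would use `@playwright/test` against a live browser.

```typescript
// A small subset of methods that Playwright's Page also provides, so this
// sketch can run against an in-memory fake instead of a real browser.
interface MinimalPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  textContent(selector: string): Promise<string | null>;
}

// Helper function: a reusable step shared by many tests.
async function addTodo(page: MinimalPage, text: string): Promise<void> {
  await page.fill("#new-todo", text);
  await page.click("#add");
}

async function addAndVerifyTodos(page: MinimalPage, items: string[]): Promise<string> {
  await page.goto("https://example.com/todos"); // hypothetical app URL
  // Loop: create one todo per item.
  for (const item of items) {
    await addTodo(page, item);
  }
  // Variable + conditional: the counter text differs for one vs. many items.
  const expected = items.length === 1 ? "1 item left" : `${items.length} items left`;
  const actual = await page.textContent("#count");
  if (actual !== expected) {
    throw new Error(`expected "${expected}", got "${actual}"`);
  }
  return expected;
}

// Hypothetical in-memory fake that simulates a todo app.
class FakeTodoPage implements MinimalPage {
  private input = "";
  private todos: string[] = [];

  async goto(_url: string): Promise<unknown> {
    this.todos = [];
    return null;
  }
  async fill(selector: string, value: string): Promise<void> {
    if (selector === "#new-todo") this.input = value;
  }
  async click(selector: string): Promise<void> {
    if (selector === "#add" && this.input !== "") {
      this.todos.push(this.input);
      this.input = "";
    }
  }
  async textContent(selector: string): Promise<string | null> {
    if (selector !== "#count") return null;
    const n = this.todos.length;
    return n === 1 ? "1 item left" : `${n} items left`;
  }
}

addAndVerifyTodos(new FakeTodoPage(), ["buy milk", "walk dog", "ship release"]).then(
  (text) => console.log(text), // 3 items left
);
```

Because the test only depends on the four methods in `MinimalPage`, the same `addAndVerifyTodos` function could be pointed at a real Playwright page without changes.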

Multi-source context

The agents can draw on multiple context sources to evaluate a situation and take action, including the AAA test outline, browser logs, the page’s HTML, screenshots, and video product tours.
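As an illustration of why several sources help, the sketch below bundles them into one hypothetical `AgentContext` and combines two of them, page HTML and browser logs, to diagnose a failed step. It separates a broken locator (the text is on the page), a likely app error (the text is missing and the console logged errors), and a timing or navigation problem (neither). The field names, the `[error]` log prefix, and the diagnosis rules are all assumptions for illustration.

```typescript
// Hypothetical bundle of the context sources listed above.
interface AgentContext {
  aaaOutline: { arrange: string[]; act: string[]; assert: string[] };
  browserLogs: string[];
  html: string;
  screenshots: string[]; // e.g. file paths
  productTour: string; // e.g. path to the recorded tour video
}

// Assumed log format: error lines start with "[error]".
function consoleErrors(ctx: AgentContext): string[] {
  return ctx.browserLogs.filter((line) => line.startsWith("[error]"));
}

// Check whether the expected text is present in the page HTML.
function htmlMentions(ctx: AgentContext, text: string): boolean {
  return ctx.html.includes(text);
}

// Combine two sources into a plain-English diagnosis of a failed step.
function diagnoseFailedStep(ctx: AgentContext, expectedText: string): string {
  const errors = consoleErrors(ctx);
  if (htmlMentions(ctx, expectedText)) {
    return `"${expectedText}" is in the page HTML; the locator is likely wrong.`;
  }
  if (errors.length > 0) {
    return `"${expectedText}" is missing and the browser logged ${errors.length} error(s); the app may be broken.`;
  }
  return `"${expectedText}" is missing with no logged errors; the step may need a wait or a different route.`;
}

const ctx: AgentContext = {
  aaaOutline: { arrange: ["Log in"], act: ["Save profile"], assert: ["See 'Profile saved'"] },
  browserLogs: ["[info] page loaded", "[error] 500 from /api/profile"],
  html: "<main><h1>Settings</h1></main>",
  screenshots: [],
  productTour: "",
};
console.log(diagnoseFailedStep(ctx, "Profile saved"));
// "Profile saved" is missing and the browser logged 1 error(s); the app may be broken.
```

Neither source alone separates these cases: the HTML shows the text is missing but not why, and the logs show an error but not whether it affected this step.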
Join the wolf pack

Ready to ship faster and with fewer bugs? Get started today:

Schedule a demo