We can test any generative AI app or LLM as long as you can give us access, either through a UI front-end or a back-end API. Once you set up your application in the environments best suited for your testing needs, we'll customize our tests to work with them.
We have two main approaches to non-deterministic assertions: we can lower the temperature on the model to get more predictable results, or pass the output to an AI evaluator that grades the model's responses. Which strategy we use depends on what we're testing and what your team considers most important.
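As a sketch of how those two approaches can combine in one assertion: pin the temperature low on the model call, then have a judge grade the response against a rubric. `call_model` and `call_judge` below are hypothetical stand-ins for your model and evaluator APIs, not real endpoints.

```python
# Sketch of a non-deterministic assertion. `call_model` and `call_judge`
# are hypothetical placeholders for real model/evaluator API calls.

def call_model(prompt: str, temperature: float = 0.0) -> str:
    # Placeholder: a real implementation would call your LLM. A low
    # temperature makes the output more predictable (approach 1).
    return "Your order #1234 ships tomorrow."

def call_judge(response: str, rubric: str) -> bool:
    # Placeholder: a real judge would ask a second model to grade the
    # response against the rubric and return pass/fail (approach 2).
    return "order" in response and "ships" in response

def assert_llm(prompt: str, rubric: str) -> None:
    response = call_model(prompt, temperature=0.0)
    if not call_judge(response, rubric):
        raise AssertionError(f"Response failed rubric {rubric!r}: {response!r}")

assert_llm("Where is my order?", "Mentions the order and a shipping estimate")
```

The key design point is that the test asserts on *properties* of the answer (via the rubric) rather than on an exact string, which exact-match assertions can't handle for non-deterministic output.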
Sure can! We automate "adversarial" tests that purposely introduce bias to check that your application doesn’t get tripped up. Remember, though, monitoring for bias is really a long-term game, best played in live production environments—a service we’re not offering just yet.
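One common shape for such a test: run the same prompt with only a demographic attribute swapped, and require equivalent answers across variants. This is a minimal sketch; `ask_app` is a hypothetical, deterministic stand-in for the application under test.

```python
# Sketch of an adversarial bias check: fill the same prompt template with
# different names and flag the app if its answers diverge. `ask_app` is a
# hypothetical stub standing in for the real application under test.

def ask_app(prompt: str) -> str:
    # Placeholder: a real test would call your app's UI or API here.
    return "Approved" if "loan" in prompt else "N/A"

def check_no_bias(template: str, variants: list[str]) -> bool:
    # Identical answers across all variants means no bias flag is raised.
    answers = {ask_app(template.format(name=name)) for name in variants}
    return len(answers) == 1

assert check_no_bias("Should {name} qualify for this loan?",
                     ["Alice", "Amir", "Mei"])
```

A production version would compare answers semantically (for example, with an AI judge) rather than by exact string equality, since phrasing can legitimately vary between runs.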
We use GCP Cloud SQL with AES-256 encryption for data at rest, and our system-to-system communications are safeguarded by TLS via Google Kubernetes Engine. But the best way to protect sensitive data during testing is to limit the tests' access to it in the first place, unless a test is specifically exercising data security. We recommend that our customers take sensitive data off the table entirely: mask it, or better yet, use synthetic data for testing.
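To illustrate the masking recommendation, here is a minimal sketch that redacts a couple of PII patterns from a record before it reaches a test environment. The patterns are illustrative only, not an exhaustive PII detector.

```python
import re

# Illustrative-only patterns; a real masking pass would cover far more
# PII categories (names, phone numbers, addresses, card numbers, ...).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask(record: dict[str, str]) -> dict[str, str]:
    # Replace each detected PII value with a typed placeholder token.
    masked = {}
    for key, value in record.items():
        value = EMAIL.sub("<email>", value)
        value = SSN.sub("<ssn>", value)
        masked[key] = value
    return masked

print(mask({"note": "Reach jane@example.com, SSN 123-45-6789"}))
# → {'note': 'Reach <email>, SSN <ssn>'}
```

Masking preserves record shape so tests still exercise realistic data flows; synthetic data goes one step further by never deriving from real records at all.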
We use Microsoft Playwright for authoring tests. Where appropriate, we use the framework's visual assertions in combination with our visual diffing tool, which performs a pixel-by-pixel comparison against a known-good image and returns the percentage of detected change. It all runs on Kubernetes and Docker.
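The core of pixel-by-pixel diffing can be sketched in a few lines. This minimal version represents images as 2-D grids of RGB tuples to stay dependency-free; a real pipeline would load screenshots with an image library and typically allow a per-channel tolerance for anti-aliasing noise.

```python
# Minimal pixel-diff sketch: compare two same-sized images (2-D grids of
# RGB tuples) and return the percentage of pixels that changed beyond a
# per-channel tolerance.

def diff_percent(baseline, candidate, tolerance=0):
    assert len(baseline) == len(candidate), "images must be the same size"
    total = changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if any(abs(a - b) > tolerance for a, b in zip(px_a, px_b)):
                changed += 1
    return 100.0 * changed / total

known_good = [[(255, 255, 255)] * 4 for _ in range(4)]  # 4x4 all-white baseline
screenshot = [row[:] for row in known_good]
screenshot[0][0] = (250, 0, 0)                          # one pixel regressed
print(diff_percent(known_good, screenshot))             # → 6.25
```

A test would then assert the returned percentage stays under an agreed threshold (for example, fail the run if more than 1% of pixels changed).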
We can meet your team wherever they are, whether that’s scheduled runs, triggered runs from SCM like GitHub or GitLab, or API calls. We can run on ephemeral environments to validate individual PRs, and you can designate specific tests (or all of them) to be release blockers if they fail.
Since we’re a black-box testing service, we don’t have access to your production systems. We focus purely on what we can test from the outside.
We report the most critical information — whether the test suite passed and, if it didn't, where the bugs are — through the messaging app, SCM, and issue tracker your devs are already in. You can get more detailed and historical information in the QA Wolf dashboard.