Smoke testing with Claude and Playwright


Anthropic recently made the Claude for Chrome integration available to all paid users. This opened an opportunity to fix a nagging problem.

We’re a small, resource-constrained team. We needed a smoke testing suite for our console app but didn’t have dedicated QA. Our previous solution was Zephyr/Reflect.run, a SaaS product that charged per Jira seat, had limited AI capabilities, restricted run counts, and came with integration headaches. There had to be a better way.

First pass: use Claude for Chrome directly for smoke testing. The workflow is simple. Fire up Claude for Chrome while in your web app. Click through the smoke test flow (log in → open usage → open API keys → open billing → log out) while describing what you’re doing using your mic. Claude observes the flow and generates a reusable script in plain English. You can save it as a slash command (reusable shortcut) and even run it on a schedule.

The problem: this approach doesn’t sync with our CI/CD pipeline. You’re either triggering manually or scheduling runs that don’t match your dev cadence. It’s non-deterministic, so run-to-run consistency isn’t guaranteed. There’s also no clean way to run it against multiple environments with different credentials, and no proper secrets management.

Next level: turn it into a Playwright project and operationalize it with Claude Code.

I first asked Claude for Chrome to convert the plain English script into Playwright code. Result: a monolithic 250-line script covering everything in one test. I copied it locally, fired up Claude Code, and asked it to analyze the monolithic output and restructure it. It proposed atomic tests organized by feature, set up environment configuration for staging versus production, handled credentials via env files, and turned it into a proper repository.
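For a sense of what the environment and credentials piece can look like, here is a minimal sketch, not the code Claude Code actually produced. Everything named in it is an assumption for illustration: the variable names SMOKE_ENV, SMOKE_USER, and SMOKE_PASSWORD, the example.com URLs, and the resolveSmokeConfig helper itself. The idea is that the tests never hardcode a URL or a password; they ask one function, which reads whatever the env file (or the pipeline's secret store) put into the environment.

```typescript
// Hypothetical sketch: all names and URLs here are placeholders, not from the real repo.
type EnvName = "staging" | "production";

interface SmokeConfig {
  envName: EnvName;
  baseURL: string;
  username: string;
  password: string;
}

// Placeholder URLs; swap in the real staging and production hosts.
const BASE_URLS: Record<EnvName, string> = {
  staging: "https://staging.example.com",
  production: "https://app.example.com",
};

// Type guard so an arbitrary string can be narrowed to a known environment.
function isEnvName(value: string): value is EnvName {
  return value === "staging" || value === "production";
}

// Takes the environment as a plain record so it can be called with
// process.env in Node, or with a literal object in a unit test.
function resolveSmokeConfig(
  env: Record<string, string | undefined>
): SmokeConfig {
  // Default to staging so a bare local run never hits production.
  const envName = env.SMOKE_ENV ?? "staging";
  if (!isEnvName(envName)) {
    throw new Error(`Unknown SMOKE_ENV: ${envName}`);
  }

  const username = env.SMOKE_USER;
  const password = env.SMOKE_PASSWORD;
  if (!username || !password) {
    // Fail fast with a clear message instead of a confusing login failure later.
    throw new Error("SMOKE_USER and SMOKE_PASSWORD must be set");
  }

  return { envName, baseURL: BASE_URLS[envName], username, password };
}
```

In a Node-based test run this would be called as resolveSmokeConfig(process.env) after the env file has been loaded, with the env file itself kept out of version control.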

Then came the refinement loop. This is where the Claude for Chrome integration paid off again. Claude Code ran tests against our staging environment, analyzed failures using screenshots and traces, proposed fixes, and iterated until all tests passed.
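The screenshots and traces that feed this loop come from Playwright's own configuration. A minimal playwright.config.ts along these lines would produce them on failure; treat it as a sketch under assumptions rather than our exact config. The BASE_URL variable name, the ./tests directory, and the retry count are placeholders.

```typescript
// playwright.config.ts (illustrative sketch; BASE_URL and ./tests are assumed names)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  // One retry helps separate flaky failures from real regressions.
  retries: 1,
  reporter: [["list"], ["html", { open: "never" }]],
  use: {
    // Target environment, supplied by the env file or the pipeline.
    baseURL: process.env.BASE_URL,
    // Keep artifacts only for failing tests so passing runs stay lightweight.
    screenshot: "only-on-failure",
    trace: "retain-on-failure",
  },
});
```

With these settings a failing test leaves a screenshot and a trace archive in the output directory, which is the material an agent (or a human) can inspect before proposing a fix.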

Finally, I plugged the result into our deployment pipeline. Smoke tests now run automatically on every staging deploy.

Total time from initial recording to deploy pipeline: roughly two hours.

I don’t have a testing background. I’m just moonlighting, plugging a hole. Curious how this approach compares with alternatives like Stagehand or other Playwright setups. What’s working for your teams?

Reference: Getting started with Claude in Chrome