My AI assistant can finally test its own code
Hey everyone,
If you’re anything like me, you’ve probably been spending a lot of time with AI coding assistants lately. I’ve been using Claude Code quite a bit, and while it’s incredibly powerful, it also comes with a unique set of frustrations. The biggest one for me? It can’t check its own work.
Claude can write pages of code, but it has no way of knowing if the buttons it just created actually work, or if the UI looks the way it was intended. It’s like a chef who can write a recipe but can’t taste the food. This often leads to a cycle of generating code, testing it, finding bugs, and going back to Claude with corrections. It works, but it can be frustrating.
I’ve stumbled upon a partial fix that, while not perfect, has eliminated a huge chunk of that frustration. The fix is to give Claude a pair of eyes by letting it control a web browser through Playwright, a browser automation tool.
This guide is for macOS but can be easily adapted to work on Linux and Windows.
Part 1: Getting the prerequisites
First, we need to get a few things installed.
1. Node.js
You need Node.js because Playwright runs on it (and so does Claude Code). It’s supported on pretty much every OS under the sun. You can grab it here: https://nodejs.org/en/download
Once installed, you can check that it’s working in your terminal of choice using:
node --version
If you see an output saying something like v20.11.0, you’re good to go.
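If you’d rather not eyeball the version number, here’s a minimal sketch that checks it for you. It assumes the Node.js 18+ minimum that Claude Code asks for, and node_version_ok is just a helper name I made up for this example.

```shell
# Hypothetical helper: succeeds when a version string such as "v20.11.0"
# has a major version of at least 18 (the minimum assumed here).
node_version_ok() {
  major="${1#v}"        # drop the leading "v"
  major="${major%%.*}"  # keep only the digits before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # reject anything that is not a plain number
  esac
  [ "$major" -ge 18 ]
}

# Only ask node for its version when node is actually on the PATH.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js is new enough"
  else
    echo "Node.js is older than 18, please upgrade"
  fi
else
  echo "node not found on PATH"
fi
```

Paste it into your terminal or save it as a small script. It only reads the version string, so it’s safe to run as often as you like.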
2. Claude Code
Next up is the Claude Code command-line tool. To run it, you need:
- A paid Claude account OR an API key: With a paid plan you might (and probably will) hit your usage quota, but it can be more cost-effective if you use Claude Code a lot. If you only use it now and then, I recommend the API, since you pay per token. But if you’re like me and use it constantly, a Pro or Max account makes a lot of sense.
- Node.js 18+ installed (which we just did).
Now let’s get started. Run the following command in your terminal:
npm install -g @anthropic-ai/claude-code
If you run into issues, check out the official Claude Code documentation for more info.
Part 2: Setting up Playwright and MCP
Now that the basics are done, we can move on to Playwright, which will control the browser for us.
1. Install Playwright
First, you need to install Playwright itself and then the browsers for it to control. Run the following commands:
# Install playwright globally so you can use it in any project
npm install -g playwright@latest
npx playwright install
Now, just verify that everything is installed by checking Playwright’s version:
npx playwright --version
It should output something like Version 1.41.2 if everything went well.
2. Configure Claude to use Playwright
This is the final step that connects everything. Open the terminal in your IDE of choice (I’ve been super happy with Zed lately as it’s minimalistic and super fast) and run this command:
claude mcp add playwright npx @playwright/mcp@latest
If all goes well, you’ll see a message saying the playwright MCP server was added to a local config file. You’re now ready to go.
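If you want to double-check the registration, claude mcp list prints the servers Claude Code knows about. Here’s a rough sketch built on that. I’m not relying on the exact output format, only assuming the server name shows up somewhere in the listing, and mentions_playwright is a made-up helper name.

```shell
# Hypothetical helper: reads a server listing on stdin and succeeds
# when any line mentions "playwright" (case-insensitive).
mentions_playwright() {
  grep -qi 'playwright'
}

# Only query Claude Code when the claude command is on the PATH.
if command -v claude >/dev/null 2>&1; then
  if claude mcp list | mentions_playwright; then
    echo "Playwright MCP server is registered"
  else
    echo "Playwright MCP server not found, try the add command again"
  fi
else
  echo "claude not found on PATH"
fi
```

If the server doesn’t show up, make sure you ran the add command from the same project folder you’re checking from.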
3. Testing Playwright
Now you can test Playwright by instructing Claude to use it for testing features. In the demo below, I asked Claude to create a simple TODO app (which seems to be the modern version of a “hello world” app) and use Playwright to make sure it works correctly.
Video demo was originally hosted on Substack and is not available here.
Supercharging your workflow: best practices
There are several ways to use Playwright with Claude, but here is one example to help you get started:
- Create a test plan. To ensure your app is solid before every commit, create a simple test plan file in your project called TESTS.md. It can be a simple checklist. Example TESTS.md:
# Application Test Plan
## Login Page
- [ ] A user with valid credentials can log in successfully.
- [ ] A user with invalid credentials sees an error message.
## Dashboard
- [ ] The user's name is displayed after logging in.
- [ ] The "Create New Project" button is visible and clickable.
- Then, you can instruct Claude to execute it:
“Using Playwright, execute all test scenarios listed in the TESTS.md file and report the pass/fail results for each item.”
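Before handing the plan to Claude, it helps to know how many scenarios it contains, so you can tell whether the report covers all of them. Here’s a minimal sketch, assuming the checklist uses the "- [ ]" style from the example above. list_scenarios is a helper name I made up.

```shell
# Hypothetical helper: print the text of every checklist item in a test plan.
# Matches lines like "- [ ] something" or "- [x] something" and strips the checkbox.
list_scenarios() {
  sed -n 's/^[[:space:]]*- \[[ xX]\] //p' "$1"
}

# Summarize TESTS.md when it exists in the current directory.
if [ -f TESTS.md ]; then
  echo "Scenarios in TESTS.md: $(list_scenarios TESTS.md | wc -l | tr -d ' ')"
  list_scenarios TESTS.md
fi
```

If Claude reports fewer results than the count printed here, ask it to rerun the items it skipped.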
This workflow hasn’t eliminated every headache, but it has significantly reduced the friction in my development process. It feels more like a partnership now, where the AI can take on a bigger, more intelligent role in the debugging and verification process.
I’ll be posting more about Claude Code as I keep using it. Let me know if this helps you out!