AI-Assisted QA: How I Use Claude Agents + MCP to Automate the Testing Pipeline

In a previous article, I covered how I built a Cypress E2E suite with 1,434 automated test cases for a personal management app. That article was about the tests themselves — patterns, structure, auth strategy, coverage. This one is about what happens before those tests get written. The part most developers skip: who decides what to test, who reviews the implementation for gaps, who notices that a data-testid is missing before the test even runs? In this project, that's a Claude agent — a specialized AI QA engineer running as part of a multi-agent development pipeline, connected to the real database via MCP.
The Setup: A Multi-Agent Development Team
The app is built by five specialized Claude agents, each with a defined role:
PM → UI/UX + Backend (parallel) → Frontend → Tester
| Agent | Role |
|---|---|
| PM | Owns requirements, updates PRD |
| UI/UX | Design decisions, component specs |
| Backend | API routes, DB schema, services |
| Frontend | React components, UI states, data-testid |
| Tester | Code review, E2E tests, coverage reports |
| Orchestrator | Coordinates end-to-end feature delivery |
The Tester Agent is always last in the pipeline — and it's the quality gate. Nothing is "done" until Tester signs off.
What MCP Adds to the QA Layer
The project has two key MCP integrations relevant to QA:
1. Supabase MCP — gives agents direct access to the database: query tables, inspect schema, run migrations, read logs. The Backend Agent uses this to verify schema state before writing code. The Tester Agent's DB verification tests (cy.task()) hit the same real Supabase instance — no mocking.
2. Claude Code as the runtime — agents run inside Claude Code, which means they can read files, grep the codebase, run Cypress headless, and write reports — all as part of one autonomous workflow.
Together, these mean the Tester Agent can:
- verify that data actually persisted by querying the real Supabase instance — no mocks
- audit components for missing data-testid attributes before a single test runs
- run the Cypress suite headless and write coverage reports as part of one autonomous workflow
The Tester Agent's Kickoff Protocol
Every time the Tester Agent starts a session, it runs a fixed protocol before touching any test file:
- Read .claude/PRD.md → understand required behavior
- Read tester-agent-memory.md → recall known flaky tests, past bugs
- Read tester-knowledge.md → follow canonical test patterns
- Read shared-knowledge.md → cross-agent signal formats, Definition of Done
- Load cypress/fixtures/app-constants.json → verify all testIds and endpoints exist
- Check cypress/plugin/tasks/ → what DB query tasks already exist?
- Check pending-signals.md → any unresolved signals addressed to Tester?
- Start work
This is not just "read some files." This is an agent building a complete mental model of the system before writing a single assertion.
The PRD defines the expected behavior. The component files show the actual implementation. The gap between them is where bugs live — and where the agent looks first.
Cross-Agent Signals: The Tester as a Bug Reporter
One of the more interesting behaviors is how the Tester Agent communicates back to other agents when it finds problems.
All inter-agent communication goes through a single file: .claude/agents/signals/pending-signals.md. It's a structured inbox — every agent checks it at kickoff and resolves pending signals before starting new work.
When the Tester Agent audits a component and finds a missing data-testid:
```markdown
🔖 data-testid Request — Tester → Frontend
Component: ProductsTable (app/main/inventory/product-list/list/ProductsTable.jsx)
Missing IDs:
- product-table-row → the <tr> element for each product row
- product-stock-badge → the stock level indicator badge
Needed for: cypress/e2e/inventory_management/product/product-list-ui.cy.js
Action: Add data-testid attributes + register in cypress/fixtures/app-constants.yaml
```
When the Tester Agent finds an endpoint missing an edge case:
```markdown
⚠️ Endpoint Gap — Tester → Backend
Endpoint: POST /api/inventory/v1/product/create
Missing edge case:
- Returns 200 instead of 400 when usage_quantity is a negative number
Action: Add validation so tests can assert the correct behavior
```
The signal sits in pending-signals.md as [PENDING]. The Frontend or Backend Agent picks it up at their next kickoff, makes the fix, and marks it [RESOLVED: YYYY-MM-DD].
This creates a tight feedback loop that would normally require a Jira ticket, a Slack message, and two meetings.
DB Verification: Querying Supabase Directly
The most concrete MCP integration in the QA layer is how the Tester Agent verifies data persistence.
The test doesn't just check the API response. It checks what's actually in the database:
```javascript
it('should persist data correctly to database', () => {
  cy.request({
    method: 'POST',
    url: C.endpoints.inventory.create,
    body: buildRequest(),
  }).then((res) => {
    expect(res.status).to.eq(201);

    const { id, user_id } = res.body.productList;

    // Query Supabase directly — bypasses the API entirely
    cy.task('getSingleProductFromDb', { productId: id, userId: user_id })
      .then((dbRecord) => {
        expect(dbRecord).to.not.be.null;
        expect(dbRecord.name).to.eq(res.body.productList.name);
        expect(dbRecord.deleted_at).to.be.null;
      });
  });
});
```
The cy.task() call runs in Node.js context and queries Supabase directly using the service role key. There is no mock, no fixture, no stub — the data either made it to the database or it didn't.
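A task like this is registered on the Node side in the Cypress config. Here is a minimal sketch of how getSingleProductFromDb might be wired up with the standard @supabase/supabase-js client — the table and column names are illustrative assumptions, not the project's actual schema:

```javascript
// cypress.config.js — sketch of a DB verification task (assumed schema).
const { defineConfig } = require('cypress');
const { createClient } = require('@supabase/supabase-js');

module.exports = defineConfig({
  e2e: {
    setupNodeEvents(on) {
      // Service role key bypasses RLS — Node-side only, never exposed to the browser.
      const supabase = createClient(
        process.env.SUPABASE_URL,
        process.env.SUPABASE_SERVICE_ROLE_KEY
      );

      on('task', {
        async getSingleProductFromDb({ productId, userId }) {
          const { data, error } = await supabase
            .from('product_list')          // assumed table name
            .select('*')
            .eq('id', productId)
            .eq('user_id', userId)
            .is('deleted_at', null)        // exclude soft-deleted rows
            .maybeSingle();
          if (error) throw new Error(error.message);
          return data; // null if no row matched — the test asserts on this
        },
      });
    },
  },
});
```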
For soft deletes, the verification is equally direct:
```javascript
it('should soft-delete: deleted_at must be set in DB', () => {
  cy.request('DELETE', `${C.endpoints.inventory.delete}/${productId}`)
    .then((res) => {
      expect(res.status).to.eq(200);

      cy.task('getSingleProductIncludeDeletedFromDb', { productId, userId })
        .then((dbRecord) => {
          expect(dbRecord.deleted_at).to.not.be.null;
        });
    });
});
```
The Supabase query layer is organized in two tiers:
Tier 1 — Domain-specific tasks (preferred):
```javascript
cy.task('getSingleProductFromDb', { productId, userId })
cy.task('getProductWithQuantityFromDb', { productId, userId })
cy.task('getLatestProductHistoryFromDb', { productId, userId })
cy.task('getTotalProductsFromDb', { userId })
```
Tier 2 — Generic raw query (for uncovered cases):
```javascript
cy.task('supabaseRawQuery', {
  table: 'product_list',
  select: 'id, name, quantity',
  filters: { id: productId, user_id: userId },
  single: true,
})
```
The rule: if supabaseRawQuery is used for the same query more than twice, it gets promoted to a named domain task. This keeps the codebase from accumulating ad-hoc DB access spread across test files.
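The Node-side body of supabaseRawQuery could look like the sketch below. It assumes filters maps column names to equality matches, with null values translated to IS NULL checks — an assumption about the task's semantics, not the project's actual code:

```javascript
// Sketch: generic raw-query task over a supabase-js client (assumed semantics).
async function supabaseRawQuery(supabase, { table, select = '*', filters = {}, single = false }) {
  let query = supabase.from(table).select(select);
  // Each filter entry becomes an equality match; null means IS NULL.
  for (const [column, value] of Object.entries(filters)) {
    query = value === null ? query.is(column, null) : query.eq(column, value);
  }
  const { data, error } = single ? await query.maybeSingle() : await query;
  if (error) throw new Error(error.message);
  return data;
}
```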
What the Tester Agent Actually Reviews
Code review is the part of this that surprised me most. Before writing any test, the Tester Agent reads both the Frontend and Backend output and looks for:
On the Backend:
- endpoints that accept invalid input without validation (e.g. a 200 where a 400 belongs)
- edge cases the PRD implies but the implementation doesn't handle
On the Frontend:
- components missing data-testid attributes, or testIds not registered in the fixtures file
- UI states the spec defines (loading, empty, error) that the code doesn't cover
This isn't just "run the tests and see what breaks." It's a defensive read of the implementation before any test is written — looking for what should be tested that the developer might have missed.
The Definition of Done
A feature isn't done when the developer says it's done. It's done when all five agents sign off:
| Agent | Done When |
|---|---|
| PM | PRD updated, acceptance criteria written |
| UI/UX | All states defined, design handed off |
| Backend | All endpoints live, edge cases handled |
| Frontend | UI built, all data-testid added and registered |
| Tester | Component audit done, endpoint audit done, E2E tests written and passing, reports updated |
The Tester Agent's sign-off is the last gate. If coverage drops or a test fails, the feature doesn't ship.
Coverage as a Living Document
One thing the agent system enforces: coverage reports are cumulative and never overwritten from scratch. The Tester Agent reads the existing coverage-report.md before each update, then appends new entries and recalculates totals.
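The "recalculate totals" step is simple arithmetic. As a hypothetical sketch of what recomputing the cumulative line looks like:

```javascript
// Sketch: recompute the totals row from per-module coverage entries.
// The { features, automated } row shape mirrors the report table below.
function coverageTotals(modules) {
  const features = modules.reduce((sum, m) => sum + m.features, 0);
  const automated = modules.reduce((sum, m) => sum + m.automated, 0);
  const coverage = features ? Math.round((automated / features) * 100) : 0;
  return { features, automated, coverage };
}
```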
Current state after the full pipeline ran across both domains:
| Module | Features | Automated | Coverage |
|---|---|---|---|
| Auth | 9 | 8 | 89% |
| API Auth Guard | 3 | 3 | 100% |
| Inventory Dashboard | 10 | 10 | 100% |
| Inventory Product | 16 | 16 | 100% |
| Trading (all) | 19 | 19 | 100% |
| Total | 76 | 75 | 99% |
The one gap — Google OAuth UI flow — is documented with a reason (requires real browser redirect to Google's servers) and a remediation priority (P2, use a mock OAuth provider). No gap is just silently ignored.
What This Changes About Development
The impact shows up in a few specific ways:
- A missing data-testid that would normally be caught during a test run (and require a code change plus a re-run cycle) is now caught during the component audit, before any test is written.
The Honest Tradeoffs
This setup is not free.
The combination of Claude agents + MCP doesn't replace the Cypress test suite. It makes the test suite more complete, more consistent, and cheaper to maintain — because the agent handles the cognitive overhead of what to test and where the gaps are, not just how to write the assertion. The tests still run against real code, a real database, and real API responses. MCP connects the AI layer to the actual state of the system — so when the agent says "this data was persisted correctly," it checked.