AI-Assisted QA: How I Use Claude Agents + MCP to Automate the Testing Pipeline

In a previous article, I covered how I built a Cypress E2E suite with 1,434 automated test cases for a personal management app. That article was about the tests themselves — patterns, structure, auth strategy, coverage. This one is about what happens before those tests get written. The part most developers skip: who decides what to test, who reviews the implementation for gaps, who notices that a data-testid is missing before the test even runs? In this project, that's a Claude agent — a specialized AI QA engineer running as part of a multi-agent development pipeline, connected to the real database via MCP.

The Setup: A Multi-Agent Development Team

The app is built by five specialized Claude agents, each with a defined role, plus an Orchestrator that coordinates them:

PM → UI/UX + Backend (parallel) → Frontend → Tester

Agent          Role
PM             Owns requirements, updates PRD
UI/UX          Design decisions, component specs
Backend        API routes, DB schema, services
Frontend       React components, UI states, data-testid
Tester         Code review, E2E tests, coverage reports
Orchestrator   Coordinates end-to-end feature delivery

The Tester Agent is always last in the pipeline — and it's the quality gate. Nothing is "done" until Tester signs off.

What MCP Adds to the QA Layer

The project has two key MCP integrations relevant to QA:

1. Supabase MCP — gives agents direct access to the database: query tables, inspect schema, run migrations, read logs. The Backend Agent uses this to verify schema state before writing code. The Tester Agent's DB verification tests (cy.task()) hit the same real Supabase instance — no mocking.

2. Claude Code as the runtime — agents run inside Claude Code, which means they can read files, grep the codebase, run Cypress headless, and write reports — all as part of one autonomous workflow.

Together, these mean the Tester Agent can:

  • Read the PRD to understand what should be built
  • Read the actual component JSX to find what was built
  • Query the database to verify what was persisted
  • Compare all three — and report the gaps

The Tester Agent's Kickoff Protocol

Every time the Tester Agent starts a session, it runs a fixed protocol before touching any test file:

  1. Read .claude/PRD.md → understand required behavior
  2. Read tester-agent-memory.md → recall known flaky tests, past bugs
  3. Read tester-knowledge.md → follow canonical test patterns
  4. Read shared-knowledge.md → cross-agent signal formats, Definition of Done
  5. Load cypress/fixtures/app-constants.json → verify all testIds and endpoints exist (a sample shape is sketched after this list)
  6. Check cypress/plugin/tasks/ → what DB query tasks already exist?
  7. Check pending-signals.md → any unresolved signals addressed to Tester?
  8. Start work
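
Step 5 is worth a concrete illustration. Here is a minimal sketch of the shape app-constants.json might have, written as the C object the specs reference. Only the test IDs and the create endpoint that appear elsewhere in this article are real; the key structure itself is an assumption:

javascript
// Hypothetical shape of cypress/fixtures/app-constants.json, referenced as C in the specs.
// The test IDs and the create endpoint come from this article; the nesting is assumed.
const C = {
  testIds: {
    productTableRow: 'product-table-row',
    productStockBadge: 'product-stock-badge',
  },
  endpoints: {
    inventory: {
      create: '/api/inventory/v1/product/create',
    },
  },
};

Because every spec pulls selectors and URLs from this single fixture, the check in step 5 catches a renamed endpoint or an unregistered testId before a single test runs.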

This is not just "read some files." This is an agent building a complete mental model of the system before writing a single assertion.

The PRD defines the expected behavior. The component files show the actual implementation. The gap between them is where bugs live — and where the agent looks first.

Cross-Agent Signals: The Tester as a Bug Reporter

One of the more interesting behaviors is how the Tester Agent communicates back to other agents when it finds problems.

All inter-agent communication goes through a single file: .claude/agents/signals/pending-signals.md. It's a structured inbox — every agent checks it at kickoff and resolves pending signals before starting new work.

When the Tester Agent audits a component and finds a missing data-testid:

markdown
🔖 data-testid Request — Tester → Frontend
Component: ProductsTable (app/main/inventory/product-list/list/ProductsTable.jsx)
Missing IDs:
 - product-table-row → the <tr> element for each product row
 - product-stock-badge → the stock level indicator badge
Needed for: cypress/e2e/inventory_management/product/product-list-ui.cy.js
Action: Add data-testid attributes + register in cypress/fixtures/app-constants.json

When the Tester Agent finds an endpoint missing an edge case:

markdown
⚠️ Endpoint Gap — Tester → Backend
Endpoint: POST /api/inventory/v1/product/create
Missing edge case:
 - Returns 200 instead of 400 when usage_quantity is a negative number
Action: Add validation so tests can assert the correct behavior

The signal sits in pending-signals.md as [PENDING]. The Frontend or Backend Agent picks it up at their next kickoff, makes the fix, and marks it [RESOLVED: YYYY-MM-DD].
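
For the data-testid signal above, the resolving change on the Frontend side is usually small. An illustrative sketch follows; the real ProductsTable is more involved, and the component and field names here are simplifications:

javascript
// Illustrative resolution of the signal: the row markup gains the two requested
// data-testid attributes. Component and field names are simplified for the example.
export function ProductRow({ product }) {
  return (
    <tr data-testid="product-table-row">
      <td>{product.name}</td>
      <td>
        <span data-testid="product-stock-badge">{product.stock_level}</span>
      </td>
    </tr>
  );
}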

This creates a tight feedback loop that would normally require a Jira ticket, a Slack message, and two meetings.

DB Verification: Querying Supabase Directly

The most concrete MCP integration in the QA layer is how the Tester Agent verifies data persistence.

The test doesn't just check the API response. It checks what's actually in the database:

javascript
it('should persist data correctly to database', () => {
  cy.request({
    method: 'POST',
    url: C.endpoints.inventory.create,
    body: buildRequest(),
  }).then((res) => {
    expect(res.status).to.eq(201);

    const { id, user_id } = res.body.productList;

    // Query Supabase directly, bypassing the API entirely
    cy.task('getSingleProductFromDb', { productId: id, userId: user_id })
      .then((dbRecord) => {
        expect(dbRecord).to.not.be.null;
        expect(dbRecord.name).to.eq(res.body.productList.name);
        expect(dbRecord.deleted_at).to.be.null;
      });
  });
});

The cy.task() call runs in Node.js context and queries Supabase directly using the service role key. There is no mock, no fixture, no stub — the data either made it to the database or it didn't.
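
For reference, here is a minimal sketch of how such a task could be wired up with @supabase/supabase-js. The project keeps its tasks under cypress/plugin/tasks/ and its implementation may differ; the table and column names below follow the raw-query example later in this article:

javascript
// cypress.config.js (sketch): register a domain DB task backed by the Supabase
// service role client. This runs in Node, never in the browser.
const { defineConfig } = require('cypress');
const { createClient } = require('@supabase/supabase-js');

module.exports = defineConfig({
  e2e: {
    setupNodeEvents(on) {
      const supabase = createClient(
        process.env.SUPABASE_URL,
        process.env.SUPABASE_SERVICE_ROLE_KEY // service role bypasses RLS
      );

      on('task', {
        async getSingleProductFromDb({ productId, userId }) {
          const { data, error } = await supabase
            .from('product_list')
            .select('*')
            .eq('id', productId)
            .eq('user_id', userId)
            .is('deleted_at', null)
            .maybeSingle();
          if (error) throw error;
          return data; // null when no matching row exists
        },
      });
    },
  },
});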

For soft deletes, the verification is equally direct:

javascript
it('should soft-delete: deleted_at must be set in DB', () => {
  cy.request('DELETE', `${C.endpoints.inventory.delete}/${productId}`)
    .then((res) => {
      expect(res.status).to.eq(200);

      cy.task('getSingleProductIncludeDeletedFromDb', { productId, userId })
        .then((dbRecord) => {
          expect(dbRecord.deleted_at).to.not.be.null;
        });
    });
});

The Supabase query layer is organized in two tiers:

Tier 1 — Domain-specific tasks (preferred):

javascript
cy.task('getSingleProductFromDb', { productId, userId });
cy.task('getProductWithQuantityFromDb', { productId, userId });
cy.task('getLatestProductHistoryFromDb', { productId, userId });
cy.task('getTotalProductsFromDb', { userId });

Tier 2 — Generic raw query (for uncovered cases):

javascript
cy.task('supabaseRawQuery', {
  table: 'product_list',
  select: 'id, name, quantity',
  filters: { id: productId, user_id: userId },
  single: true,
});

The rule: if supabaseRawQuery is used for the same query more than twice, it gets promoted to a named domain task. This keeps the codebase from accumulating ad-hoc DB access spread across test files.
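
A sketch of what the generic task itself might look like on the Node side. Only equality filters are handled here, and the option names mirror the call above; the project's real implementation may do more:

javascript
// Sketch of the Tier 2 task: build a Supabase query from the options object.
// Only equality filters are supported in this sketch.
const { createClient } = require('@supabase/supabase-js');

const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

async function supabaseRawQuery({ table, select = '*', filters = {}, single = false }) {
  let query = supabase.from(table).select(select);
  for (const [column, value] of Object.entries(filters)) {
    query = query.eq(column, value);
  }
  const { data, error } = single ? await query.maybeSingle() : await query;
  if (error) throw error;
  return data;
}

module.exports = { supabaseRawQuery };

Promoting a repeated raw query to a named Tier 1 task then amounts to wrapping a call like this with the table and filters fixed.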

What the Tester Agent Actually Reviews

Code review is the part of this that surprised me most. Before writing any test, the Tester Agent reads both the Frontend and Backend output and looks for:

On the Backend:

  • Missing auth guards on protected routes
  • Zod schemas that don't validate edge cases (empty string, negative numbers)
  • N+1 queries or missing pagination
  • Inconsistent API response format
  • Security issues: exposed service role keys, missing CSRF protection

On the Frontend:

  • Missing loading states and skeletons
  • Unhandled error responses from the API
  • Missing empty states
  • WCAG accessibility violations
  • Dead code, unused imports

This isn't just "run the tests and see what breaks." It's a defensive read of the implementation before any test is written — looking for what should be tested that the developer might have missed.
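
When an audit finds a gap like the negative usage_quantity case flagged in the signal earlier, it becomes a test. A sketch of the resulting assertion, where buildRequest and C are the project's own helpers and the exact spec may differ:

javascript
// Validation edge case from the earlier Tester → Backend signal:
// a negative usage_quantity must be rejected, not silently accepted.
it('should return 400 when usage_quantity is negative', () => {
  cy.request({
    method: 'POST',
    url: C.endpoints.inventory.create,
    body: { ...buildRequest(), usage_quantity: -5 },
    failOnStatusCode: false, // let the test assert the 4xx itself
  }).then((res) => {
    expect(res.status).to.eq(400);
  });
});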

The Definition of Done

A feature isn't done when the developer says it's done. It's done when all five agents sign off:

Agent       Done When
PM          PRD updated, acceptance criteria written
UI/UX       All states defined, design handed off
Backend     All endpoints live, edge cases handled
Frontend    UI built, all data-testid added and registered
Tester      Component audit done, endpoint audit done, E2E tests written and passing, reports updated

The Tester Agent's sign-off is the last gate. If coverage drops or a test fails, the feature doesn't ship.

Coverage as a Living Document

One thing the agent system enforces: coverage reports are cumulative and never overwritten from scratch. The Tester Agent reads the existing coverage-report.md before each update, then appends new entries and recalculates totals.

Current state after the full pipeline ran across both domains:

Module                Features   Automated   Coverage
Auth                  9          8           89%
API Auth Guard        3          3           100%
Inventory Dashboard   10         10          100%
Inventory Product     16         16          100%
Trading (all)         19         19          100%
Total                 76         75          99%

The one gap — Google OAuth UI flow — is documented with a reason (requires real browser redirect to Google's servers) and a remediation priority (P2, use a mock OAuth provider). No gap is just silently ignored.

What This Changes About Development

The impact shows up in a few specific ways:

  1. Bugs surface earlier. When the Tester Agent reviews Frontend output immediately after it's written, missing validation UI and unhandled error states get caught before they ever hit a test run.
  2. Tests are more complete. The agent follows a fixed endpoint test matrix — happy path, auth, validation, boundary values, 404, 500, pagination (sketched as a spec outline after this list). A human writing tests under time pressure might skip the boundary cases. The agent doesn't.
  3. The feedback loop is tighter. A missing data-testid that would normally be caught during a test run (and require a code change, re-run cycle) is now caught during component audit before any test is written.
  4. Coverage is auditable. Because the agent writes structured reports with explicit gap analysis, you always know exactly what's tested, what isn't, and why.
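
For reference, the matrix from point 2 as a spec skeleton. The test titles are illustrative; the project's actual specs flesh each one out with real assertions:

javascript
// The endpoint test matrix as a spec outline. Each it() below is a pending test
// (no callback yet); the titles paraphrase the matrix categories.
describe('POST /api/inventory/v1/product/create', () => {
  it('happy path: creates the product and returns 201');
  it('auth: rejects the request without a valid session (401)');
  it('validation: rejects empty strings and negative numbers (400)');
  it('boundary values: accepts quantities at the documented limits');
  it('404: returns not-found for a non-existent related resource');
  it('500: surfaces a structured error when the database is unreachable');
  it('pagination: the list endpoint respects page and limit params');
});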

The Honest Tradeoffs

This setup is not free.

  1. It requires upfront investment in agent configuration. The Tester Agent is only as good as its knowledge file and memory. Defining the canonical test patterns, the endpoint matrix, the DB task conventions — that takes time and iteration.
  2. Agents need a real environment. No mocking the database means slower tests and real cleanup logic. The agent has to be disciplined about teardown, or test state leaks between runs (a typical teardown pattern is sketched after this list).
  3. AI agents make mistakes. The generated tests need to be read, not just trusted. The agent occasionally writes an assertion that's technically correct but semantically wrong — asserting a field exists instead of asserting its value is correct. Reviewing agent output is part of the workflow.
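
On point 2, the teardown discipline usually looks something like the following. The deleteProductFromDb task name and the env var are hypothetical; the point is that every record a spec creates gets removed from the real database:

javascript
// Keep the real database clean between tests: remember every product a spec
// creates and remove it afterwards. Task name and env var are hypothetical.
const createdProductIds = [];

afterEach(() => {
  createdProductIds.forEach((productId) => {
    cy.task('deleteProductFromDb', {
      productId,
      userId: Cypress.env('TEST_USER_ID'),
    });
  });
  createdProductIds.length = 0;
});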

The combination of Claude agents + MCP doesn't replace the Cypress test suite. It makes the test suite more complete, more consistent, and cheaper to maintain — because the agent handles the cognitive overhead of what to test and where the gaps are, not just how to write the assertion. The tests still run against real code, a real database, and real API responses. MCP connects the AI layer to the actual state of the system — so when the agent says "this data was persisted correctly," it checked.
