EGCA Engineering Handbook
Internal reference · v1

04 — Testing

Rule: every project ships with tests. No test files in a PR = PR blocked. Prototypes included — one smoke test is enough there, but it exists.

We got burned before by “we’ll add tests later.” Later never came. The cost of writing a single happy-path test while you’re building is lower than at any other point in the project’s life.

The pyramid for this team

Top-down:

  • Thin E2E on the 2–5 critical flows (auth, checkout, the main CRUD loop). Not every page.
  • Heavy integration — API route + real DB (Docker Postgres in CI). This is where most bugs live and where tests have the best ROI.
  • Light unit tests on pure logic (parsers, calculators, reducers). Skip trivial units — no “test getter returns value.”

Mocked-unit-heavy pyramids look tidy and miss real bugs. Integration first.

Required test files per project type

Every new project must ship with at least these, on day one:

Next.js app

  • tests/smoke.spec.ts (Playwright) — loads / and one auth’d page. Confirms the app boots.
  • tests/integration/*.test.ts — one test per API route / Server Action that touches the DB.
  • tests/unit/*.test.ts — for any non-trivial pure function.

Backend service

  • tests/smoke.test.ts — hits /health, confirms DB connection.
  • tests/integration/*.test.ts — one per endpoint, real DB.
  • Contract tests (Zod/Pydantic schemas) — verify every boundary validates.

Data / ML / AI

  • tests/smoke.test.ts — pipeline runs end-to-end on a fixture input.
  • evals/ — golden-set JSON + runner. See playbooks/data-ml-ai.

Mobile

  • __tests__/smoke.test.ts — app renders the root screen without crashing.
  • One Maestro flow for sign-in + home screen.

CLI / script

  • tests/smoke.test.ts — invoking the CLI with --help exits 0. One test per command.

Stacks

| Language / runtime | Test runner | E2E | HTTP mocking | DB |
| --- | --- | --- | --- | --- |
| TypeScript / Node / Next.js | Vitest | Playwright | MSW | Docker Postgres in CI via testcontainers or compose |
| Python | pytest + pytest-asyncio | (see above — test via HTTP) | httpx + respx | testcontainers-python (preferred) or pytest-postgresql |
| React Native | Jest + React Native Testing Library | Maestro | MSW | n/a |

Don’t mix test runners in one project.

What to test (and what not to)

Test:

  • Anything that takes user input (form handlers, API routes) — validation + the happy path + one failure case.
  • Anything that writes to the DB — integration test with a real DB.
  • Pure functions with branching logic.
  • Critical flows end-to-end.
  • Bug fixes — a test that reproduces the bug before you fix it. Prevents regression.

Don’t test:

  • Framework behavior (you don’t need to test that Next.js returns a response).
  • Getters / setters / trivial mappers.
  • Third-party libraries.
  • Implementation details (internal function calls). Test behavior, not internals.

Coverage

  • No hard percentage target. Chasing a number produces useless tests.
  • Target: every critical path has a test. If a bug happens in prod, a test should have caught it — or adding one is part of the fix.

Flaky tests

  • A flaky suite does more team damage than a broken one. A green-ish suite trains people to ignore failures.
  • First flake: add a comment, re-run once.
  • Second flake on same test: tag @flaky, file a ticket, fix within the sprint, or delete the test.
  • Never merge a flaky test and forget it.

Running tests

  • Locally: pnpm test / pytest — should run in < 30s for unit + integration on a warm cache.
  • Watch mode during dev: pnpm test --watch.
  • Full suite (incl. E2E) in CI. E2E is slow; run it on PR and main only, not on every push.

Test data

  • Factories (e.g. @faker-js/faker + a tiny factory helper), not fixtures copy-pasted between tests.
  • Reset DB state between integration tests — truncate tables or use a transaction rollback per test.
  • Seed data in a prisma/seed.ts / drizzle/seed.ts / scripts/seed.py — runnable locally.

Reviewing tests in PRs

When reviewing, ask:

  1. If I delete this test and keep the code, would a real bug slip through?
  2. Does this test exercise behavior or just re-describe the implementation?
  3. Is the failure message going to tell a future you what broke?

If any answer is “no,” ask for changes.