Testing and verification
How do you actually know it works? The disciplines that turn hope into evidence.
- Describe the testing pyramid and what each level buys you
- Write a test that would genuinely catch a bug
- Use tests as the safety net for reviewing AI-generated code
"It works on my machine" is a hope, not a fact. Testing is how you turn hope into evidence: automated checks that show your code does what you claim, and keep showing it as the code changes. In an era where AI agents generate code quickly, tests are the verification layer that makes that speed safe.
The testing pyramid
Tests come in levels, and a healthy mix looks like a pyramid:
- Unit tests (many, fast): check one small piece — a single function — in isolation. Cheap to write and run; they pinpoint exactly what broke.
- Integration tests (fewer): check that several pieces work together — that the seams between modules actually fit.
- End-to-end tests (fewest, slow): drive the whole system like a real user, confirming the entire flow works.
Lots of fast unit tests at the base, a few slow end-to-end tests at the top. This is where decomposition pays off again: small, well-named functions are exactly what's easy to unit-test.
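To make the difference between the bottom two levels concrete, here is a minimal sketch in Python. The functions `parse_price` and `cart_total` are hypothetical, invented for illustration; the point is only that the first test exercises one function alone, while the second exercises the seam between two.

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Turn a label such as '$4.50' into a Decimal amount (hypothetical helper)."""
    return Decimal(text.strip().lstrip("$"))


def cart_total(price_labels: list[str]) -> Decimal:
    """Add up a list of price labels by parsing each one (hypothetical helper)."""
    return sum((parse_price(label) for label in price_labels), Decimal("0"))


# Unit test: one small function, checked in isolation.
def test_parse_price_strips_dollar_sign():
    assert parse_price("$4.50") == Decimal("4.50")


# Integration test: parsing and summing working together.
def test_cart_total_combines_parsed_prices():
    assert cart_total(["$4.50", " $0.50 "]) == Decimal("5.00")
```

If the unit test fails, the bug is in `parse_price`. If only the integration test fails, the pieces work alone but do not fit together. An end-to-end test would sit above both and drive the real application, which is beyond what a short snippet can show.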
Arrange, act, assert
A good test has a clear shape:
- Arrange — set up the inputs and the world.
- Act — call the thing you're testing.
- Assert — check the result is what you expected.
The art is choosing cases that would actually catch a bug: the empty list, the zero, the duplicate, the boundary. A test that only checks the easy, happy path gives false confidence — it passes whether or not the real edge cases work.
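As an illustration, the sketch below tests a hypothetical `dedupe` function. The first test labels the three phases with comments; the other two target the awkward cases named above (the empty list and the zero) rather than the happy path.

```python
def dedupe(items):
    """Return items with duplicates removed, keeping first-seen order.

    Hypothetical function, used only to give the tests something to check.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def test_dedupe_removes_later_duplicates():
    # Arrange: an input with repeated values in a known order.
    items = [3, 1, 3, 2, 1]
    # Act: call the function under test.
    result = dedupe(items)
    # Assert: duplicates are gone and first-seen order is kept.
    assert result == [3, 1, 2]


def test_dedupe_empty_list():
    # Edge case: nothing in, nothing out.
    assert dedupe([]) == []


def test_dedupe_keeps_zero():
    # Edge case: zero is falsy, so a careless `if item:` check would drop it.
    assert dedupe([0, 0, 5]) == [0, 5]
```

A test that only used `[1, 2, 3]` would pass for an implementation that returned its input unchanged, which is exactly the false confidence described above.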
Tests as a specification
Written first or read later, tests describe what code is supposed to do, in runnable form. They're often the clearest documentation in a project (recall reading source code), and they let you refactor fearlessly: change the insides freely, and the tests tell you the moment behaviour drifts.
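One way to see this in practice: write the expected behaviour once, then run it against both the old and the rewritten implementation. The sketch below assumes a hypothetical `slugify` function with two versions; the names and the rules it encodes are illustrative, not taken from any particular library.

```python
import re
import string

ALLOWED = set(string.ascii_lowercase + string.digits)


def slugify_v1(title: str) -> str:
    """Original version: walk the characters and collect words by hand."""
    words = []
    current = ""
    for ch in title.lower():
        if ch in ALLOWED:
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return "-".join(words)


def slugify_v2(title: str) -> str:
    """Refactored version: same behaviour, expressed with a regular expression."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def check_slugify_spec(slugify):
    """The specification, in runnable form. Any implementation must pass it."""
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  many   spaces  ") == "many-spaces"
    assert slugify("Python 3") == "python-3"
    assert slugify("") == ""


# The same spec guards both versions, so the rewrite can be swapped in safely.
check_slugify_spec(slugify_v1)
check_slugify_spec(slugify_v2)
```

Reading `check_slugify_spec` tells a newcomer what the function promises without opening either implementation.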
The safety net for AI-generated code
This is the practical heart of working with agents. An agent can write a function and its tests in seconds, but you decide what "correct" means. Read the tests: do they check the cases that matter, including the awkward ones? Run them. Add the case the agent missed. Code that passes tests you have read and trust is real evidence; code that merely looks right is not.
A reliable loop with agents: agree on the behaviour, capture it as tests, then let the agent implement until the tests pass. The tests are the contract — and the thing you, not the agent, are responsible for getting right.
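A small sketch of that loop, assuming a hypothetical `chunk(items, size)` function: the tests come first and record the agreed behaviour, including the awkward cases. The implementation at the bottom is a stand-in for whatever an agent might produce; it is accepted only because the tests pass.

```python
# Step 1: the contract, written and owned by you.

def test_chunk_splits_evenly():
    assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]


def test_chunk_keeps_short_final_piece():
    assert chunk([1, 2, 3], 2) == [[1, 2], [3]]


def test_chunk_empty_input():
    assert chunk([], 3) == []


def test_chunk_rejects_zero_size():
    # A size of zero has no sensible answer, so the agreed behaviour is an error.
    try:
        chunk([1], 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for size 0")


# Step 2: an implementation (here standing in for agent output).

def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

If the agent's first attempt forgot the size check, `test_chunk_rejects_zero_size` would fail and the loop would continue. The decision that zero should raise an error was made in the tests, not by the agent.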
Where to go next
Testing tells you that something is wrong. Finding why is its own discipline: debugging. But first, the human side of agents — designing with AI agents.