
Challenge: Build an automated workflow

Capstone challenge — build a complete CLI tool with an agent using phased orchestration, memory files, diff review, and a reflection on where the agent helped and where you had to intervene.

Challenge · optional · Using AI · Advanced · 60 min
By the end of this lesson you will be able to:
  • Apply the full advanced-tier agentic workflow to a real, non-trivial task
  • Demonstrate phased orchestration across at least three agent interactions
  • Review every diff produced by the agent and document any corrections made
  • Produce a written reflection on the agent's contributions and its limitations

This challenge is a capstone for the entire Advanced tier. You will build a complete small system — a CLI tool that fetches data from a public API, processes it, and outputs a formatted report — using the agentic workflow you have learned across both modules.

There is no auto-grader for this challenge. Instead, the evaluation criteria are spelled out explicitly in the next section. Work through each requirement carefully.


The system to build

A command-line tool that:

  1. Fetches the current weather for a city the user specifies
  2. Processes the response to extract: city name, temperature (Celsius), weather description, and wind speed
  3. Outputs a formatted text report to stdout
  4. Writes the same report to a file when --output <filename> is provided
  5. Handles errors gracefully: city not found, network failure, invalid arguments
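Requirements 3 and 4 are easiest to satisfy together if the report text is built once and then sent to both destinations, so stdout and the file can never disagree. Here is a minimal sketch. The `Location` and `CurrentWeather` dataclasses and the `format_report` / `emit_report` helpers are hypothetical names, and the report layout itself is up to you.

```python
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Location:
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class CurrentWeather:
    temperature_c: float
    description: str
    wind_speed_kmh: float  # Open-Meteo reports km/h by default


def format_report(location: Location, weather: CurrentWeather) -> str:
    """Build the report text once so stdout and the --output file cannot diverge."""
    return (
        f"Weather report for {location.name}\n"
        f"  Temperature: {weather.temperature_c:.1f} C\n"
        f"  Conditions:  {weather.description}\n"
        f"  Wind speed:  {weather.wind_speed_kmh:.1f} km/h\n"
    )


def emit_report(text: str, output: Optional[str] = None) -> None:
    """Print the report to stdout and, if a filename was given, write identical text there."""
    sys.stdout.write(text)
    if output is not None:
        Path(output).write_text(text, encoding="utf-8")
```

Because `format_report` returns a string and does no I/O, it can be tested with fixed input data, which is exactly what Phase 3 asks for.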

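API to use: Open-Meteo (https://open-meteo.com). It is free, requires no API key, and has a simple JSON API. The two endpoints are:

  • Geocoding: https://geocoding-api.open-meteo.com/v1/search?name=<city>&count=1
  • Weather: https://api.open-meteo.com/v1/forecast?latitude=<lat>&longitude=<lon>&current=temperature_2m,wind_speed_10m,weather_code

Call the geocoding endpoint first to turn the city name into coordinates, then pass those coordinates to the weather endpoint. By default, the weather endpoint reports temperature in Celsius and wind speed in km/h. It returns weather_code as a numeric WMO code rather than text, so you have to look up the weather description in requirement 2 from that code.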
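The two calls chain together: geocode, then forecast. The sketch below uses only the standard library (`urllib`, `json`) so that it runs without installing anything. In the lesson's implementation phase you would use `requests`, where `requests.get(url, params=..., timeout=...)` plays the role of `get_json`. The helper names and `CityNotFoundError` are hypothetical, and the WMO table covers only a subset of codes, with a fallback for the rest.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, Tuple

GEOCODING_URL = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"

# A subset of the WMO weather interpretation codes that Open-Meteo returns.
WMO_DESCRIPTIONS = {
    0: "Clear sky", 1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast",
    45: "Fog", 48: "Depositing rime fog",
    51: "Light drizzle", 53: "Moderate drizzle", 55: "Dense drizzle",
    61: "Slight rain", 63: "Moderate rain", 65: "Heavy rain",
    71: "Slight snow fall", 73: "Moderate snow fall", 75: "Heavy snow fall",
    80: "Slight rain showers", 81: "Moderate rain showers", 82: "Violent rain showers",
    95: "Thunderstorm",
}


@dataclass(frozen=True)
class Location:
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class CurrentWeather:
    temperature_c: float
    description: str
    wind_speed_kmh: float


class CityNotFoundError(Exception):
    """Hypothetical: raised when geocoding finds no match."""


def get_json(url: str, params: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """GET url with URL-encoded query params (safe for "New York") and decode the JSON body."""
    with urllib.request.urlopen(f"{url}?{urllib.parse.urlencode(params)}", timeout=timeout) as resp:
        return json.load(resp)


def parse_location(data: Dict[str, Any]) -> Location:
    """Take the first geocoding match; a missing or empty "results" list means no match."""
    results = data.get("results") or []
    if not results:
        raise CityNotFoundError("city not found")
    first = results[0]
    return Location(first["name"], first["latitude"], first["longitude"])


def parse_current(data: Dict[str, Any]) -> CurrentWeather:
    """Pull the three current-conditions fields and translate the WMO code to text."""
    current = data["current"]
    code = current["weather_code"]
    description = WMO_DESCRIPTIONS.get(code, f"Unknown conditions (WMO code {code})")
    return CurrentWeather(current["temperature_2m"], description, current["wind_speed_10m"])


def lookup(city: str) -> Tuple[Location, CurrentWeather]:
    """Geocode the city, then fetch its current weather (two real HTTP calls)."""
    location = parse_location(get_json(GEOCODING_URL, {"name": city, "count": 1}))
    forecast = get_json(FORECAST_URL, {
        "latitude": location.latitude,
        "longitude": location.longitude,
        "current": "temperature_2m,wind_speed_10m,weather_code",
    })
    return location, parse_current(forecast)
```

Keeping parsing separate from the network call means the parsers can be tested with canned dictionaries. Also, the first geocoding match is not always the place the user meant (several towns share a name), so the report should show the resolved name.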


Evaluation criteria

Before you begin, read the evaluation criteria so you know what you are building toward. When you are finished, check each item.

Process criteria — how you worked:

  • You wrote a CLAUDE.md before starting any agent interactions
  • You ran at least three separate agent interactions (design, implement, test — not three turns in one session)
  • You reviewed every diff before continuing
  • The tool has at least one automated test
  • You documented every correction you had to make to agent output

Product criteria — what you built:

  • python weather.py London prints a formatted report
  • python weather.py London --output report.txt writes the report to the file
  • python weather.py (no arguments) prints a usage message and exits non-zero
  • python weather.py "not a real city 12345" handles the "city not found" case gracefully
  • pytest tests/ -v passes

Reflection criteria — what you learned:

  • You wrote a one-to-two paragraph reflection answering the three questions below

Step 1: Write your CLAUDE.md

This is not optional. Do it before opening an agent session.

Your CLAUDE.md should include:

  • What the project does
  • The language and Python version
  • How to run the tool and run the tests
  • Conventions you want followed (type hints, docstrings, error messages to stderr, output to stdout)
  • Libraries the agent is allowed to use (standard library plus requests or httpx for HTTP; pytest for tests)
  • What the agent should not do (add dependencies not listed above; modify the tests directory without asking)

Step 2: Phase 1 — Design the interface

Start a new agent session. Give the agent this instruction:

Design the public interface for a weather CLI tool. I need:

  1. The function signatures (with type annotations) for: fetching the location data, fetching the weather data, and formatting the report.
  2. The argument parser definition (what argparse arguments the tool accepts).

Output function signatures and argument definitions only — no implementation, no tests. I will review this design before we proceed.

Review the output. Ask yourself:

  • Are the function boundaries right, or would you structure them differently?
  • Do the type annotations make sense?
  • Does the argument parser cover all the required behaviours?

Edit the design if needed. This is the architectural checkpoint — fix problems here, not in the implementation.
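For calibration, here is one shape a Phase 1 design could take. Treat it as a sketch, not the expected answer. The function names, the `Location` and `CurrentWeather` dataclasses, and the choice to raise an exception for a missing city are all assumptions you might reasonably reject during review.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class CurrentWeather:
    temperature_c: float
    description: str
    wind_speed_kmh: float


def fetch_location(city: str) -> Location:
    """Geocode a city name; raise an exception (not return None) when nothing matches."""
    raise NotImplementedError("Phase 2")


def fetch_weather(location: Location) -> CurrentWeather:
    """Fetch current temperature, conditions, and wind speed for a location."""
    raise NotImplementedError("Phase 2")


def format_report(location: Location, weather: CurrentWeather) -> str:
    """Return the report text; printing and file writing stay in the CLI layer."""
    raise NotImplementedError("Phase 2")


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface: one required city, one optional output file."""
    parser = argparse.ArgumentParser(
        prog="weather.py",
        description="Show the current weather for a city.",
    )
    parser.add_argument("city", help='city name, e.g. London or "New York"')
    parser.add_argument("--output", metavar="FILENAME",
                        help="also write the report to FILENAME")
    return parser
```

Making `city` a required positional argument means argparse itself handles `python weather.py` with no arguments: it prints a usage message to stderr and exits with status 2.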


Step 3: Phase 2 — Implement

Start a new agent session, as the process criteria require. Give the agent:

Here is the interface design we agreed on: [paste the design from Phase 1].

Implement the tool in weather.py. Follow the function signatures exactly. Use requests for HTTP calls. All error messages go to stderr; output goes to stdout. Do not write tests yet.

Review the diff carefully using the red-flag checklist:

  • Is error handling present for network failures and "city not found" cases?
  • Are any values hardcoded that should come from the function arguments?
  • Does the implementation match the interface from Phase 1?
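For the first red-flag item, a useful yardstick is a single `main` function that catches the expected failures, writes a message to stderr, and returns a non-zero exit code. This is a sketch under stated assumptions. `CityNotFoundError` and the `build_report` parameter are hypothetical; the parameter exists only so the sketch runs without the network. Because the sketch uses the standard library, it catches `OSError`, which covers `urllib.error.URLError`. If the agent used requests as instructed, look for `requests.RequestException` being caught instead.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, Sequence


class CityNotFoundError(Exception):
    """Hypothetical: raised by the lookup code when geocoding finds no match."""


def main(argv: Sequence[str], build_report: Callable[[str], str]) -> int:
    """Parse arguments, build the report, and return a process exit code.

    build_report is a hypothetical injection point (city name -> report text)
    so the error paths can be exercised without the network.
    """
    parser = argparse.ArgumentParser(prog="weather.py", description="Show the current weather for a city.")
    parser.add_argument("city", help="city name, e.g. London")
    parser.add_argument("--output", metavar="FILENAME", help="also write the report to FILENAME")
    # On missing or invalid arguments, argparse prints usage to stderr and exits with status 2.
    args = parser.parse_args(argv)

    try:
        report = build_report(args.city)
    except CityNotFoundError:
        print(f"Error: no city found matching {args.city!r}", file=sys.stderr)
        return 1
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        print(f"Error: could not reach the weather service: {exc}", file=sys.stderr)
        return 1

    sys.stdout.write(report)
    if args.output:
        try:
            Path(args.output).write_text(report, encoding="utf-8")
        except OSError as exc:
            print(f"Error: could not write {args.output}: {exc}", file=sys.stderr)
            return 1
    return 0
```

In weather.py, the entry point would then be `sys.exit(main(sys.argv[1:], build_report))` under `if __name__ == "__main__":`. In that call, `build_report` is a real function that makes the two API calls and formats the result.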

Step 4: Phase 3 — Test

Start a new agent session. Give the agent:

Here is the implementation in weather.py: [paste or reference weather.py].

Write tests in tests/test_weather.py. The tests should:

  1. Test the location and weather fetching functions with mocked HTTP responses so they do not make real API calls (use unittest.mock.patch).
  2. Test the formatting function with fixed input data.
  3. Test that a missing city raises an appropriate error or returns an empty result.

Do not modify weather.py.

Review the tests. Do they actually test the right things? Are the mocks realistic? Does a test passing tell you something meaningful?
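As a yardstick for that review, here is what a meaningful mocked test can look like. To keep the example self-contained, the block defines a small hypothetical `fetch_location` that routes HTTP through a hypothetical `http_get_json` helper. The tests swap that helper out with `unittest.mock.patch.dict` on this module's namespace. In your project you would instead patch the name where weather.py looks it up, for example `mock.patch("weather.http_get_json")`. The canned payloads mimic the shape of an Open-Meteo geocoding response but include only the fields the code reads.

```python
import unittest
from dataclasses import dataclass
from typing import Any, Dict
from unittest import mock


@dataclass(frozen=True)
class Location:
    name: str
    latitude: float
    longitude: float


class CityNotFoundError(Exception):
    """Hypothetical error for an unmatched city name."""


def http_get_json(url: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical HTTP helper; a unit test must never reach this."""
    raise RuntimeError("real network call attempted in a unit test")


def fetch_location(city: str) -> Location:
    data = http_get_json("https://geocoding-api.open-meteo.com/v1/search",
                         {"name": city, "count": 1})
    results = data.get("results") or []
    if not results:
        raise CityNotFoundError(city)
    first = results[0]
    return Location(first["name"], first["latitude"], first["longitude"])


class FetchLocationTests(unittest.TestCase):
    def test_returns_first_match(self) -> None:
        # Canned payload trimmed to the fields fetch_location reads.
        payload = {"results": [{"name": "London", "latitude": 51.5, "longitude": -0.13}]}
        fake = mock.Mock(return_value=payload)
        # Swap the helper in this module's namespace for the duration of the block.
        with mock.patch.dict(globals(), {"http_get_json": fake}):
            self.assertEqual(fetch_location("London"), Location("London", 51.5, -0.13))
        # Assert on the request too, so a pass proves what was actually asked for.
        fake.assert_called_once()
        self.assertEqual(fake.call_args.args[1], {"name": "London", "count": 1})

    def test_missing_city_raises(self) -> None:
        fake = mock.Mock(return_value={})  # no "results" key: nothing matched
        with mock.patch.dict(globals(), {"http_get_json": fake}):
            with self.assertRaises(CityNotFoundError):
                fetch_location("zxqwerty123456")
```

pytest collects `unittest.TestCase` classes directly, so tests written this way run under `pytest tests/ -v`. If `import weather` fails from inside tests/, either run `python -m pytest tests/ -v`, which adds the current directory to sys.path, or set `pythonpath = .` under `[pytest]` in a pytest.ini (pytest 7 or later).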

Run them:

pytest tests/ -v

If tests fail, apply the correction pattern from the lab: read the failure yourself first, then give the agent a precise correction instruction.


Step 5: Manual verification

Run the tool yourself for each case in the product criteria:

python weather.py London
python weather.py Paris --output paris.txt
python weather.py
python weather.py "zxqwerty123456"

Does each case behave correctly? Does the output look right? Does the error handling work?


Step 6: Reflection

Write one to two paragraphs answering these three questions:

  1. Where did the agent add value? What did it do well — quickly, accurately, in a way that saved you meaningful time?

  2. Where did you have to correct it? Be specific: what was wrong, and what correction did you provide? Include every correction, even small ones.

  3. What is the most important thing you would do differently on the next agentic project? This could be about the CLAUDE.md, the task scoping, the review process, or the phasing.

Write this in a file in the project (REFLECTION.md or a comment block in the code). The act of writing it is part of the exercise.


A note on "the agent does it for you"

The goal of this challenge is not to produce a working weather tool. You could copy a working implementation in ten minutes. The goal is to run the full agentic workflow yourself — with discipline — and come out of it having genuinely used every technique in this advanced tier.

An agent that generates the entire tool in one shot without a CLAUDE.md, without phasing, without diff review, and without reflection has not taught you anything you could apply on a project that matters. The constraints exist because the constraints are the lesson.


Where to go next

You have completed the Advanced tier of Using AI. You now have a concrete, practiced skill: not just using AI as a chat tool, but running an agentic development loop — from task scoping through direction, review, automation, and security. That loop is the durable skill. The specific tools will change; the loop will not.
