Improve an Agent

Next, we'll improve our agents using Claude Code. The biggest advantage of a unified agent platform is that a coding agent can read the logs and use them to improve our agents iteratively. The codebase ships with two prompts:

docs/improve-agent.md (autonomous). Claude derives probes from the agent’s instructions, judges the responses, and edits the agent until the probes pass.
docs/extend-agent.md (user-driven). You ask for a change: add a tool, refine a prompt, or fix a bug.

Both prompts edit agents/<slug>.py directly. Hot-reload picks up each change in about 2 seconds, so the test → judge → edit cycle stays tight.

Improve: autonomous probe-and-judge

Open Claude Code in your agent-platform directory and paste:

Run docs/improve-agent.md

Claude reads the target agent’s INSTRUCTIONS and derives 8–12 probes across four categories: golden path, edge cases, tool selection, and adversarial. For each probe, it sends a request to the live container with cURL, reads the resulting tool calls from the logs, and judges the response PASS or FAIL against what the instructions promise. For each failure, Claude picks a lever and edits the agent: tighten a rule, add a rule, swap a tool, or bump num_history_runs. It then re-runs only the failed probes, stopping after at most five iterations.
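The control flow of that loop can be sketched in Python. This is a minimal sketch, not the code behind docs/improve-agent.md: `Probe`, `improve_loop`, `run_and_judge`, and `apply_fix` are hypothetical names, and in practice Claude does the judging and editing itself. It also assumes the five-iteration cap counts fix rounds after the initial run over all probes.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical probe record; categories mirror the four listed above.
@dataclass
class Probe:
    name: str
    category: str  # "golden_path", "edge_case", "tool_selection", or "adversarial"
    message: str   # what would be sent to the live container


MAX_ITERATIONS = 5  # assumed meaning of "caps at five iterations": five fix rounds


def improve_loop(
    probes: list[Probe],
    run_and_judge: Callable[[Probe], bool],    # hypothetical: query agent, True on PASS
    apply_fix: Callable[[list[Probe]], None],  # hypothetical: edit agents/<slug>.py
    max_iterations: int = MAX_ITERATIONS,
) -> list[Probe]:
    """Run every probe once, then re-run only the failures after each fix.

    Returns the probes that are still failing when the loop stops.
    """
    failing = [p for p in probes if not run_and_judge(p)]
    iterations = 0
    while failing and iterations < max_iterations:
        apply_fix(failing)
        # After hot-reload, only the previously failed probes are re-run.
        failing = [p for p in failing if not run_and_judge(p)]
        iterations += 1
    return failing


if __name__ == "__main__":
    fixed: set[str] = set()
    probes = [
        Probe("greets", "golden_path", "Hi"),
        Probe("empty_input", "edge_case", ""),
    ]

    # Toy judge: the edge-case probe fails until a fix has been applied to it.
    def judge(p: Probe) -> bool:
        return p.category != "edge_case" or p.name in fixed

    def fix(failed: list[Probe]) -> None:
        fixed.update(p.name for p in failed)

    remaining = improve_loop(probes, judge, fix)
    print([p.name for p in remaining])  # prints []
```

Re-running only the failures keeps each iteration cheap, but it also means a fix that breaks a previously passing probe goes unnoticed until the next full run, which is one reason to run improve again after an extend session.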

Extend: user-driven changes

When you have a specific change in mind, paste:

Run docs/extend-agent.md

Claude asks what to change, and you describe it: a tool to add, a prompt to refine, or a bug to fix. The agno-docs MCP server is loaded, so Claude's research into toolkits is grounded in the actual API rather than guesswork. Each iteration makes one small, verified change.
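As an illustration of the kind of small change an extend session produces: agno can use a plain Python function as a tool, deriving its description from the type hints and docstring. The function below, `days_between`, is a hypothetical example and not part of the codebase; the registration line in the comment assumes an agent defined with agno's `Agent` class in agents/<slug>.py.

```python
from datetime import date


def days_between(start: str, end: str) -> int:
    """Return the number of days from start to end (negative if end is earlier).

    Args:
        start: Start date in ISO format (YYYY-MM-DD).
        end: End date in ISO format (YYYY-MM-DD).
    """
    # date.fromisoformat raises ValueError on malformed input,
    # which surfaces to the agent as a failed tool call.
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# Hypothetical registration in agents/<slug>.py (not executed here):
# agent = Agent(..., tools=[days_between])
```

Keeping the tool this small makes it easy to verify with a single probe before moving on to the next change.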

When to run each

You just created an agent and want to harden it before deploying. Use improve.
Users report the agent is missing the point. Use improve.
You want to add a new tool or knowledge base. Use extend.
You hit a specific bug. Use extend.
You just extended an agent and want to make sure nothing regressed. Use improve again.
