What are AI loops? Let AI check and refine its own work

One prompt at a time, versus a loop

The usual way to use AI is a back-and-forth: you ask, it answers, you spot what is wrong, you ask again. You are steering every single step. A loop flips that around. You give the goal once, and the model runs the whole cycle itself: it plans an approach, does the work, checks the result against the goal, and tries again if it is not there yet. You stop managing each step and start managing the outcome.

“You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.” (Peter Steinberger, on X)

What a loop actually is

A prompt is a single instruction. A loop is a goal the model keeps working toward until it is complete. Instead of giving one answer and waiting for your next command, it runs the full cycle on its own:

Discover: work out what actually needs doing.
Plan: decide how to do it.
Execute: do the work.
Verify: check the result against the goal.
Iterate: not there yet? Feed the result back in and go again.

The cycle has five steps, but three pieces hold the whole thing together, and they are exactly where people go wrong.

The three pieces that make it work

Verify is the core. Without a real check on its output, a model just produces something and approves its own homework. Progress only happens when there is a true gate (a test, a measurable metric, or a strict rubric) that the work has to pass. No gate means no loop, just expensive repetition.

State is what stops it going in circles. The loop has to remember what it already tried, so each round builds on the last instead of restarting from zero. That memory is what turns repetition into progress. (It is the same idea as agent memory.)

A stop condition keeps it under control. Every loop needs a clear definition of success and a hard limit, or it will happily run, and spend, forever. So the real difference is this: a prompt gives an instruction; a loop gives a job, a way to judge success, a memory of progress, and a rule for when to stop.
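Those three pieces can be shown in a few lines. This is a minimal sketch, not a reference implementation: `run_loop`, `LoopState`, and the toy worker and gate are hypothetical names, and in a real loop `work` would be a model call rather than a string trimmer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    """State: every rejected attempt and the reason the gate gave."""
    attempts: list = field(default_factory=list)


def run_loop(
    goal: str,
    work: Callable[[str, LoopState], str],      # hypothetical stand-in for a model call
    verify: Callable[[str], Optional[str]],     # gate: None means pass, a string is the reason it failed
    max_rounds: int = 5,                        # stop condition: hard limit on rounds
):
    state = LoopState()
    for _ in range(max_rounds):
        candidate = work(goal, state)           # the worker can read earlier failures from state
        reason = verify(candidate)
        if reason is None:
            return candidate, state             # success: the gate passed
        state.attempts.append((candidate, reason))
    return None, state                          # gave up at the hard limit


# Toy worker: starts with a ten-word draft, then drops the last word of the previous attempt.
def toy_work(goal: str, state: LoopState) -> str:
    if not state.attempts:
        return "loops need a gate a memory and a stop rule"
    previous, _reason = state.attempts[-1]
    return " ".join(previous.split()[:-1])


# Toy gate: an objective check with a reason attached to every rejection.
def at_most_eight_words(text: str) -> Optional[str]:
    count = len(text.split())
    return None if count <= 8 else f"{count} words, limit is 8"


result, state = run_loop("summarize in eight words or fewer", toy_work, at_most_eight_words)
print(result)
print(len(state.attempts), "rejected attempts")
```

Note that the worker only makes progress because it reads `state`; with the same gate and no memory it would return the same ten-word draft every round until the limit stopped it.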

Do you even need one?

Most write-ups explain how loops work and skip the more useful question: when are they not worth it? A loop only earns its complexity when all four of these are true:

The task repeats, so the setup cost pays off over time.
There is an automatic way to reject bad output, a test, a rule, or a validator.
The agent can run end to end without you stepping in constantly.
Success is objective, so “done” can be decided without a judgment call.

Miss any one and a loop becomes over-engineering. For a one-off task, or anything where “good” is a matter of taste, a single well-structured prompt is the better tool. Loops are powerful, but only when the problem actually fits them.
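The four-way test is simple enough to write down as a checklist. A small sketch, with `TaskProfile` and its field names invented here for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    repeats: bool               # the task recurs, so setup cost pays off
    has_automatic_gate: bool    # a test, rule, or validator can reject bad output
    runs_unattended: bool       # the agent can run end to end without constant help
    success_is_objective: bool  # "done" needs no judgment call


def missing_conditions(task: TaskProfile) -> list:
    """Return the names of the conditions this task fails."""
    checks = {
        "repeats": task.repeats,
        "automatic gate": task.has_automatic_gate,
        "runs unattended": task.runs_unattended,
        "objective success": task.success_is_objective,
    }
    return [name for name, ok in checks.items() if not ok]


def should_use_loop(task: TaskProfile) -> bool:
    # All four must hold; missing any one means a single prompt is the better tool.
    return not missing_conditions(task)


nightly_dependency_bump = TaskProfile(True, True, True, True)
brand_tagline = TaskProfile(False, False, True, False)

print(should_use_loop(nightly_dependency_bump))
print(missing_conditions(brand_tagline))
```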

The version built for code

Loops caught on in software first, because code is easy to verify: a test either passes or it does not, so there is no argument about success. A coding loop is just a goal plus a strict way to check it:

GOAL      every test in tests/auth passes, lint is clean, no type errors
EACH ROUND
  1. run the test suite and read every failure
  2. pick the single highest-impact failure
  3. write the smallest change that fixes it
  4. re-run the tests, lint, and the type checker
VERIFY    green tests, zero lint warnings, zero type errors
STOP WHEN verify passes, or 8 rounds reached
ON STOP   summarize what changed and what still fails

That is the whole shape: a clear goal, a tight per-round routine, an unambiguous verify gate, and a hard stop so a stuck loop gives up instead of burning money. It is the working pattern behind AI coding agents.
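For readers who want to see that shape as a driver script, here is one possible sketch. It assumes each check is a command that exits non-zero on failure, and `apply_fix` is a hypothetical hook where an agent would make the smallest change for one failure; neither comes from a particular tool.

```python
import subprocess
from typing import Callable, Dict, Sequence


def failing_checks(checks: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Run every check command; return name -> captured output for those that fail."""
    failures = {}
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        if proc.returncode != 0:
            failures[name] = proc.stdout + proc.stderr
    return failures


def coding_loop(
    checks: Dict[str, Sequence[str]],
    apply_fix: Callable[[Dict[str, str]], None],  # hypothetical agent hook, receives the failures
    max_rounds: int = 8,                          # STOP WHEN ... or 8 rounds reached
) -> dict:
    rounds = 0
    failures = failing_checks(checks)             # VERIFY before doing any work
    while failures and rounds < max_rounds:
        apply_fix(failures)                       # steps 2 and 3: pick a failure, change the code
        rounds += 1
        failures = failing_checks(checks)         # step 4: re-run every check
    # ON STOP: report the outcome and what still fails
    return {
        "passed": not failures,
        "rounds": rounds,
        "still_failing": sorted(failures),
    }
```

In a project that happens to use pytest, ruff, and mypy, the checks might be `{"tests": ["pytest", "tests/auth"], "lint": ["ruff", "check", "."], "types": ["mypy", "."]}`; that tool choice is an assumption, and any commands with meaningful exit codes work the same way.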

Hands-on: build your own self-checking loop

Fill in these three fields and combine them into a loop prompt you can paste into any chat model. It keeps working and grading itself until it meets your bar, no code required.

The task: what you want produced.
Success criteria: one per line, be strict.
Hard limit: a maximum number of rounds, so it cannot run forever.
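If you would rather script the assembly than do it by hand, a sketch like the following works. `build_loop_prompt` and the exact wording of the template are illustrative choices modeled on the coding loop above, not a fixed format. Keep in mind that a chat model grading itself is a softer gate than a test, so the stricter and more checkable the criteria, the better.

```python
def build_loop_prompt(task: str, criteria: list, max_rounds: int) -> str:
    """Assemble a self-checking loop prompt from a task, success criteria, and a round limit."""
    if not criteria:
        raise ValueError("at least one success criterion is required")
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    numbered = "\n".join(f"  {i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"TASK\n  {task}\n"
        f"SUCCESS CRITERIA\n{numbered}\n"
        "EACH ROUND\n"
        "  1. produce or revise the work\n"
        "  2. grade it against every criterion: pass or fail, with a reason\n"
        "  3. if anything fails, fix only that and go again\n"
        f"STOP WHEN every criterion passes, or {max_rounds} rounds reached\n"
        "ON STOP   show the final work and list any criterion that still fails\n"
    )


print(build_loop_prompt(
    task="Write a product description for a steel water bottle",
    criteria=["Under 80 words", "Mentions the price", "No exclamation marks"],
    max_rounds=4,
))
```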

What a real loop is built from

Under the hood, a production loop combines five parts, and modern agent tools (Claude Code, Codex, and the like) bundle all of them:

Automation: it runs on a schedule or trigger; you define the goal once and it keeps going without manual restarts.
Skill: reusable instructions saved as a stable set of rules, so every run behaves consistently instead of leaning on you to re-prompt.
Sub-agents: one agent does the work, another checks it; that separation is what stops a model rubber-stamping itself.
Connectors: integrations (tool calling, MCP servers) that let the loop take real action, not just suggest it.
Verifier: the strict gate (a test, a build, a rubric) that decides whether the output is acceptable. This is the part that makes progress real.

The cost nobody mentions

Loops run on tokens, and tokens are money. The catch is not any single step but how the cost compounds: every round re-sends the goal, the work so far, and all the previous failures, so the context grows and each pass costs more than the last. Add a second model to verify and you roughly double it, because both read the same expanding context.
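A little arithmetic makes the compounding visible. The sketch below uses a deliberately simple model: each round re-reads a fixed starting context plus a fixed amount added by every earlier round. The figures are made up for illustration, and real growth depends on the model and the task.

```python
def cumulative_input_tokens(
    base: int,               # tokens in the goal and starting context
    added_per_round: int,    # tokens each round adds (work so far, failure logs)
    rounds: int,
    verifier: bool = False,  # a second model reading the same context
) -> int:
    # Round i (counting from 0) re-reads the base plus everything added by earlier rounds.
    total = sum(base + i * added_per_round for i in range(rounds))
    return total * 2 if verifier else total


flat_guess = 10_000 * 8                                         # if context never grew
single = cumulative_input_tokens(10_000, 5_000, 8)
with_checker = cumulative_input_tokens(10_000, 5_000, 8, verifier=True)

print(flat_guess)     # 80000
print(single)         # 220000
print(with_checker)   # 440000
```

With these numbers, eight rounds cost nearly three times the flat estimate, and the growth is quadratic in the number of rounds, which is why a hard cap matters more than it first appears.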

Treat the numbers here as orders of magnitude, not promises, because they vary widely by model and task. A single agent on one medium job might run anywhere from tens of thousands to a couple hundred thousand tokens, growing each pass, and a fleet running in parallel multiplies all of it. The metric that actually matters is cost per accepted change, not tokens or rounds. When more than roughly half the model’s output gets rejected, the loop is mostly shifting work into review instead of saving you time. Loops can also fail silently, running and spending on incomplete output, unless a strict gate can stop bad results. That is why real loop systems need tight budgets, hard caps, and monitoring, and why, for most people most of the time, a simpler setup wins.
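Both measures are easy to track. A minimal sketch, with invented figures and hypothetical function names:

```python
def cost_per_accepted_change(total_cost: float, accepted: int) -> float:
    """Total spend divided by the changes that actually got accepted."""
    if accepted == 0:
        return float("inf")   # money spent, nothing accepted
    return total_cost / accepted


def rejection_rate(proposed: int, accepted: int) -> float:
    """Share of proposed changes that were rejected."""
    if proposed == 0:
        raise ValueError("no changes were proposed")
    return 1 - accepted / proposed


# Illustrative week: 12.00 spent, 40 changes proposed, 15 accepted.
print(cost_per_accepted_change(12.0, 15))   # 0.8
print(rejection_rate(40, 15))               # 0.625
```

Here the rejection rate is above one half, so by the rule of thumb above this loop is mostly generating review work, even though its cost per accepted change looks small.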

The order that actually works

If you do build one, the sequence matters more than the tools. Systems that hold up in production are almost always built in this order:

1. Get one manual run reliable first.
2. Turn that into a skill (save the instructions).
3. Wrap the skill in a loop (add the verify gate and the stop condition).
4. Then put it on a schedule.

Skip to step four and you have automated something that was never reliable to begin with. Earn each step before the next.