
AI-Native XP: A New Workflow Emerges

Kent Beck published Extreme Programming Explained in 1999. The core idea was simple: if short feedback loops are good, make them as short as possible. If testing is good, do it first. If integration is good, do it continuously. Take every practice you already believe in and turn up the intensity.

XP gave teams small stories, test-driven development (write the failing test before writing the code that makes it pass), pair programming, trunk-based development, and merciless refactoring. It was a direct reaction to the bloated, plan-heavy methodologies of the era. It worked well for teams that committed to the discipline.

Most developers under 35 have never worked in an explicitly XP shop. But right now, many of them are rediscovering its practices without the name, under pressure from AI tooling.


The workflow that's actually producing results with AI, assembled from what engineers across teams have figured out:

  • Break work into small, task-specific prompts. One problem at a time. Reconstruct context for the next task rather than carrying a sprawling conversation forward.
  • Prompt with tests and usage examples. Show the model what "done" looks like before asking it to build.
  • Run, observe, adjust. Keep the loop tight. Don't queue up ten changes and review them together.
  • Commit every acceptable outcome. Hard reset when it goes sideways. Don't try to negotiate a broken session back to working state.
  • Merge to trunk. PR-based processes get overwhelmed by the volume AI can generate. A lighter integration cycle keeps things moving.
  • Trust deterministic sources of truth: the actual code, test output, linter results. Not what the model says the code does. What it actually does.
  • Keep separation of concerns clean. Smaller blast radius per change means less to review and less to undo.

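The middle items on that list (run-observe-adjust, commit on green, hard reset on failure) can be sketched as a loop. This is a minimal illustration of the control flow, not anyone's actual tooling; every callable name here (generate, apply_patch, run_tests, commit, reset) is hypothetical:

```python
def tight_loop(task, generate, apply_patch, run_tests, commit, reset, max_attempts=3):
    """One small task per loop: run, observe, then commit or hard reset."""
    for _ in range(max_attempts):
        apply_patch(generate(task))   # one task-specific prompt, one change in flight
        if run_tests():               # deterministic source of truth, not the model's claims
            commit(task)              # commit every acceptable outcome
            return True
        reset()                       # hard reset; don't negotiate a broken state
    return False

# Fakes standing in for a real model and repo, to show the shape of the loop.
log = []
done = tight_loop(
    "add slugify helper",
    generate=lambda task: "<patch>",
    apply_patch=lambda patch: log.append("apply"),
    run_tests=lambda: len(log) >= 3,          # fails the first attempt, passes the second
    commit=lambda task: log.append("commit"),
    reset=lambda: log.append("reset"),
)
```

The point of the sketch is the shape, not the stubs: at most one change in flight, a pass/fail check after each, and no attempt to repair a failed state in place.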
Every item on that list maps directly to an XP practice. This isn't coincidence; it's convergent evolution. Engineers are arriving at the same discipline because it solves the same problem: how to stay in control when output is fast and mistakes are cheap to make but expensive to accumulate.


The parallel that matters most is TDD. Prompting with tests isn't just a quality practice. It's the clearest way to specify intent. A test is unambiguous: it either passes or it doesn't. When you give a model a failing test and ask it to make it pass, you've given it a compiler target in the most literal sense.
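What "compiler target" means in practice can be shown with a toy example. The slugify function below is hypothetical; the test comes first and defines done, and the implementation underneath is one answer a model might produce to make it pass:

```python
import re

# The spec, written first and handed to the model. Pass/fail is unambiguous.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  AI  Native  XP ") == "ai-native-xp"

# One implementation that satisfies the spec.
def slugify(title: str) -> str:
    # Lowercase, keep alphanumeric runs, join with hyphens.
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))
```

Any implementation that makes the assertions pass is acceptable; the test, not a prose description, is the contract.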

The one XP practice that doesn't translate directly is pair programming. What replaces it is something different: a developer maintaining a bird's-eye view of the system while the model handles implementation detail. The model can't hold the big picture reliably. It doesn't know how a change three files away affects what it's writing now. That responsibility stays with the person at the keyboard.


Naming this matters. Teams that stumble into these practices organically benefit from them but can't teach them, defend them under pressure, or improve them deliberately. "We work in small chunks and test everything" is a habit. "This is Extreme Programming, and here's why it works" is a framework someone can act on.

Beck figured out the forcing function in 1999. Then it was software complexity. Now it's AI velocity. The answer looks the same.


Part 6 of 14 — What I Think About AI Engineering
