How Slop Happens
Casey Newton coined the term "AI slop" in 2024 to describe the flood of AI-generated content spreading across the internet: abundant, superficially competent, and utterly devoid of intent. The term stuck because it named something people were already feeling.
In software, slop takes a different form. It isn't obviously bad code. Obviously bad code gets caught in review. Slop is the code that passes review because each individual change is defensible. It's the tenth reasonable decision in a row that produces a system nobody can explain.
This is what makes AI-assisted slop different from the slop that existed before. The volume is higher, the pace is faster, and the individual outputs look clean. The degradation is architectural, not syntactic. It happens at the level of the whole, not the part.
The most common vector is acceptance without understanding. A developer asks the model for a solution, gets something plausible, and moves on. Done once, this is fine. Done repeatedly across a codebase, it produces architecture nobody designed. Patterns that contradict each other. Abstractions that made sense in one context, then got replicated in five others where they don't. Six months later, someone asks why a service works the way it does and nobody knows. Not because the code is undocumented, but because nobody ever actually decided it should work that way.
ThoughtBot framed this precisely: the tooling starts to control the narrative if you let it. You end up unable to explain your own architecture decisions because you never really made them. You just clicked accept.
Research presented at CHI 2025 found a significant negative correlation between AI tool usage and critical thinking among knowledge workers, a pattern the researchers described as cognitive offloading. The ease of generation diminishes the depth of evaluation. That's the mechanism. Clicking accept isn't laziness. It's a habit the tooling actively encourages.
A second failure mode is subtler: side quest overproduction. Lower friction to prototype means it's easy to spin up something new whenever the main problem feels hard. A developer hits a blocker, pivots to a tangential idea, and builds a working proof of concept in two hours. Multiply that across a team and you get a lot of impressive demos and a main mission that's drifting.
The constraint that used to slow this down was implementation cost. Building something took enough time that you had to decide it was worth building. When that cost drops close to zero, the decision discipline has to come from somewhere else. It doesn't appear automatically.
There's a subtler version of this problem that plays out at industry scale. A 2024 study in Science Advances found that while generative AI increased individual creative output, it significantly reduced collective diversity. People using AI produced work that was more polished but more convergent, gravitating toward the same patterns and solutions. The same dynamic shows up in codebases. AI-generated code tends toward whatever is most common in training data. Every team independently reaching for the same tool ends up with architectures that look alike, make the same tradeoffs, and carry the same blind spots.
Measurement incentives are worth naming because organizations are already falling into this trap. AI usage rates, commit volume, features shipped per sprint: any metric that captures output rather than outcome creates pressure to optimize for the metric. Goodhart's Law applies: when a measure becomes a target, it ceases to be a good measure. Engineers who produce more get rewarded. Whether what they produced holds together is harder to measure, so it gets measured less.
The teams getting this right are measuring what they always should have: system reliability, time to resolve incidents, test coverage, how long it takes a new engineer to get productive. AI doesn't change those metrics. It just makes them matter more.
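The output-versus-outcome distinction is concrete once you track the right events. As a minimal sketch, here is one of those outcome metrics, mean time to resolve incidents, computed from open/resolve timestamps. The incident data and function name are hypothetical, purely for illustration; a real team would pull these from their incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (opened, resolved) timestamp pairs.
incidents = [
    (datetime(2025, 3, 1, 9, 0), datetime(2025, 3, 1, 10, 30)),   # 90 min
    (datetime(2025, 3, 4, 14, 0), datetime(2025, 3, 4, 14, 45)),  # 45 min
    (datetime(2025, 3, 9, 22, 0), datetime(2025, 3, 10, 1, 0)),   # 180 min
]

def mean_time_to_resolve(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Outcome metric: average duration from incident open to resolution."""
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta(0)) / len(durations)

print(mean_time_to_resolve(incidents))  # 1:45:00 for the sample data above
```

Nothing about this changes with AI in the loop, which is the point: a commit counter goes up when a model generates more code, but this number only moves when the system actually gets easier to operate.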
The through-line is that slop is a process failure, not a tool failure. The model does what you let it do. Review catches what the reviewer knows to look for. Architecture stays coherent when someone maintains a view of the whole. None of that happens automatically. It requires the same discipline AI was supposed to free us from, which is why teams that skip it find they've traded one kind of technical debt for a faster-accumulating version of the same thing.
Part 8 of 14 — What I Think About AI Engineering