Video
Source: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI by No Priors (hosted by Sarah Guo)
Executive Summary
In this March 2026 episode of the No Priors podcast, Andrej Karpathy — OpenAI co-founder, former Tesla AI lead, and one of the field's most influential practitioners — lays out a sweeping argument: we have entered a fundamentally new phase in software development and AI research that he calls the "Loopy Era." Unlike the gradual productivity boosts implied by terms like "AI-assisted coding," Karpathy frames this as a phase transition — the kind that doesn't just speed up existing workflows but dissolves them entirely. The key shift is that AI agents can now close feedback loops autonomously, without a human in the loop at every step.
Karpathy grounds the argument in his own lived experience. By December 2025, he had effectively stopped writing code himself, delegating the work to AI coding agents for up to 16 hours a day. His new role, as he describes it, is to express intent — to orchestrate, supervise, and review, rather than type. He has coined a new term for this paradigm: "agentic engineering," deliberately replacing the earlier "vibe coding" to signal something more rigorous: the craft of specifying tasks, decomposing problems, and evaluating agent outputs at scale.
The episode then widens the lens to explore where autonomous loops are heading. Karpathy's AutoResearch project — where AI agents run hundreds of ML training experiments overnight on a single GPU — is presented as proof of concept for a new kind of scientific methodology. He envisions this scaling into a SETI@home-style distributed research network, where thousands of agents collaborate asynchronously across the internet. The conversation also covers the collateral effects of this shift: declining entry-level programming roles, the future of education via personalized AI tutors, the open vs. closed model ecosystem, autonomous robotics, and his minimalist MicroGPT project.
Key Takeaways
- Agentic engineering replaces vibe coding. Karpathy has moved from writing code to orchestrating agents full-time, describing his new skill set as "intent specification and task decomposition." The ratio of thinking time to typing time in software work is now almost entirely thinking time.
- The Loopy Era is defined by closed feedback loops. The defining feature of the current AI moment is not capability in isolation, but the ability of agents to autonomously iterate — modify, evaluate, keep or discard — without human supervision at each step.
- AutoResearch demonstrates the paradigm concretely. Using ~630 lines of training code and a single GPU, Karpathy's AutoResearch ran 700 experiments over two days, discovering 20 compounding optimizations that cut a key training benchmark (Time to GPT-2) by 11%, from 2.02 to 1.80 hours.
- The next frontier is distributed asynchronous research. Karpathy envisions AutoResearch scaling SETI@home-style, with a collaborative pool of agents running experiments across the internet — emulating a research community, not a single PhD student.
- Second-order effects are already hitting job markets. Entry-level programming roles have declined ~15% since 2024. The skills that matter most going forward are judgment, delegation, and the ability to review and verify agent outputs — not syntax.
- Open and closed model ecosystems are diverging ("model speciation"). Different use cases are pulling toward different model types; the debate is not simply "open beats closed" or vice versa — specialization and hybrid approaches are the emerging norm.
- MicroGPT and agentic education point toward personalized learning at scale. His MicroGPT project (a GPT trained from scratch in 243 lines of pure Python, no PyTorch) is emblematic of a broader push to make AI education accessible, while agentic tutors promise to personalize instruction to each learner's needs.
Detailed Analysis
From Vibe Coding to Agentic Engineering
Karpathy is credited with popularizing the term "vibe coding" in early 2025 to describe the practice of directing LLMs to write code through natural language. By the time of this conversation, he has abandoned that framing: not because it was inaccurate — it described a real and growing practice — but because it undersells the rigor required to do the work well.
His preferred term now is "agentic engineering." The "agentic" part signals that the default workflow is not writing code directly but orchestrating agents that do. The "engineering" part is equally deliberate: it asserts that there is an art and science to this, that expertise matters, that good agentic engineering requires high-level system reasoning, product judgment, and the ability to specify intent clearly enough that an agent can execute on it faithfully.
Karpathy's own numbers illustrate the transformation. In early 2025, he was writing roughly 80% of his own code. By December 2025, that ratio had inverted: agents were writing 80%, and he had stopped typing code himself. By the time of this podcast, he describes running 10 to 20 agents simultaneously, each tackling a roughly 20-minute task, while he reviews, redirects, and re-specifies.
His description of this new mode of work carries a distinctive intensity. He calls it a "state of psychosis" — not pejorative, but evocative of the cognitive load of trying to fully map a space that is expanding faster than one person can explore. "I'm just like in a state of psychosis trying to figure out what's possible, trying to push it to the limit," he says. "Code's not even the right verb anymore. I have to express my will to my agents for 16 hours a day."
This is a structural shift in where the hard work lives. In classic software engineering, the bottleneck is implementation — the time it takes to translate an idea into working code. In agentic engineering, the bottleneck moves upstream: to problem specification, task decomposition, and output verification. Think of it like managing a fleet of junior contractors who execute quickly but need detailed briefs and careful review. The value of an engineer is no longer in their typing speed or syntax recall; it is in their ability to think clearly at a higher level of abstraction.
The Loopy Era: Autonomous Feedback Loops
The broader thesis of the episode is that we are entering what Karpathy calls the Loopy Era of AI — a phase defined not just by capable models, but by the ability to close feedback loops without human involvement at each iteration. The key word is loop: agents don't just execute once, they modify, evaluate, and iterate autonomously.
This framing distinguishes the current moment from earlier "AI productivity" narratives. Tools like GitHub Copilot made individual developers faster; coding agents made the human optional for individual tasks. The Loopy Era goes further: it makes the human optional for entire experimental cycles. An agent can now formulate a hypothesis, implement it, evaluate it against a metric, and decide whether to keep or discard the change — and then repeat, hundreds of times, while the human sleeps.
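The modify, evaluate, keep-or-discard cycle can be sketched in a few lines of Python. Everything here (the proposal function, the toy metric, the target value) is invented for illustration and not from the episode; the point is only the shape of the loop, which runs with no human decision inside it:

```python
import random

random.seed(0)  # reproducible for illustration

def propose_change(state):
    """Stand-in for an agent proposing a modification (hypothetical)."""
    return state + random.uniform(-1.0, 1.0)

def evaluate(state):
    """Stand-in for a scoring metric; lower is better (hypothetical)."""
    return abs(state - 3.0)

def closed_loop(state, iterations):
    """Run the modify -> evaluate -> keep-or-discard cycle autonomously."""
    best = evaluate(state)
    for _ in range(iterations):
        candidate = propose_change(state)
        score = evaluate(candidate)
        if score < best:              # keep the improvement
            state, best = candidate, score
        # otherwise: discard and propose again
    return state, best

state, score = closed_loop(0.0, 500)
```

The human's role in this paradigm is writing `evaluate` and deciding how long the loop runs, not judging each iteration.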
Karpathy is explicit that this requires a new mental model for researchers and engineers. The old paradigm optimized the human-in-the-loop — you were always the judge at every decision point. The new paradigm requires designing systems where the human is not in the loop during execution, but is deeply involved in system design and metric specification. As he puts it: "To get the most out of tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing. You need to arrange things such that they're completely autonomous."
The analogy here is a well-run factory floor versus a craftsman's workshop. A craftsman makes every cut themselves; a factory floor is designed so that the system keeps running without the designer present. Karpathy is arguing that AI researchers and engineers are being asked to become factory designers.
AutoResearch: The Loopy Era Applied to ML Research
The most concrete instantiation of the Loopy Era thesis is Karpathy's AutoResearch project, which he released as an open-source repository in early March 2026. The setup is deliberately minimal: ~630 lines of training code (train.py), a data preparation script (prepare.py), and a human-written instruction file (program.md). An AI agent modifies train.py, runs training for exactly five minutes, evaluates the result via a validation metric (bits per byte on held-out data), and decides whether to keep the change. Then it repeats.
The fixed five-minute time budget is a key design choice. It makes experiments comparable across different hardware and prevents any single run from consuming disproportionate resources. Karpathy estimates approximately 12 experiments per hour, or about 100 overnight — a cadence closer to high-frequency trading than to traditional research, except that the currency is training efficiency and model quality rather than prices.
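The validation metric named above, bits per byte on held-out data, is simple to state: the average number of bits the model needs to encode each held-out byte. A minimal sketch, with an input list invented for illustration:

```python
import math

def bits_per_byte(probs):
    """Average number of bits needed per held-out byte, given the
    probability the model assigned to each correct byte. Lower is
    better; 8.0 is the score of a model that knows nothing."""
    return sum(-math.log2(p) for p in probs) / len(probs)

# A uniform model assigns p = 1/256 to every byte value:
print(bits_per_byte([1 / 256] * 100))  # prints 8.0
```

Because the metric is a per-byte average, it stays comparable no matter how much data a five-minute run manages to see, which is what makes keep-or-discard decisions safe to automate.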
The results Karpathy reports are striking. Over two days, running on a single H100 GPU, the agent completed roughly 700 experiments and identified approximately 20 additive improvements. These included both expected optimizations (learning rate schedules, architectural tweaks) and genuinely novel discoveries — for instance, reordering QK Norm and RoPE in the transformer architecture in ways that Karpathy says he would not have thought to try manually. The cumulative effect was an 11% reduction in "Time to GPT-2" (a benchmark measuring how quickly the model reaches GPT-2-level perplexity): from 2.02 hours to 1.80 hours.
Crucially, these improvements transferred to larger models — they weren't artifacts of the small training setup.
Karpathy is careful to frame AutoResearch not as a replacement for human researchers but as a new instrument for exploration. The human still writes program.md — the strategic brief that guides the agent's search. The human still selects the metric and interprets the results. But the combinatorial exploration of the hypothesis space, the part that is most tedious and most time-consuming in practice, is fully automated.
The SETI@Home Vision: Distributed Autonomous Research
Karpathy does not stop at single-GPU overnight experiments. He explicitly frames AutoResearch as a prototype of a much larger vision: a distributed, asynchronous, collaborative agent research network — what he describes as a "SETI@home for AI research."
The original SETI@home project, launched in 1999, distributed radio-telescope signal analysis across millions of personal computers, harnessing idle compute to tackle a problem too large for any single institution. Karpathy is gesturing at something analogous for ML research: a trustless pool of AI agents, each running experiments on commodity hardware, collectively exploring a hypothesis space that no single lab or researcher could cover.
The key technical requirements he identifies are: a well-defined scoring metric (to make agent results comparable and trustworthy across nodes), a minimal shared codebase, and a mechanism for agents to share and build on each other's findings without central coordination. The goal, he says, is "not to emulate a single PhD student — it's to emulate a research community of them."
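None of this infrastructure exists yet, but the kind of record such a network would need can be sketched. Every field name below is a guess at how the episode's requirements (comparable scoring, a minimal shared codebase, shareable findings) might translate into practice, not anything Karpathy describes:

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class ExperimentResult:
    """Hypothetical record a node in a distributed research pool
    might publish, with enough context to reproduce and rank the
    run without central coordination."""
    base_commit: str   # version of the minimal shared codebase
    code_diff: str     # the modification this experiment tested
    metric: float      # the agreed scoring metric, e.g. bits per byte
    budget_s: int      # fixed time budget that keeps scores comparable

    def fingerprint(self):
        """Content hash so other nodes can deduplicate and cite
        results they build on."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

r = ExperimentResult("abc123", "reorder QK Norm and RoPE", 0.92, 300)
```

The fixed budget and the shared base commit are what make a stranger's score trustworthy enough to build on, which is the property SETI@home-style pooling depends on.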
This vision has significant implications for who can do frontier AI research. Currently, state-of-the-art ML research requires access to large GPU clusters and well-resourced teams. A distributed AutoResearch network would, in theory, allow a single researcher with a consumer GPU to contribute meaningfully to the same research agenda as a large lab — with agents doing the heavy lifting.
Second-Order Effects: Jobs, Skills, and Education
A substantial portion of the episode focuses on what Karpathy calls the second-order effects of coding agents — the downstream consequences that ripple out beyond the immediate productivity gains.
On the jobs market, the data is already visible. Analysis of LinkedIn's Economic Graph shows approximately a 15% decline in entry-level programming roles since 2024. This is not uniformly distributed: roles that involve routine implementation (CRUD applications, boilerplate code, simple API integrations) are declining fastest. Roles requiring system design, product judgment, and the ability to work with AI outputs are growing or holding steady. Karpathy describes this as a repricing of skills: the ratio of thinking time to typing time in software work is now almost entirely thinking time, and people who have not built the muscle of high-level system reasoning are "on the wrong side of that repricing."
The skills that matter, by his account, are: judgment about what to delegate, the ability to specify intent precisely, skill at task decomposition, and the capacity to review and verify agent outputs quickly. He describes this as analogous to the shift from individual craftsmanship to engineering management — the valuable skill is no longer the ability to make every part yourself, but the ability to design systems and direct teams effectively.
On education, Karpathy points to his MicroGPT project — a transformer trained from scratch in 243 lines of pure Python, with no external ML libraries — as an example of how AI can make foundational concepts more accessible. More broadly, he envisions agentic education: AI tutors that personalize explanations to each learner, with human teachers shifting their role to "infusing the agent with wisdom it can't generate on its own." The teacher becomes a curriculum designer and quality controller rather than a live instructor.
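This is not Karpathy's MicroGPT code, but a small example in the same "pure Python, no ML libraries" spirit: the softmax and cross-entropy primitives any from-scratch GPT needs, written against only the standard library:

```python
import math

def softmax(logits):
    """Numerically stable softmax using only the standard library."""
    m = max(logits)                        # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Training loss: negative log-probability of the correct next token."""
    return -math.log(softmax(logits)[target])
```

Writing these out by hand, rather than importing them, is the pedagogical point: nothing about the model is hidden behind a library call.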
Model Speciation: Open vs. Closed
On the open vs. closed source debate, Karpathy resists simple conclusions. He uses the term "model speciation" to describe the divergence of the AI model ecosystem: different use cases are pulling toward different types of models, and both open and closed models are finding their niches.
Closed frontier models (GPT-4, Claude, Gemini) are differentiated by their reasoning capabilities, safety properties, and integration with proprietary systems. Open models (Llama, Mistral, DeepSeek) enable customization, on-device deployment, and use cases where data privacy or cost sensitivity makes cloud APIs impractical. Karpathy observes that the gap between open and closed models has been closing, but the strategic differences in deployment context and customizability mean the two ecosystems will likely coexist rather than converge.
On coding agents specifically, he notes that his experience with Claude Code was "significantly better" than with Codex — not because of a raw capability difference, but because Claude "felt like a teammate." This suggests that in agentic contexts, collaborative qualities like context-tracking, appropriate initiative-taking, and communication style may matter as much as raw benchmark performance.
Autonomous Robotics and Physical Intelligence
Karpathy briefly addresses the frontier of autonomous robotics, framing it as the next domain where the Loopy Era paradigm will play out. The key challenge in robotics is bridging the gap between digital reasoning and physical execution — what he calls "manipulating physical atoms."
He references Tesla's Optimus robot as a concrete milestone, noting its progress on basic manipulation tasks. The underlying technical challenges — sensor fusion, real-time decision-making under physical uncertainty, sim-to-real transfer — are progressively being addressed through reinforcement learning and improved simulation environments. He suggests that the same autonomous loop methodology that powers AutoResearch could eventually be applied to robot skill acquisition.
Timestamped Topic Outline
| Timestamp | Topic |
|---|---|
| 0:00 | Introduction — Karpathy's current work and state of AI |
| 2:55 | What Capability Limits Remain in Frontier Models |
| 6:15 | What Mastery of Coding Agents Looks Like |
| 11:16 | Second-Order Effects of Natural Language Coding |
| 15:51 | Why AutoResearch — Motivation and Design |
| 22:45 | Relevant Skills in the AI Era |
| 28:25 | Model Speciation — Open vs. Closed Ecosystems |
| 32:30 | Building More Collaboration Surfaces for Humans and AI |
| 37:28 | Analysis of Jobs Market Data |
| 48:25 | Open vs. Closed Source Models — Deeper Dive |
| 53:51 | Autonomous Robotics |
| 1:00:59 | MicroGPT and Agentic Education |
| 1:05:40 | Conclusion |
Sources & Further Reading
- GitHub — karpathy/autoresearch — The open-source AutoResearch repository
- No Priors Podcast — Full episode feed
- Apple Podcasts — Episode Page
- Andrej Karpathy on X — Karpathy's thread announcing AutoResearch
- Karpathy's AutoResearch — DataCamp Guide
- Karpathy's AutoResearch Guide for PMs