Concept Research
Key Concepts Identified
- Ralph Wiggum Technique: A simple bash loop approach to running AI coding agents iteratively until tests pass, named after the perpetually persistent Simpsons character
- Steve Yegge’s Gas Town: A complex multi-agent orchestration system with persistent state, roles, and coordination for managing 20-30+ parallel agents
- Beads: A git-backed, dependency-aware issue tracking system designed as external memory for coding agents
- Vibe Coding: Term coined by Andrej Karpathy describing fully AI-delegated programming where you “accept all” and don’t read diffs
- TDD with AI Agents: Using test-driven development as a constraint and validation mechanism for agentic coding workflows
- Context Window Management: Strategies for working within and around token limits in long-running agent sessions
- Parallel vs. Coordinated Agent Workflows: The spectrum from isolated parallel execution to fully orchestrated multi-agent systems
- Agentic Coding Experience Levels: Steve Yegge’s framework categorizing developers by their multi-agent proficiency (Stage 1-8)
Research Findings
The Ralph Wiggum Technique
Origins and Creator
The Ralph Wiggum technique was created by Geoffrey Huntley, a longtime open source developer who had shifted his life to raising goats in rural Australia. Around May 2025, Huntley hit a wall with then-standard “agentic” coding workflows and devised a radically simple solution.
Huntley wrote a 5-line Bash script that wrapped a model in a loop and pointed it at a task. He named it after Ralph Wiggum from The Simpsons - the character who is “perpetually confused, always making mistakes, but never stopping.”
The Core Implementation
In its purest form, Ralph is a Bash loop:
while :; do cat PROMPT.md | claude-code ; done
The technique exploits a key insight: progress doesn’t persist in the LLM’s context window - it lives in your files and git history. Each time the model produces output (success, failure, stack trace, or hallucination), the script feeds that entire output back in as fresh context, creating what Huntley calls a “contextual pressure cooker.”
Key Characteristics
- Stop Hook Pattern: When Claude Code tries to exit, a Stop hook blocks the exit and feeds the same prompt back in
- File-Based State: The files Claude just changed are still there, so each iteration builds on the last
- Completion Promise: You can cap the loop with max iterations or a “completion promise” token that Claude outputs only when the task is truly done
- Strong Feedback Loops: Works best with TypeScript, strong typing, and comprehensive unit tests - if code compiles and passes tests, emit completion; if not, try again
Philosophy
As Huntley explained: “That’s the beauty of Ralph - the technique is deterministically bad in an undeterministic world.” The approach embraces the chaos of AI code generation but channels it through the deterministic filter of compilation and tests.
Notable Results
- At a Y Combinator hackathon, Ralph “successfully generated 6 repositories overnight”
- One user ran a 14-hour autonomous session upgrading a codebase from React v16 to v19 without human input
- Huntley himself ran a loop for three consecutive months with one prompt: “Make me a programming language like Golang but with Gen Z slang keywords” - resulting in “Cursed,” a functional compiler with LLVM compilation, a standard library, and partial editor support
Evolution to Official Support
The technique went viral in late 2025. By that time, Boris Cherny, Anthropic’s Head of Developer Relations, formalized the hack into the official ralph-wiggum plugin for Claude Code, available via /plugin ralph.
Safety Considerations
- Always set
--max-iterations (e.g., 20 or 50) to prevent burning through API costs on impossible tasks
- Often requires
--dangerously-skip-permissions flag
- Security experts advise running Ralph sessions in sandboxed environments (disposable cloud VMs) to prevent accidental file deletion or system damage
Limitations
- Works best when success criteria can be verified programmatically (tests pass, lint clean, build succeeds)
- Rule: “if you can’t write success criteria without words like ‘nice’, ‘clean’, or ‘good’, the loop probably won’t converge”
- Best for “throwaway weekend projects” or well-defined maintenance tasks, not open-ended creative work
Steve Yegge’s Gas Town
Overview
Gas Town is a multi-agent orchestration system for Claude Code released in January 2026. It’s a Go-based orchestrator enabling developers to manage 20-30 parallel AI coding agents productively using tmux.
Yegge describes it as “an industrialized coding factory manned by superintelligent robot chimps” - a complex system for coordinating many agents with persistent work tracking.
Architecture Components
Workspaces: Your workspace directory (e.g., ~/gt/) contains all projects, agents, and configuration.
Rigs: Project containers - each rig wraps a git repository and manages its associated agents.
Slings: Ephemeral worker agents that spawn, complete a task, and disappear. They use git worktree-based persistent storage that survives crashes and restarts.
Beads: The atomic unit of work - a special kind of issue-tracker issue with ID, description, status, assignee, etc., stored in JSON and tracked in Git.
Agent Roles
Gas Town manages seven distinct agent roles across multiple project rigs:
- Mayor: A Claude Code instance with full context about your workspace, projects, and agents - the starting point where you describe what to accomplish
- Polecats: Scout/investigative agents
- Refinery: Processing and transformation agents
- Witness: Observation and logging agents
- Deacon: Quality and governance agents
- Dogs: Security and protection agents
- Crew: General worker agents
Experience Level Requirements
Critical insight: Gas Town explicitly targets “Stage 7-8” developers - people who already juggle five or more agents daily.
Yegge’s framework:
- Stage 6: Regularly using 3-5 parallel instances, very fast
- Stage 7: Using 10+ agents that are hand-managed, pushing limits of manual coordination
- Stage 8: Building your own orchestrator
“If you’re not at least Stage 7, or maybe Stage 6 and very brave, then you will not be able to use Gas Town. You aren’t ready yet.”
User Experience
You offload tasks to the orchestrator agent (“The Mayor”) and it figures out planning, implementing, testing, and merging. The focus is throughput: creation and correction at the speed of thought.
Development Scale
In just 17 days, Gas Town accumulated 75,000 lines of code and 2,000 commits - entirely “vibe coded.” The project itself is proof-of-concept for the methodology it enables.
Capabilities
- Executes long-horizon tasks
- MAKER evaluations
- Multi-disc Tower of Hanoi runs (10-disc completed; 20-disc estimated ~30 hours)
Steve Yegge’s Beads
The Problem It Solves
Beads addresses the “50 First Dates” problem: coding agents wake up with no memory of what you did yesterday. When using markdown plans, the AI has no way of tracking the implicit execution stack - plans live in sibling markdown files with the same format, causing agents to “meander around intelligently but blindly” on ambitious tasks.
What Beads Is
Beads is external memory for agents with dependency tracking and query capabilities. The name comes from issues linked together by dependencies “like beads on a chain” that agents can follow to get tasks done in the right order.
It’s not “issue tracking for agents” - it’s a persistent, structured memory system that feels like “a reliable extension of working memory across sessions.”
Technical Implementation
- Git-backed: Everything stored in Git - archived issues stay in history, recoverable even after deletion
- Works offline: No external service dependency
- SQLite cache: Fast queries, but Git is source of truth
- Single Go binary: Install via
brew tap then pip install beads-mcp
- Claude Code plugin: Available in marketplace
How It Works in Practice
Agents have switched from markdown plans to the issue tracker exclusively. This grants “unprecedented continuity from session to session” - when you let agents use Beads, you never lose discovered work.
Best practice: Agents tackle one task at a time. Once done, kill the process and start a new agent. Beads acts as working memory between sessions. Starting new sessions often saves money and improves model performance.
Adoption
- Fully vibe-coded in six days
- 5,000+ GitHub stars, 250+ forks in ~2 months
- Agents use Beads “eagerly and enthusiastically with very little prompting”
- They make smart decisions like filing Beads when low on context instead of doing work directly
Vibe Coding
Origin
Computer scientist Andrej Karpathy (co-founder of OpenAI, former AI leader at Tesla) coined “vibe coding” on February 6th, 2025.
Definition
From Karpathy’s original tweet:
“There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good.”
He described his approach:
“I ‘Accept All’ always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension… Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away.”
Key Characteristic
A key part of the definition: the user accepts AI-generated code without fully understanding it.
This elaborates on Karpathy’s 2023 claim that “the hottest new programming language is English.”
Recognition
- Listed on Merriam-Webster as “slang & trending” one month after coining
- Named Collins English Dictionary Word of the Year for 2025
Relevance to This Post
Vibe coding represents one extreme of the spectrum - full delegation with minimal oversight. The author’s proposed approach sits between this extreme freedom and the heavy orchestration of Gas Town.
TDD in AI-Assisted Development
Core Value Proposition
TDD naturally breaks down complex problems into smaller, manageable, verifiable units, which aligns well with Claude’s agentic capabilities. By having a clear test suite as “source of truth,” Claude is less likely to drift off-topic or introduce unintended side effects.
The TDD-Agent Workflow
- Write Tests First: Explicitly ask Claude to write tests for functionality that doesn’t exist. Be explicit about TDD to prevent mock implementations
- Confirm Tests Fail: Instruct Claude to run tests and confirm they fail as expected - validating tests target non-existent functionality
- Implement Code: Ask Claude to write code that passes the tests without modifying them
- Iterate: “Tell Claude to keep going until all tests pass. It will usually take a few iterations”
Acceptance criteria defined by product owners (in Jira, Asana, etc.) provide “natural, clear, context-aware language” that is “optimal input to modern LLMs.” You can prompt: “Create a test suite with test stubs that map to these requirements.”
TDD Guard (GitHub: nizos/tdd-guard): Ensures Claude Code follows TDD principles. When agents try to skip tests or over-implement, it blocks the action and explains what needs to happen.
Best Practices
- Use CLAUDE.md to establish project-wide testing guidelines and quality standards
- Provide clear instructions: “Write a new unit test in tests/auth_test.py covering the logout edge case (user without session). Do not use any mocks”
- Combine TDD with other constraints in CLAUDE.md (coding standards, architectural patterns, security requirements)
The Key Insight
“Taming GenAI agents isn’t about limiting their capabilities - it’s about channeling them effectively. TDD provides the structure that transforms Claude Code from a code generator into a disciplined development partner.”
Token Limits and Context Management
Current Context Window Sizes
- Standard paid plans: 200K tokens (~500 pages)
- Enterprise plans (Sonnet 4.5): 500K tokens
- Beta (Tier 4 organizations): 1M tokens (Sonnet 4 and 4.5 only)
Automatic Context Management
Claude automatically manages conversation context on paid plans:
- When approaching limits, Claude summarizes earlier messages
- This doesn’t count toward usage limits
- Allows conversations to continue “indefinitely in most cases”
- Manual trigger:
/compact command
Extended Thinking Management
Previous thinking blocks are automatically stripped from context window calculation - you don’t need to strip them yourself. The API handles this.
Context Editing for Agents
Context editing automatically clears stale tool calls and results when approaching token limits. This preserves conversation flow while extending how long agents can run without manual intervention.
Impact: Combining memory tool with context editing improved performance by 39% over baseline. Context editing alone delivered 29% improvement.
Best Practices
- Avoid the final 20%: Performance degrades significantly when approaching limits
- Strategic session planning: All plans reset every 5 hours - plan intensive work around reset cycles
- Keep intermediate results out of context: Programmatic Tool Calling reduced average usage from 43,588 to 27,297 tokens (37% reduction)
- Use management commands:
/compact, /clear, /context (v1.0.86+)
Claude Max Considerations
Max plan users can use Opus 4.5 freely for complex tasks - roughly the same Opus tokens as previously had with Sonnet. Strategic developers plan intensive sessions around 5-hour reset cycles.
Parallel vs. Coordinated Agent Workflows
The Spectrum
Multi-agent workflows exist on a spectrum:
- Parallel (Independent): Multiple agents work simultaneously on separate tasks with no interaction
- Coordinated: Agents communicate, hand off work, or share state
- Orchestrated: Central system manages agent roles, task allocation, and workflow state
Parallel Agent Benefits
- “Dramatically speeds up workflows where tasks can be performed independently”
- Each agent has isolated context window - prevents information overload
- Sequential agents are single-threaded; parallel agents are multi-core
Best use cases:
- Data gathering
- Speeding up independent tasks
- Codebase exploration (e.g., “Explore using 4 tasks in parallel, each exploring different directories”)
- Code review: style-checker, security-scanner, test-coverage subagents simultaneously
Coordination Patterns
Three recurring patterns:
- Handoff-based: Specialized agents pass context between stages
- Parallel execution: Multiple agents work simultaneously, then combine results
- Sequential refinement: Agents process in stages, each building on previous output
When to Use Each
Parallel works when:
- Independent, resource-intensive tasks
- Sub-agents operate without dependencies
- Tasks can be performed concurrently
Coordinated/sequential better when:
- Early stages might fail or produce low-quality output
- AI agents need to collaborate rather than hand off
- Workflow requires backtracking or iteration
- Dynamic routing needed based on intermediate results
Critical Warning from Armin Ronacher
Ronacher’s experience: “Sub-tasks and sub-agents enable parallelism, but you must be careful. Tasks that don’t parallelize well - especially those mixing reads and writes - create chaos. Outside of investigative tasks, he doesn’t get good results.”
Design Principles
- Don’t create unnecessary coordination complexity when simple sequential/concurrent would suffice
- Avoid agents that don’t provide meaningful specialization
- Watch for latency impacts of multiple-hop communication
- Never share mutable state between concurrent agents - causes transactionally inconsistent data
Claude Code Sub-agents
Claude Code supports sub-agents natively:
- Each subagent has isolated context window
- Only relevant summaries return to main conversation
- Subagents cannot spawn other subagents
- Chaining agents in loops significantly increases token usage
Agentic Coding Best Practices (2025-2026 Landscape)
The Current State
By end of 2025, ~85% of developers regularly use AI tools for coding. The conversation has shifted from “help me write this function” to “build this feature while I review another PR.”
Context is Everything
“The main problem with AI agents is they have no context about the project. Every time you start a session, the agent is like a new software engineer who has never seen the codebase. That’s why context files are the solution.”
CLAUDE.md: Special file automatically pulled into context. Ideal for:
- Repository etiquette (branch naming, merge vs. rebase)
- Developer environment setup
- Unexpected behaviors or warnings particular to the project
Planning Before Coding
“Asking Claude to research and plan first significantly improves performance for problems requiring deeper thinking upfront, rather than jumping straight to coding.”
Armin Ronacher’s Key Recommendations
Ronacher uses Claude Code with Max subscription ($100/month), exclusively Sonnet model:
- Do “the dumbest possible thing that will work”: Prefer functions with clear, long, descriptive names over classes
- Avoid inheritance and clever hacks
- Use plain SQL: Agents write excellent SQL and can match it with logs
- Go for new backend projects: Context system, explicit nature, simple test caching favor agents
- Logging is crucial: Allows agents to complete flows (like email verification) without extra help
Memory and Session Continuity
Developers are frustrated by agents that forget between sessions. Solutions:
- Instruct agents to log sessions and update documentation
- Use Beads or similar external memory
- Start new sessions often (saves money, better performance)
- Beads acts as working memory between sessions
Security Considerations
- Use
permissions.deny in ~/.claude/settings.json to forbid sensitive file access
- Use
.cursorignore (like .gitignore) to make files invisible to agents
- Be attentive about data transmitted to LLMs
AI Adoption and “Meeting Engineers Where They Are”
Progressive Adoption Strategy
The most successful AI adoptions begin with structured experimentation rather than blanket tool rollouts.
Key findings:
- Leadership advocacy makes developers 7x more likely to be daily AI users
- Developers who received structured enablement reported significantly better results
- Teams left to figure things out on their own saw higher time loss and wider knowledge gaps
Onboarding Impact
Data from July-September 2025 at six multinational enterprises:
- Onboarding time cut in half (91 days with no AI to 49 days with daily use)
- “Users who start with AI start ahead, and stay ahead”
Champion-Based Rollout
- Start with small, enthusiastic “power users” as internal champions
- Choose engineers from different parts of the stack (backend, frontend, infra)
- Create dedicated Slack channels for AI adoption conversations
- Weekly “AI exploration sessions” for sharing discoveries
Psychological Safety
Some developers feel threatened by AI tools. Address directly by positioning AI as augmentation rather than replacement. The most effective model is partnership approach rather than whole delegation.
Focus Areas
Start with high-toil areas: manual, repetitive tasks that drain developer time. Don’t chase trends.
The ReAct Pattern: A Middle Ground Philosophy
The ReAct (Reason and Act) pattern provides a philosophical framework for middle-ground approaches:
“ReAct finds a middle ground, providing enough structure through reasoning while maintaining flexibility through iterative action.”
Two extremes it avoids:
- Pure planning: Figuring out all steps before any action - works with complete information but struggles with discovery
- Pure execution: Acting without forethought - fast but inefficient and error-prone
Benefits of the middle ground:
- When actions don’t yield expected results, reasoning phase allows diagnosis and adjustment
- Reasoning trail provides transparency for trust, debugging, and understanding decisions
- Neither over-planned nor chaotically executed
Relevant Resources
Ralph Wiggum
Steve Yegge’s Work
Agentic Coding Best Practices
TDD and AI
Context and Token Management
Multi-Agent Workflows
AI Adoption Strategy
Vibe Coding
Workflow Patterns
Connections and Patterns
The Complexity-Control Tradeoff
A clear spectrum emerges in agentic coding approaches:
-
Vibe Coding (Low Control, High Delegation): Accept all, don’t read diffs, let AI figure it out. Works for throwaway projects, not production code.
-
Ralph Wiggum (Simple Loop, Test-Based Control): Let the agent iterate freely, but gates on deterministic criteria (tests pass, builds succeed). Simple implementation, limited orchestration.
-
Single-Session with TDD (Moderate Control): Stay within token limits, use acceptance criteria, iterate until done. This is where the author’s approach sits.
-
Parallel Independent Agents (Scale Without Coordination): Multiple sessions working on different features, no cross-talk. More throughput, same simplicity.
-
Beads + Manual Multi-Agent (Persistent Memory, Manual Coordination): External memory system, ~5-10 agents, hand-managed coordination. Stage 6-7.
-
Gas Town (Full Orchestration): 20-30+ agents, seven distinct roles, automated coordination, persistent state. Stage 7-8 only.
Engineering Discipline as the Constant
Across all approaches, TDD and testable acceptance criteria emerge as the key differentiator between success and chaos:
- Ralph Wiggum only converges when “success criteria can be verified programmatically”
- Gas Town still relies on tests for validation
- Armin Ronacher emphasizes “strong feedback loops like TypeScript and unit tests”
- The phrase “if you can’t write success criteria without ‘nice’, ‘clean’, or ‘good’, the loop probably won’t converge” applies across the spectrum
Context as the Bottleneck
Token limits and context degradation create natural constraints:
- Ralph Wiggum bypasses context limits by using files and git as state
- Beads exists specifically to solve the “50 First Dates” memory problem
- Gas Town uses git worktrees for crash-resistant persistent state
- Performance degrades in the final 20% of context window
- Starting fresh sessions often is better than accumulating context
The “Stage” System as Adoption Guide
Yegge’s experience levels provide a useful framework:
- Most developers are Stage 3-5
- The author’s position (“not ready for Gas Town”) suggests Stage 5-6
- Parallel sessions without cross-collaboration is appropriate for Stage 5-6
- Gas Town is explicitly Stage 7+ only
Parallelization: Independence > Coordination
A strong pattern emerges from multiple sources:
- Armin Ronacher: “Outside of investigative tasks, I don’t get good results” from coordinated sub-agents
- Microsoft Azure: “Avoid agents that don’t provide meaningful specialization”
- Multiple sources warn against mutable shared state between concurrent agents
The insight: Parallel independent agents working on isolated features is more reliable than coordinated agents sharing state. This validates the author’s approach of “no cross-collaboration between sessions.”
Progressive Adoption Path
The author’s framing of “preparing engineers for Gas Town later” aligns with adoption research:
- Structure before freedom
- Champion-based rollout
- Start with high-toil, well-defined tasks
- “Users who start with AI start ahead, and stay ahead”
The Middle Ground Philosophy
The ReAct pattern provides philosophical grounding: “a middle ground, providing enough structure through reasoning while maintaining flexibility through iterative action.”
The author’s approach embodies this:
- Structure: TDD, acceptance criteria, validation loops
- Flexibility: Agent freedom within sessions, parallel exploration
- Neither pure planning (over-architected orchestration) nor pure execution (vibe coding chaos)
Key Tension: Simplicity vs. Capability
The fundamental tension in all approaches:
- Simplicity (Ralph, single-session): Easy to understand, easy to debug, limited scale
- Capability (Gas Town): Massive parallelization, complex coordination, steep learning curve, fragile
The author’s approach optimizes for sustainable capability: more output than simple approaches, but maintainable complexity that doesn’t require Stage 7 expertise.
Implications for the Blog Post
Unique Positioning
The author’s approach fills a genuine gap:
- More sophisticated than “just loop until tests pass” (Ralph)
- Less complex than “run an AI factory” (Gas Town)
- Grounded in engineering fundamentals (TDD, acceptance criteria)
- Accessible to Stage 5-6 developers (the majority)
Key Differentiators to Emphasize
- TDD as Control Mechanism: Not just a best practice, but the key to making agentic coding deterministic
- Token-Conscious Design: Working within limits rather than fighting them
- Independence Over Coordination: Parallel sessions without shared state
- Progressive Path: A stepping stone, not a destination
Potential Counterarguments to Address
- “Why not just use Gas Town?” - Stage requirements, complexity, learning curve
- “Why not just use Ralph?” - Need for more control, acceptance criteria, session management
- “Parallel without coordination seems limiting” - Independence is more reliable than coordination for most tasks
The author’s title reference to “middle ground” could be extended:
- Ralph = chaos theory (let entropy find solutions)
- Gas Town = factory automation (industrial-scale coordination)
- Author’s approach = craftsman’s workshop (disciplined, skilled, human-scale)
Authenticity Note
The author’s admission “I’m not sure I’m ready to run a gas town!” is valuable - it positions this as honest practitioner advice rather than theoretical best practices. The Stage framework validates this: most developers aren’t Stage 7, and pretending otherwise doesn’t help anyone.