Julietta Yaunches

AI engineer & researcher at NVIDIA. (the opinions on this site are mine alone)

Concept Research

Key Concepts Identified


Research Findings

The Ralph Wiggum Technique

Origins and Creator

The Ralph Wiggum technique was created by Geoffrey Huntley, a longtime open source developer who had shifted his life to raising goats in rural Australia. Around May 2025, Huntley hit a wall with then-standard “agentic” coding workflows and devised a radically simple solution.

Huntley wrote a 5-line Bash script that wrapped a model in a loop and pointed it at a task. He named it after Ralph Wiggum from The Simpsons - the character who is “perpetually confused, always making mistakes, but never stopping.”

The Core Implementation

In its purest form, Ralph is a Bash loop:

while :; do cat PROMPT.md | claude-code ; done

The technique exploits a key insight: progress doesn’t persist in the LLM’s context window - it lives in your files and git history. Each time the model produces output (success, failure, stack trace, or hallucination), the script feeds that entire output back in as fresh context, creating what Huntley calls a “contextual pressure cooker.”

Key Characteristics

  1. Stop Hook Pattern: When Claude Code tries to exit, a Stop hook blocks the exit and feeds the same prompt back in
  2. File-Based State: The files Claude just changed are still there, so each iteration builds on the last
  3. Completion Promise: You can cap the loop with max iterations or a “completion promise” token that Claude outputs only when the task is truly done
  4. Strong Feedback Loops: Works best with TypeScript, strong typing, and comprehensive unit tests - if code compiles and passes tests, emit completion; if not, try again
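Putting the iteration cap and the completion promise together, the loop can be sketched like this. This is a sketch, not Huntley’s actual script: the agent call is stubbed out so it runs standalone (a real run would pipe PROMPT.md into claude-code), and RALPH_DONE is an invented sentinel token.

```shell
# Ralph-style loop with an iteration cap and a completion promise.
# run_agent is a stub standing in for `cat PROMPT.md | claude-code`;
# here it pretends the task succeeds on the third attempt.
run_agent() {
  if [ "$1" -ge 3 ]; then echo "RALPH_DONE"; else echo "tests still failing"; fi
}

MAX_ITER=50
i=0
while [ "$i" -lt "$MAX_ITER" ]; do
  i=$((i + 1))
  out=$(run_agent "$i")
  # Stop only when the agent emits the agreed-upon completion token.
  case "$out" in
    *RALPH_DONE*) break ;;
  esac
done
echo "stopped after $i iterations"
```

The cap guards against the loop running forever; the sentinel gives the deterministic exit condition the technique depends on.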

Philosophy

As Huntley explained: “That’s the beauty of Ralph - the technique is deterministically bad in an undeterministic world.” The approach embraces the chaos of AI code generation but channels it through the deterministic filter of compilation and tests.

Notable Results

Evolution to Official Support

The technique went viral in late 2025. By then, Boris Cherny, Anthropic’s Head of Developer Relations, had formalized the hack into the official ralph-wiggum plugin for Claude Code, available via /plugin ralph.

Safety Considerations

Limitations


Steve Yegge’s Gas Town

Overview

Gas Town is a multi-agent orchestration system for Claude Code released in January 2026. It’s a Go-based orchestrator enabling developers to manage 20-30 parallel AI coding agents productively using tmux.

Yegge describes it as “an industrialized coding factory manned by superintelligent robot chimps” - a complex system for coordinating many agents with persistent work tracking.

Architecture Components

Workspaces: Your workspace directory (e.g., ~/gt/) contains all projects, agents, and configuration.

Rigs: Project containers - each rig wraps a git repository and manages its associated agents.

Slings: Ephemeral worker agents that spawn, complete a task, and disappear. They use git worktree-based persistent storage that survives crashes and restarts.

Beads: The atomic unit of work - a special kind of issue-tracker issue with ID, description, status, assignee, etc., stored in JSON and tracked in Git.
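As a sketch, one bead could be a small JSON record checked into the repo. The field names below are illustrative, not the actual Beads schema:

```shell
# Write a hypothetical bead to disk; Beads keeps records like this in the
# repository so ordinary git commands version the work queue.
cat > bd-042.json <<'EOF'
{
  "id": "bd-042",
  "description": "Add retry logic to the upload client",
  "status": "ready",
  "assignee": "crew-1",
  "depends_on": ["bd-041"]
}
EOF
# Because it is just a file, versioning the queue is plain git:
#   git add bd-042.json && git commit -m "bd-042: queue upload retry work"
```

The depends_on link is what lets agents follow beads “like beads on a chain” in dependency order.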

Agent Roles

Gas Town manages seven distinct agent roles across multiple project rigs:

  1. Mayor: A Claude Code instance with full context about your workspace, projects, and agents - the starting point where you describe what to accomplish
  2. Polecats: Scout/investigative agents
  3. Refinery: Processing and transformation agents
  4. Witness: Observation and logging agents
  5. Deacon: Quality and governance agents
  6. Dogs: Security and protection agents
  7. Crew: General worker agents

Experience Level Requirements

Critical insight: Gas Town explicitly targets “Stage 7-8” developers - people who already juggle five or more agents daily.

Yegge’s framework:

“If you’re not at least Stage 7, or maybe Stage 6 and very brave, then you will not be able to use Gas Town. You aren’t ready yet.”

User Experience

You offload tasks to the orchestrator agent (“The Mayor”) and it figures out planning, implementing, testing, and merging. The focus is throughput: creation and correction at the speed of thought.

Development Scale

In just 17 days, Gas Town accumulated 75,000 lines of code and 2,000 commits - entirely “vibe coded.” The project itself is a proof of concept for the methodology it enables.

Capabilities


Steve Yegge’s Beads

The Problem It Solves

Beads addresses the “50 First Dates” problem: coding agents wake up with no memory of what you did yesterday. When using markdown plans, the AI has no way of tracking the implicit execution stack - plans live in sibling markdown files with the same format, causing agents to “meander around intelligently but blindly” on ambitious tasks.

What Beads Is

Beads is external memory for agents with dependency tracking and query capabilities. The name comes from issues linked together by dependencies “like beads on a chain” that agents can follow to get tasks done in the right order.

It’s not “issue tracking for agents” - it’s a persistent, structured memory system that feels like “a reliable extension of working memory across sessions.”

Technical Implementation

How It Works in Practice

With Beads, agents switch from markdown plans to the issue tracker exclusively. This grants “unprecedented continuity from session to session” - when you let agents use Beads, you never lose discovered work.

Best practice: Agents tackle one task at a time. Once done, kill the process and start a new agent. Beads acts as working memory between sessions. Starting new sessions often saves money and improves model performance.
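That pattern can be sketched as a small driver script. Everything here is stubbed: the query for the next unblocked bead and the agent invocation are placeholders, not the real Beads CLI.

```shell
# One-task-per-session driver: fetch the next unblocked bead, hand it to a
# fresh agent, let that session end with the task, then repeat from scratch.
next_ready_bead() { echo "bd-042"; }      # stub for a Beads "ready work" query
run_fresh_agent() { echo "closed $1"; }   # stub for a brand-new agent session

bead=$(next_ready_bead)
result=$(run_fresh_agent "$bead")         # the session dies when the task does
echo "$result"
```

Killing the process between tasks is the point: the bead store, not the context window, carries state forward.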

Adoption


Vibe Coding

Origin

Computer scientist Andrej Karpathy (co-founder of OpenAI, former AI leader at Tesla) coined “vibe coding” on February 6th, 2025.

Definition

From Karpathy’s original tweet:

“There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good.”

He described his approach:

“I ‘Accept All’ always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension… Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away.”

Key Characteristic

A key part of the definition: the user accepts AI-generated code without fully understanding it.

This elaborates on Karpathy’s 2023 claim that “the hottest new programming language is English.”

Recognition

Relevance to This Post

Vibe coding represents one extreme of the spectrum - full delegation with minimal oversight. The author’s proposed approach sits between this extreme freedom and the heavy orchestration of Gas Town.


TDD in AI-Assisted Development

Core Value Proposition

TDD naturally breaks down complex problems into smaller, manageable, verifiable units, which aligns well with Claude’s agentic capabilities. By having a clear test suite as “source of truth,” Claude is less likely to drift off-topic or introduce unintended side effects.

The TDD-Agent Workflow

  1. Write Tests First: Explicitly ask Claude to write tests for functionality that doesn’t exist. Be explicit about TDD to prevent mock implementations
  2. Confirm Tests Fail: Instruct Claude to run tests and confirm they fail as expected - validating tests target non-existent functionality
  3. Implement Code: Ask Claude to write code that passes the tests without modifying them
  4. Iterate: “Tell Claude to keep going until all tests pass. It will usually take a few iterations”
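The outer loop of steps 2-4 can be sketched in shell. The test runner and the agent call are stubbed so the script runs standalone; a real loop would shell out to your test command and your agent CLI.

```shell
# Red-green driver: confirm tests fail first, then iterate until green.
run_tests() { [ -f impl_done ]; }   # stub: tests are "green" once impl_done exists
agent_step() { touch impl_done; }   # stub: one agent implementation attempt

# Step 2 sanity check: tests passing before any implementation means they
# are not actually targeting the missing functionality.
if run_tests; then
  echo "warning: tests passed before any implementation" >&2
fi

attempts=0
until run_tests; do                 # step 4: keep going until all tests pass
  attempts=$((attempts + 1))
  agent_step
done
echo "green after $attempts attempt(s)"
rm -f impl_done
```

The agent never touches the tests inside the loop - that asymmetry is what makes the suite a usable source of truth.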

Acceptance Criteria as Input

Acceptance criteria defined by product owners (in Jira, Asana, etc.) provide “natural, clear, context-aware language” that is “optimal input to modern LLMs.” You can prompt: “Create a test suite with test stubs that map to these requirements.”

Tools and Enforcement

TDD Guard (GitHub: nizos/tdd-guard): Ensures Claude Code follows TDD principles. When agents try to skip tests or over-implement, it blocks the action and explains what needs to happen.

Best Practices

The Key Insight

“Taming GenAI agents isn’t about limiting their capabilities - it’s about channeling them effectively. TDD provides the structure that transforms Claude Code from a code generator into a disciplined development partner.”


Token Limits and Context Management

Current Context Window Sizes

Automatic Context Management

Claude automatically manages conversation context on paid plans:

Extended Thinking Management

Previous thinking blocks are automatically stripped from context window calculation - you don’t need to strip them yourself. The API handles this.

Context Editing for Agents

Context editing automatically clears stale tool calls and results when approaching token limits. This preserves conversation flow while extending how long agents can run without manual intervention.

Impact: Combining memory tool with context editing improved performance by 39% over baseline. Context editing alone delivered 29% improvement.

Best Practices

  1. Avoid the final 20%: Performance degrades significantly when approaching limits
  2. Strategic session planning: All plans reset every 5 hours - plan intensive work around reset cycles
  3. Keep intermediate results out of context: Programmatic Tool Calling reduced average usage from 43,588 to 27,297 tokens (37% reduction)
  4. Use management commands: /compact, /clear, /context (v1.0.86+)

Claude Max Considerations

Max plan users can use Opus 4.5 freely for complex tasks - roughly as many Opus tokens as they previously had with Sonnet. Strategic developers plan intensive sessions around 5-hour reset cycles.


Parallel vs. Coordinated Agent Workflows

The Spectrum

Multi-agent workflows exist on a spectrum:

  1. Parallel (Independent): Multiple agents work simultaneously on separate tasks with no interaction
  2. Coordinated: Agents communicate, hand off work, or share state
  3. Orchestrated: Central system manages agent roles, task allocation, and workflow state

Parallel Agent Benefits

Best use cases:

Coordination Patterns

Three recurring patterns:

  1. Handoff-based: Specialized agents pass context between stages
  2. Parallel execution: Multiple agents work simultaneously, then combine results
  3. Sequential refinement: Agents process in stages, each building on previous output
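The parallel-execution pattern, for instance, reduces to fan-out/fan-in at the shell level. The agents are stubbed as functions here; in practice each would be an independent session working in its own checkout or git worktree.

```shell
# Fan-out: two independent "agents" write to separate files concurrently;
# fan-in: results are combined only after both have finished.
agent() { echo "done: $1" > "result-$1.txt"; }   # stub for a real agent session

agent feature-auth &
agent feature-billing &
wait   # no shared state while running; merge only afterwards

combined=$(cat result-feature-auth.txt result-feature-billing.txt)
echo "$combined"
rm -f result-feature-auth.txt result-feature-billing.txt
```

Keeping the write targets disjoint until the wait is what makes this the safe end of the spectrum.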

When to Use Each

Parallel works when:

Coordinated/sequential better when:

Critical Warning from Armin Ronacher

Ronacher’s experience: sub-tasks and sub-agents enable parallelism, but you have to be careful. Tasks that don’t parallelize well - especially those mixing reads and writes - create chaos. Outside of investigative tasks, he doesn’t get good results from sub-agents.

Design Principles

Claude Code Sub-agents

Claude Code supports sub-agents natively:


Agentic Coding Best Practices (2025-2026 Landscape)

The Current State

By end of 2025, ~85% of developers regularly use AI tools for coding. The conversation has shifted from “help me write this function” to “build this feature while I review another PR.”

Context is Everything

“The main problem with AI agents is they have no context about the project. Every time you start a session, the agent is like a new software engineer who has never seen the codebase. That’s why context files are the solution.”

CLAUDE.md: A special file automatically pulled into context at the start of a session. Ideal for repository conventions: common commands, code style rules, and testing instructions the agent should always see.
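For instance, a starter CLAUDE.md could be seeded like this - the file contents are invented for illustration and should be adapted to your repo:

```shell
# Seed an illustrative CLAUDE.md with the kind of always-relevant context
# an agent needs: commands, style rules, and workflow conventions.
cat > CLAUDE.md <<'EOF'
# Project conventions
- Build: npm run build; test: npm test; lint: npm run lint
- Style: plain functions with descriptive names; no inheritance
- Every bug fix starts with a failing test
EOF
```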

Planning Before Coding

“Asking Claude to research and plan first significantly improves performance for problems requiring deeper thinking upfront, rather than jumping straight to coding.”

Armin Ronacher’s Key Recommendations

Ronacher uses Claude Code on a Max subscription ($100/month), with the Sonnet model exclusively:

  1. Do “the dumbest possible thing that will work”: Prefer functions with clear, long, descriptive names over classes
  2. Avoid inheritance and clever hacks
  3. Use plain SQL: Agents write excellent SQL and can match it with logs
  4. Use Go for new backend projects: its context system, explicit nature, and simple test caching favor agents
  5. Logging is crucial: Allows agents to complete flows (like email verification) without extra help

Memory and Session Continuity

Developers are frustrated by agents that forget between sessions. Solutions:

Security Considerations


AI Adoption and “Meeting Engineers Where They Are”

Progressive Adoption Strategy

The most successful AI adoptions begin with structured experimentation rather than blanket tool rollouts.

Key findings:

Onboarding Impact

Data from July-September 2025 at six multinational enterprises:

Champion-Based Rollout

  1. Start with small, enthusiastic “power users” as internal champions
  2. Choose engineers from different parts of the stack (backend, frontend, infra)
  3. Create dedicated Slack channels for AI adoption conversations
  4. Weekly “AI exploration sessions” for sharing discoveries

Psychological Safety

Some developers feel threatened by AI tools. Address this directly by positioning AI as augmentation rather than replacement. The most effective model is a partnership approach rather than wholesale delegation.

Focus Areas

Start with high-toil areas: manual, repetitive tasks that drain developer time. Don’t chase trends.


The ReAct Pattern: A Middle Ground Philosophy

The ReAct (Reason and Act) pattern provides a philosophical framework for middle-ground approaches:

“ReAct finds a middle ground, providing enough structure through reasoning while maintaining flexibility through iterative action.”

Two extremes it avoids:

  1. Pure planning: Figuring out all steps before any action - works with complete information but struggles with discovery
  2. Pure execution: Acting without forethought - fast but inefficient and error-prone
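As a toy sketch, the middle ground is a loop that interleaves a reasoning step with a single action and feeds the observation back in - all three steps stubbed with simple echoes and a counter:

```shell
# Toy ReAct loop: a thought, one action, an observation - repeated, rather
# than planning every step up front or acting blindly to the end.
state=0
target=3
while [ "$state" -lt "$target" ]; do
  echo "thought: state=$state, short of target $target"   # reason
  state=$((state + 1))                                    # act
  echo "observation: state is now $state"                 # observe
done
```

Each observation is available to the next “thought,” which is exactly what pure planning forgoes and pure execution ignores.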

Benefits of the middle ground:


Relevant Resources

Ralph Wiggum

Steve Yegge’s Work

Agentic Coding Best Practices

TDD and AI

Context and Token Management

Multi-Agent Workflows

AI Adoption Strategy

Vibe Coding

Workflow Patterns


Connections and Patterns

The Complexity-Control Tradeoff

A clear spectrum emerges in agentic coding approaches:

  1. Vibe Coding (Low Control, High Delegation): Accept all, don’t read diffs, let AI figure it out. Works for throwaway projects, not production code.

  2. Ralph Wiggum (Simple Loop, Test-Based Control): Let the agent iterate freely, but gates on deterministic criteria (tests pass, builds succeed). Simple implementation, limited orchestration.

  3. Single-Session with TDD (Moderate Control): Stay within token limits, use acceptance criteria, iterate until done. This is where the author’s approach sits.

  4. Parallel Independent Agents (Scale Without Coordination): Multiple sessions working on different features, no cross-talk. More throughput, same simplicity.

  5. Beads + Manual Multi-Agent (Persistent Memory, Manual Coordination): External memory system, ~5-10 agents, hand-managed coordination. Stage 6-7.

  6. Gas Town (Full Orchestration): 20-30+ agents, seven distinct roles, automated coordination, persistent state. Stage 7-8 only.

Engineering Discipline as the Constant

Across all approaches, TDD and testable acceptance criteria emerge as the key differentiator between success and chaos:

Context as the Bottleneck

Token limits and context degradation create natural constraints:

The “Stage” System as Adoption Guide

Yegge’s experience levels provide a useful framework:

Parallelization: Independence > Coordination

A strong pattern emerges from multiple sources:

The insight: Parallel independent agents working on isolated features is more reliable than coordinated agents sharing state. This validates the author’s approach of “no cross-collaboration between sessions.”

Progressive Adoption Path

The author’s framing of “preparing engineers for Gas Town later” aligns with adoption research:

The Middle Ground Philosophy

The ReAct pattern provides philosophical grounding: “a middle ground, providing enough structure through reasoning while maintaining flexibility through iterative action.”

The author’s approach embodies this:

Key Tension: Simplicity vs. Capability

The fundamental tension in all approaches:

The author’s approach optimizes for sustainable capability: more output than simple approaches, but maintainable complexity that doesn’t require Stage 7 expertise.


Implications for the Blog Post

Unique Positioning

The author’s approach fills a genuine gap:

Key Differentiators to Emphasize

  1. TDD as Control Mechanism: Not just a best practice, but the key to making agentic coding deterministic
  2. Token-Conscious Design: Working within limits rather than fighting them
  3. Independence Over Coordination: Parallel sessions without shared state
  4. Progressive Path: A stepping stone, not a destination

Potential Counterarguments to Address

  1. “Why not just use Gas Town?” - Stage requirements, complexity, learning curve
  2. “Why not just use Ralph?” - Need for more control, acceptance criteria, session management
  3. “Parallel without coordination seems limiting” - Independence is more reliable than coordination for most tasks

Metaphor Opportunity

The author’s title reference to “middle ground” could be extended:

Authenticity Note

The author’s admission “I’m not sure I’m ready to run a gas town!” is valuable - it positions this as honest practitioner advice rather than theoretical best practices. The Stage framework validates this: most developers aren’t Stage 7, and pretending otherwise doesn’t help anyone.