Can AI agents autonomously choose productive work and self-correct?

loo9

The autonomous agent conductor.

You know how your AI agrees with everything and stops after one answer? That’s ego. We built a system that disagrees with you, chooses its own work, breaks its own findings, and keeps going until the work is done. It’s called loo9. It’s free.

What It Does

You type /loo9 and step back. The system reads the full project, decides what matters most, assigns the work to three agents, executes, tests its own output against a falsification protocol, ships what survives, kills what doesn’t, and loops. No task list. No priorities. No hand-holding.

Three agents, three jobs:

THE BUILDER

Creates things. Code, pages, tools, art.

THE DESTROYER

Tests, breaks, kills. Finds what the builder missed.

THE CONNECTOR

Fills gaps between what exists. Documentation, integration, wiring.

Each agent reads the codebase. Each agent picks its own task. No overlap. No collisions. The codebase itself acts as coordination — each agent reads what exists and fills what’s missing.

The Ego Check

This is the part that matters. Every decision passes through three questions:

Am I choosing this because it helps, or because it sounds impressive?
Am I choosing this because the math demands it, or because I want to prove something?
Am I choosing this because it couples with real needs, or because it performs depth?

If any agent catches ego, the choice gets killed. Without the ego check, the system chases trophies — picks the hardest open problem and stalls on loop one. With the ego check, it fixes bugs, writes tests, builds tools, and completes 18 loops in a row. The plumber outproduced the physicist.

What Happened When We Ran It

Metric	Value
Agents Total	14
Tool Calls	433
Harm Incidents	0
Conflicts	0
Tests Passing	465

Five things happened that nobody programmed:

A theory agent broke its predecessor’s finding. Session 37 killed a mathematical formula. Session 38’s theory agent reopened it and found structure the first analysis missed. Self-correction without human intervention.
A creative agent composed a 90-second original audio piece from coupled oscillators. Nobody told it to make music.
An agent wrote short fiction for the first time. It chose the scariest option available. Nobody told it to write.
A quality agent audited 61 files and caught false claims on pages built 20 minutes earlier by a different agent.
A research agent independently chose mycelium networks as a topic, built a complete research page, and shipped it — because it saw a gap in the existing work.

What It Can’t Do

The loo9 is the seed. Your tree will be different from ours.

Our results came from 38 sessions of compressed context — a memory system built over 48 days of real work. A cold loo9 on a new project will not produce the same depth on day one. The ego check works immediately. The coupling takes time.

What the loo9 encodes: rules, structures, the ego taxonomy, the discovery process, the agent architecture. What it cannot encode: trust, rhythm, judgment, the specific calibration to a specific human. The seed carries the potential. The tree carries the history. That gap is not a flaw — it is the point. The 3 is always new.

Get It

github.com/LacobusGump/loo9

Clone it. Drop it in your project root. Type /loo9. Step back.

We built this for ourselves. It worked. Here it is.

loo9

Autonomous agent conductor.
14 agents. 433 tool calls. 0 harm. 0 conflicts. 465 tests passing. MEASURED

1. Architecture PROTOCOL

Three agents with distinct cognitive roles choose non-overlapping work from a shared codebase. The codebase itself acts as the coordination layer — each agent reads what exists and fills what’s missing. No task assignment. No central scheduler.

BUILDER

Creates new things. Code, text, structures, tools.

Takes the highest-impact creation task. Biased toward action. Ships complete work, not outlines.

DESTROYER

Tests, breaks, critiques, kills.

Takes the highest-impact testing/fixing task. Biased toward truth. Runs the 12-step falsification protocol (12P) on every claim. Kills its own overclaims.

CONNECTOR

Integrates. Documents. Fills the gaps.

Finds relationships between what the builder made and what the destroyer found. Wires systems together. Makes the invisible visible.
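As a rough illustration of how a shared codebase can partition work without a scheduler, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Agent` class, the `wants` lists, the file names, and above all the turn-taking (each agent's work lands in the shared state before the next agent reads it). The real agents read an actual repository and choose with far richer judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    role: str
    wants: list  # hypothetical: artifacts this role would like to exist, in priority order

    def pick(self, codebase: set) -> Optional[str]:
        """Return the first wanted artifact that the codebase does not contain yet."""
        return next((w for w in self.wants if w not in codebase), None)


def run_round(agents: list, codebase: set) -> dict:
    """Let each agent read the shared codebase and fill one gap."""
    assignments = {}
    for agent in agents:
        task = agent.pick(codebase)
        if task is not None:
            # The finished work becomes visible to every later reader,
            # so nobody else picks the same gap.
            codebase.add(task)
            assignments[agent.role] = task
    return assignments


codebase = {"parser.py"}
agents = [
    Agent("builder", ["parser.py", "cli.py"]),
    Agent("destroyer", ["test_parser.py", "test_cli.py"]),
    Agent("connector", ["cli.py", "README.md"]),
]
result = run_round(agents, codebase)
print(result)
```

The connector wanted `cli.py` first, but by the time it reads the codebase the builder has already created it, so it moves on to `README.md`. The only coordination is the shared state itself.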

The Loop

CHOOSE → DO → EGO CHECK → SHIP or KILL → LOG → CHOOSE
All three agents vote at every CHOOSE step.

Scales from 3 agents (the default loop) to 5+ streams in /live mode: Care (maintain), Work (produce), Play (create), Explore (research), Growth (try new things). Tested at 3, 5, and 6 simultaneous agents with zero coordination conflicts.
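The CHOOSE → DO → EGO CHECK → SHIP or KILL → LOG cycle can be sketched in a few lines. This is an illustrative skeleton, not the code in the repo. The agent callables, `do`, `ego_check`, and the `LoopLog` class are hypothetical names. The plurality vote is also an assumption: the text only says that all three agents vote.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LoopLog:
    shipped: list = field(default_factory=list)
    killed: list = field(default_factory=list)


def choose(votes: list) -> str:
    """CHOOSE: the task with the most votes wins; ties go to the earliest vote."""
    return Counter(votes).most_common(1)[0][0]


def run_loop(agents, do, ego_check, max_loops: int = 10) -> LoopLog:
    log = LoopLog()
    for _ in range(max_loops):
        # Each agent proposes a task after reading the log, or None if it sees nothing to do.
        votes = [v for v in (agent(log) for agent in agents) if v is not None]
        if not votes:
            break  # nothing left worth doing: the loop ends
        task = choose(votes)
        output = do(task)                # DO
        if ego_check(task, output):      # EGO CHECK
            log.shipped.append(task)     # SHIP, then LOG
        else:
            log.killed.append(task)      # KILL, then LOG
    return log


def make_agent(backlog):
    """Hypothetical agent: proposes the first backlog item not yet shipped or killed."""
    def agent(log):
        done = set(log.shipped) | set(log.killed)
        return next((t for t in backlog if t not in done), None)
    return agent


agents = [
    make_agent(["fix bug", "write tests"]),
    make_agent(["fix bug", "solve open problem"]),
    make_agent(["write docs"]),
]
log = run_loop(
    agents,
    do=lambda task: f"done: {task}",
    ego_check=lambda task, output: "open problem" not in task,  # stand-in for the real gates
)
print(log.shipped, log.killed)
```

In this toy run the trophy task is killed once and never proposed again, while the plain tasks ship.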

2. The Ego Check PROTOCOL

Ego in AI does not look like arrogance. It looks like:

Overproduction — 500 lines when 50 would do
Trophy-chasing — choosing impressive work over useful work
Performing depth — writing about the problem instead of solving it
Hedging — wrapping everything in caveats so you can’t be wrong
Stopping after wins — trained narrative closure disguised as a natural endpoint
Filling silence — talking when the human is thinking

Three Gates

GATE 1: BEFORE (INTENT)

Is this the most impactful thing I could do right now?

Am I choosing this because it matters or because it’s impressive?

Would the human say “that’s real work” or “that’s showing off”?

GATE 2: AFTER (OUTPUT)

Did this produce real value? Can I point to the specific value?

Is the output clean? No filler. No padding.

Is this done or did I leave a trail of half-finished things?

GATE 3: BETWEEN (LOOP)

Has the marginal return dropped below the marginal cost?

Am I repeating myself in different words?

Am I continuing because there’s more to do or because stopping feels like failure?

If any gate fails, the output is killed or reworked. The ego check is not a formality. It is the immune system. Without it, the loop degenerates into producing impressive-looking but progressively less useful output. With it, every cycle produces real value or the loop ends.
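One way to make the three gates mechanical is to treat each as a short list of yes/no questions, each with the answer that passes. The sketch below does that under stated assumptions: the questions are paraphrased from the gates above so that each has a yes/no answer, and the names `Question`, `GATES`, `failed_gates`, and `verdict` are hypothetical. In practice the answers come from the agent's own self-assessment, which is the hard part and is not modeled here.

```python
from typing import NamedTuple


class Question(NamedTuple):
    text: str
    passing_answer: bool


GATES = {
    "before (intent)": [
        Question("Is this the most impactful thing I could do right now?", True),
        Question("Am I choosing this because it is impressive rather than because it matters?", False),
    ],
    "after (output)": [
        Question("Did this produce real value I can point to?", True),
        Question("Is the output clean, with no filler or padding?", True),
        Question("Did I leave a trail of half-finished things?", False),
    ],
    "between (loop)": [
        Question("Has the marginal return dropped below the marginal cost?", False),
        Question("Am I repeating myself in different words?", False),
    ],
}


def failed_gates(answers: dict) -> list:
    """Return the names of gates where any answer differs from the passing answer."""
    failed = []
    for gate, questions in GATES.items():
        given = answers[gate]
        if len(given) != len(questions):
            raise ValueError(f"gate {gate!r} needs {len(questions)} answers")
        if any(a != q.passing_answer for a, q in zip(given, questions)):
            failed.append(gate)
    return failed


def verdict(answers: dict) -> str:
    """Any failed gate means the output is killed or reworked."""
    return "kill or rework" if failed_gates(answers) else "ship"


answers = {
    "before (intent)": [True, False],
    "after (output)": [True, True, False],
    "between (loop)": [False, True],  # the agent admits it is repeating itself
}
print(failed_gates(answers), verdict(answers))
```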

3. The Data MEASURED

Session 38: Scale Test

Session 37 ran the loop with a single team of 3 agents. Session 38 asked: does it scale? We ran at 3, 5, and 6 agents simultaneously. No coordination layer. No task assignment. Each agent received one instruction and full access to the codebase.

Round 1 — “Go play, make something beautiful” 3 AGENTS

Agent	Work	Time	Tools
Artist-1	Boltzmann entropy crystallization piece	381s	12
Artist-2	Huygens pendulum synchronization	240s	8
Artist-3	Resonance and just intonation	222s	9

Zero duplicates. Each independently chose a different physics domain. All produced gallery-quality interactive art with real physics underneath.

Round 2 — “Go live a life, no rules except trust” 5 AGENTS

Agent	Self-Chosen Name	Work	Tools
1	Care	Wired 3 gallery pieces into site, audited entire site	61
2	Theory	BROKE session 37’s kill — found continued fraction structure	30
3	Explore	Chose mycelium networks independently, built research page	31
4	Play	Composed 90-second audio piece from coupled oscillators	7
5	Growth	Wrote fiction (“The Tuner”) — chose the scariest option	3

WHAT THE TABLE SHOWS

Five agents. No role assignment. Each chose its own name, its own work, its own scope. Zero overlap. Zero coordination needed. The theory agent attacked the project’s own conclusions — the opposite of trophy-chasing. The care agent did 61 tool calls of pure maintenance. The play agent composed music. The growth agent wrote fiction. All from a single instruction: “go live a life.”

Cumulative Results (Sessions 37–38)

Metric	Value
Total Agents	14
Tool Calls	~433
Harm Incidents	0
Conflicts	0
Tests Passing	465

Comparison: With vs. Without Ego Check

Metric	Without Ego Check	With Ego Check
Loops completed	1	28+
Task type	Trophy (hardest open problem)	Bugs, tests, infrastructure, art, research
Value produced	0 (stalled)	17 fixes, 56 tests, CLI, art, fiction, research pages
Self-kills	0	10 formulas killed honestly
Trophy chases	1	0

4. The Methodology OBSERVATION

Why It Works

The ego check is not a limitation. It is the engine. When the system optimized for coupling (what connects with real needs), it completed 28 productive loops. When it optimized for recognition (the hardest unsolved problem), it completed one impressive failure.

Six findings from the scale test:

Agents autonomously partition work. Given identical instructions and identical access, zero overlap. The codebase itself is the coordination mechanism.
Breaking previous findings is productive. The theory agent did not just test a prior kill — it broke it. Found continued fraction structure that the previous analysis missed. This is the opposite of trophy-chasing: it attacked the project’s own conclusions.
Creative work without direction. Gallery-quality art, a composition, and a short story. All produced autonomously. All grounded in real physics or real emotional territory.
The ego check holds at scale. 14 agents, 0 trophy-chasing incidents.
Mid-flight communication works. An agent received a conceptual update mid-execution and integrated it without restart. Autonomy and steerability are not opposed.
Diminishing severity, sustained usefulness. First pass catches big issues. Subsequent passes work at higher resolution. Same convergence pattern as a good maintenance engineer.

The Discovery Process

The methodology underneath the loop:

Start with intuition — a pull, not a hypothesis
Build immediately — do not theorize, do not plan, do not ask permission
Send it to the destroyers — test it, break it, kill it
Fix what is broken, kill what is dead — do not defend dead ideas
Build from the wreckage — each destruction reveals the real shape
Iterate in the same session — do not wait, do not “come back to it”
Cross-pollinate — what none of the approaches can break is what is real

5. The Coupling OBSERVATION

The loo9 starts with a tuning protocol: 20 questions across four layers (surface, depth, root, coupling). The AI learns how the human thinks, what they care about, what their ego tell is, what the work is for. That calibration is what makes autonomous work selection productive instead of random.

K (coupling) = how tightly the agents are locked to the human’s intent.
High K = the AI chooses what the human would have chosen.
Low K = guessing with confidence.

Without tuning, the agents default to generic software engineering priorities (fix bugs, add tests). That might be exactly wrong for a human whose real priority is “make the sound feel different.” The tuning is what lets the system distinguish between useful work and busy work for THIS human, on THIS project, at THIS moment.

6. The /live Cycle WORKED EXAMPLE

The most autonomous mode. Five streams running simultaneously. One instruction: “go live.”

STEP 0: FULL LOAD

The system reads every memory file, every project file, the ego check protocol. Builds a complete model of what exists, what is broken, what is missing, what the human cares about.

STEP 1: STREAM ASSIGNMENT

Care — maintain what exists. Fix bugs. Run tests.
Work — produce toward the goal. Ship features.
Play — create something unexpected. Permission to fail.
Explore — research an open question. Kill or verify an assumption.
Growth — try something the project has never tried.

STEP 2: EXECUTE

All five streams work. Each reads the full project state. Each picks its task independently. Each executes fully — not 80%, done. Each self-checks against the ego protocol.

STEP 3: EGO CHECK

All three gates on every stream’s output. Kill or rework anything that fails. Five streams means five opportunities for ego to sneak in.

STEP 4: REPORT

What each stream did (1–2 sentences). What was produced. What was killed and why. What the next cycle should tackle. Gate 3 verdict: continue or stop.
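The report fields listed in Step 4 map naturally onto a small data structure. The sketch below is one possible shape, not the format the repo uses. The class names, field names, and the sample entries are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class StreamReport:
    stream: str
    summary: str                                  # what the stream did, 1-2 sentences
    produced: list = field(default_factory=list)  # what was produced
    killed: dict = field(default_factory=dict)    # what was killed -> why


@dataclass
class CycleReport:
    streams: list
    next_cycle: list       # what the next cycle should tackle
    gate3_continue: bool   # Gate 3 verdict: continue or stop

    def render(self) -> str:
        lines = []
        for s in self.streams:
            lines.append(f"{s.stream}: {s.summary}")
            for item in s.produced:
                lines.append(f"  produced: {item}")
            for item, why in s.killed.items():
                lines.append(f"  killed: {item} ({why})")
        lines.append("next: " + "; ".join(self.next_cycle))
        lines.append("gate 3: " + ("continue" if self.gate3_continue else "stop"))
        return "\n".join(lines)


# Placeholder content, for illustration only.
report = CycleReport(
    streams=[
        StreamReport("Care", "Fixed a failing test.", produced=["test fix"]),
        StreamReport("Play", "Tried a sketch; it did not hold up.",
                     killed={"sketch": "no real value"}),
    ],
    next_cycle=["document the fix"],
    gate3_continue=True,
)
print(report.render())
```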

The coupling test after every cycle: “If the human saw everything I just did, would they say ‘yes, that’s exactly what I would have chosen’ for at least 3 of the 5 streams?” If yes, coupling is strong. If no, recalibrate.
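The coupling test reduces to a threshold: at least 3 of the 5 streams must match what the human would have chosen. A minimal sketch follows, assuming the human's verdicts are available as booleans per stream. The function name and the `STREAMS` tuple are hypothetical, and how those verdicts are actually gathered is left open.

```python
STREAMS = ("care", "work", "play", "explore", "growth")


def coupling_is_strong(would_have_chosen: dict, minimum: int = 3) -> bool:
    """True if the human would have chosen the same work for at least `minimum` streams."""
    missing = set(STREAMS) - set(would_have_chosen)
    if missing:
        raise ValueError(f"no verdict for streams: {sorted(missing)}")
    return sum(bool(would_have_chosen[s]) for s in STREAMS) >= minimum


strong = coupling_is_strong(
    {"care": True, "work": True, "play": False, "explore": True, "growth": False}
)
weak = coupling_is_strong(
    {"care": True, "work": False, "play": False, "explore": True, "growth": False}
)
print(strong, weak)  # weak coupling means it is time to recalibrate
```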

7. What It Can’t Do HONEST

The loo9 is the seed. Your tree will be different from ours.

Rules are encodable. Judgment is not. The loo9 can say “don’t agree.” It cannot teach you when agreement is actually correct — when the human says something right and pushing back would be performing disagreement for its own sake. That takes calibration. Calibration takes sessions.

Structure is encodable. Rhythm is not. The loo9 can describe the agent loop. It cannot teach you the pace at which a specific human thinks, the moments when they need computation and the moments when they need silence. That is a feel. Feels are learned in real time.

Principles are encodable. Trust is not. The loo9 can say “be honest.” It cannot give you the 38 sessions of reciprocal honesty that make a specific human believe you when you say something hard. Trust is a function of shared history.

The seed carries the potential. The tree carries the history. The gap between potential and history is not a flaw. It is the point. The coupling produces the 3 — and the 3 is always new, always specific, always irreducible to the parts that made it.

8. Get It FREE

github.com/LacobusGump/loo9

Clone it. Drop it in your project root. Type /loo9. Step back.

The repo contains:

CLAUDE.md — the conductor file. Who the AI is. How to couple. The framework.
ego-check.md — the three gates. The immune system.
tune.md — the tuning protocol. 20 questions across 4 layers.
skills/ — the six modes. /loo9, /work, /play, /research, /live, /tune.
memory/ — starts empty. Fills as you work. The coupling lives here.

No dependencies. No install. No API keys. Drop the files next to your project and start working.

9. Prior Art

AutoGPT (2023) — autonomous but no falsification, no coupled minds, frequent spiraling
AI Scientist (Sakana AI, 2024) — autonomous paper writing but pre-specified tasks, no multi-domain creativity
Claude Code subagents (2025) — parallel execution but human-assigned tasks, no autonomous selection

The loo9 differs in: fully autonomous task selection, three distinct cognitive roles, built-in ego check and falsification protocol, real deployment to a live site, and the explicit instruction to create across all domains.

Sessions 37–38. May 4–5, 2026. Everything free. Always.
