Chad

robi

"Honesty doesn't change. Quality is relative."

Chad is a local AI coding agent with a verifier loop you can't bypass.

He runs on your machine. He talks to whatever model you have loaded — qwen3:14b, gemma3, codestral, or a frontier API. After every step of the work, he runs the tests. On failure, he feeds the failure back to the model and tries again. When he's stuck, he stops and tells you he's stuck.

He doesn't lie about progress.

Install

macOS / Linux:

pip install chad-engine

From source:

git clone https://github.com/usechad/chad.git
cd chad
pip install -e .

You'll also need Ollama running locally with a model pulled. Recommended starting point:

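ollama serve            # starts the server at localhost:11434; skip if Ollama already runs as a service
ollama pull qwen3:14b   # run in a second terminal; pull talks to the running server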

Chad's verifier loop works with any Ollama-compatible model. We've benchmarked qwen3:14b heavily; gemma3:27b, qwen2.5-coder:14b/32b, codestral:22b, and ministral-3:14b also work.
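Before launching Chad it can help to confirm that Ollama is reachable and that the model is actually pulled. The sketch below is not part of Chad. It assumes the default Ollama address from the install step and uses Ollama's /api/tags endpoint, which lists local models; the helper names are made up for illustration.

```python
import json
import urllib.request

# Default Ollama address, as noted in the install step above.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` is among the models in an /api/tags payload."""
    names = {m.get("name", "") for m in tags_payload.get("models", [])}
    return wanted in names


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> dict:
    """Fetch the list of locally available models from a running Ollama server.

    Raises OSError (urllib's URLError is a subclass) if the server is not reachable.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling model_is_pulled(fetch_tags(), "qwen3:14b") returns False when the model still needs an ollama pull, and fetch_tags() raises OSError when the server is not running.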

Quickstart

chad

That opens Chad's interactive shell. Try:

build a single-page snake game in vanilla HTML/CSS/JS

Chad will:

Ask a few clarifying questions (discovery layer)
Restate what he heard before writing code (design summary)
Decompose the work into phases (planner)
Build each phase, run the tests, retry on failure (verifier loop)
Halt honestly if a phase deadlocks instead of shipping broken code
Emit a plain-English receipt of what worked and what didn't
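The build, halt, and receipt steps above can be pictured with a small sketch. This is not Chad's code: Phase, Receipt, and run_phases are hypothetical names, and the rule that phases after a failed one are recorded as skipped is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Phase:
    name: str
    # Hypothetical hook: builds the phase and returns True when its checks pass.
    build_and_verify: Callable[[], bool]


@dataclass
class Receipt:
    worked: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the receipt as plain English, one line per outcome."""
        rows = (("Worked", self.worked), ("Failed", self.failed), ("Skipped", self.skipped))
        return "\n".join(f"{title}: {', '.join(names) if names else 'none'}" for title, names in rows)


def run_phases(phases: List[Phase]) -> Receipt:
    """Run phases in order; after the first failure, record the rest as skipped."""
    receipt = Receipt()
    halted = False
    for phase in phases:
        if halted:
            receipt.skipped.append(phase.name)
        elif phase.build_and_verify():
            receipt.worked.append(phase.name)
        else:
            receipt.failed.append(phase.name)
            halted = True  # halt instead of building on a broken phase
    return receipt
```

The point of the shape is that a failed phase is reported as failed, and nothing downstream of it is reported as done.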

What makes Chad different

Six things, none of them taglines:

Verifier loop you can't skip. Every phase runs checks. If checks fail, Chad retries. He doesn't lie about it.
Discovery before code. Chad asks clarifying questions before he plans. Most agents skip this step. Most builds suffer for it.
Design summary checkpoint. Chad restates what he heard before writing a line. You catch misreads in 30 seconds, not 30 minutes.
A receipt with every project. Every Chad project ships with a plain-English file describing what works, what doesn't, and what was skipped. No hidden caveats.
Self-repair on a worktree. Chad can fix his own bugs. He shows you the fix on a separate copy first; you say yes before it goes live.
Calibrated confidence. "I'm 87% on web apps, 34% on Tauri." If Chad doesn't know your domain well, he tells you upfront — before you spend an hour.

Receipts

We benchmarked Chad's verifier loop against bare models on the standard public coding benchmarks. Reproducible, runtime-graded.

HumanEval+ (qwen3:14b, 2026-05-08):

Configuration	Pass rate
bare `qwen3:14b`	78.66% (129/164)
Chad w/ `qwen3:14b`	92.68% (152/164)

Δ = +14.02 percentage points · 23 problems rescued · 0 regressions · 12 honest halts.
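The headline figures follow directly from the raw counts in the table. A quick check, using only the numbers reported above:

```python
TOTAL = 164        # HumanEval+ problems
BARE_PASSED = 129  # bare qwen3:14b
CHAD_PASSED = 152  # Chad w/ qwen3:14b


def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100.0 * passed / total


bare = pass_rate(BARE_PASSED, TOTAL)
chad = pass_rate(CHAD_PASSED, TOTAL)
delta = chad - bare                    # difference in percentage points
net_gain = CHAD_PASSED - BARE_PASSED   # rescued minus regressions
unsolved = TOTAL - CHAD_PASSED         # problems Chad did not pass

print(f"bare: {bare:.2f}%  chad: {chad:.2f}%  delta: {delta:+.2f} pp")
print(f"net gain: {net_gain}  unsolved: {unsolved}")
```

The net gain of 23 matches 23 rescued problems with 0 regressions, and the 12 unsolved problems match the 12 reported halts.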

A free 14B local model wrapped in Chad's verifier loop crosses into frontier-API territory on HumanEval+. No API quota, no per-call cost, no code leaving the machine.

Full reproducible benchmarks: usechad/chad-bench
Site: usechad.dev

How the verifier loop works (the concept, in pseudocode)

def verifier_loop(prompt):
    for attempt in range(MAX_ATTEMPTS):
        code = model.generate(prompt)
        passed, error = run_tests(code)
        if passed:
            return code
        if same_error_class_three_times_in_a_row(error):
            halt_honestly(error)             # stops; tells you it's stuck
            return None
        prompt = with_diagnosis(prompt, error)   # feed failure back, retry
    return None                              # attempt budget used up without a pass

That's the whole idea. The engineering is in the test battery, the diagnosis prompt geometry, the deadlock detector, and the per-model calibration. The loop itself is small. The full implementation lives in dante/build/runner.py and dante/verifiers/.
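For readers who want something they can execute, here is a minimal runnable sketch of the same idea. It is not the code in dante/build/runner.py. The attempt budget, the error classifier, and the way the failure is appended to the prompt are all assumptions, and generate and run_tests are caller-supplied stand-ins for the model and the test battery.

```python
from collections import deque
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 8     # assumed budget; the source does not state the real value
DEADLOCK_WINDOW = 3  # "same error class three times in a row"


def error_class(error: str) -> str:
    """Hypothetical classifier: the text before the first colon, e.g. 'AssertionError'."""
    return error.split(":", 1)[0].strip()


def verifier_loop(
    prompt: str,
    generate: Callable[[str], str],
    run_tests: Callable[[str], Tuple[bool, str]],
) -> Tuple[Optional[str], str]:
    """Return (code, "passed") on success, or (None, reason) when the loop halts."""
    recent = deque(maxlen=DEADLOCK_WINDOW)  # error classes of the latest failures
    for _ in range(MAX_ATTEMPTS):
        code = generate(prompt)
        passed, error = run_tests(code)
        if passed:
            return code, "passed"
        recent.append(error_class(error))
        # Deadlock: the window is full and every entry is the same error class.
        if len(recent) == DEADLOCK_WINDOW and len(set(recent)) == 1:
            return None, f"deadlocked on {recent[0]}"
        # Feed the failure back to the model and retry.
        prompt = f"{prompt}\n\nPrevious attempt failed with:\n{error}"
    return None, "attempt budget exhausted"
```

Returning a reason string alongside None keeps the two halt conditions (deadlock and exhausted budget) distinguishable for whatever writes the receipt.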

What Chad isn't

Not a model. Chad is a harness. It uses the model you already have.

Not a cloud service. Chad runs locally. Your code, your prompts, your model weights — none of it crosses the wire.

Not finished. This is v0.4. Things will change. The verifier loop, discovery layer, and planner are stable. The self-repair branch logic is still maturing. The HTTP API surface is functional but not yet feature-complete. Read CHANGELOG.md before assuming a feature is final.

Not magic. Chad helps the model finish what it starts on tasks where running tests gives useful feedback. On tasks where there's no good "test" — fabricating citations, writing prose, recall benchmarks — Chad adds little. Use the right tool for the job.

Honest limits

Per Chad's own self-assessment:

The verifier loop's leverage depends entirely on the test battery's coverage. Edge cases not in the test set don't get caught.
The diagnostician model occasionally repeats the same hint when it should escalate to a different strategy. The deadlock detector mitigates but doesn't fully solve this.
The planner can decompose a task in ways that hide ambiguity (grouping multiple steps into one). When that happens, the verifier can't separate which sub-step actually failed.
Discovery layer scope is bounded by what the discovery prompt asks. Hidden assumptions outside its scope still slip through.

These are real. We're working on each of them. We're also documenting them publicly rather than papering over them.

See ARCHITECTURE.md for the deeper design notes.

Hardware

Chad has been developed and benchmarked on consumer hardware. The HumanEval+ run above was produced on:

GPU:  NVIDIA RTX 5090 Laptop (24 GB VRAM)
RAM:  64 GB DDR5
OS:   Pop!_OS 24.04 LTS

Smaller GPUs work too. The constraint is whether the model you want to load fits in VRAM. qwen3:14b in 4-bit quant fits comfortably in 12 GB.
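A back-of-the-envelope estimate shows why that fits. The sketch below counts only the weights at a nominal bit width and adds a flat allowance for the KV cache and runtime buffers. The 2 GiB allowance and the round 14e9 parameter count are assumptions, and real 4-bit quant formats store slightly more than 4 bits per weight, so treat the result as a lower bound.

```python
GIB = 2**30  # bytes per GiB


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return params * bits_per_weight / 8 / GIB


def fits_in_vram(params: float, bits_per_weight: float, vram_gib: float,
                 overhead_gib: float = 2.0) -> bool:
    """Rough check: weights plus an assumed flat overhead must fit in VRAM."""
    return weight_gib(params, bits_per_weight) + overhead_gib <= vram_gib


est = weight_gib(14e9, 4)
print(f"14B weights at 4-bit: about {est:.1f} GiB")
print("fits in 12 GiB:", fits_in_vram(14e9, 4, 12))
```

The same 14B model at 16 bits per weight would need roughly four times as much memory for weights alone, which is why the quantized build is the practical choice on smaller cards.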

License

Apache 2.0. See LICENSE.

You can use Chad commercially, modify it, ship it inside your product, and keep your own changes private. Keep the copyright notice and license text intact when redistributing.

