Chad — local AI coding agent harness with a verifier loop you can't bypass.
-
Chad
https://github.com/usechad/chad
"Honesty doesn't change. Quality is relative."
Chad is a local AI coding agent with a verifier loop you can't bypass.
He runs on your machine. He talks to whatever model you have loaded — qwen3:14b, gemma3, codestral, or a frontier API. After every step of the work, he runs the tests. On failure, he feeds the failure back to the model and tries again. When he's stuck, he stops and tells you he's stuck.
He doesn't lie about progress.
Install
macOS / Linux:
pip install chad-engineFrom source:
git clone https://github.com/usechad/chad.git cd chad pip install -e .You'll also need Ollama running locally with a model pulled. Recommended starting point:
ollama pull qwen3:14b ollama serve # runs at localhost:11434Chad's verifier loop works with any Ollama-compatible model. We've benchmarked qwen3:14b heavily; gemma3:27b, qwen2.5-coder:14b/32b, codestral:22b, and ministral-3:14b also work.
Quickstart
chadThat opens Chad's interactive shell. Try:
build a single-page snake game in vanilla HTML/CSS/JSChad will:
- Ask a few clarifying questions (discovery layer)
- Restate what he heard before writing code (design summary)
- Decompose the work into phases (planner)
- Build each phase, run the tests, retry on failure (verifier loop)
- Halt honestly if a phase deadlocks instead of shipping broken code
- Emit a plain-English receipt of what worked and what didn't
What makes Chad different
Six things, none of them are taglines:
- Verifier loop you can't skip. Every phase runs checks. If checks fail, Chad retries. He doesn't lie about it.
- Discovery before code. Chad asks clarifying questions before he plans. Most agents skip this step. Most builds suffer for it.
- Design summary checkpoint. Chad restates what he heard before writing a line. You catch misreads in 30 seconds, not 30 minutes.
- A receipt with every project. Every Chad project ships with a plain-English file describing what works, what doesn't, and what was skipped. No hidden caveats.
- Self-repair on a worktree. Chad can fix his own bugs. He shows you the fix on a separate copy first; you say yes before it goes live.
- Calibrated confidence. "I'm 87% on web apps, 34% on Tauri." If Chad doesn't know your domain well, he tells you upfront — before you spend an hour.
Receipts
We benchmarked Chad's verifier loop against bare models on the standard public coding benchmarks. Reproducible, runtime-graded.
HumanEval+ (qwen3:14b, 2026-05-08):
Configuration Pass rate bare qwen3:14b78.66% (129/164) Chad w/ qwen3:14b92.68% (152/164) Δ = +14.02 percentage points · 23 problems rescued · 0 regressions · 12 honest halts.
A free 14B local model wrapped in Chad's verifier loop crosses into frontier-API territory on HumanEval+. No API quota, no per-call cost, no code leaving the machine.
Full reproducible benchmarks:
usechad/chad-benchSite: usechad.dev
How the verifier loop works (the concept, in pseudocode)
for attempt in range(MAX_ATTEMPTS): code = model.generate(prompt) passed, error = run_tests(code) if passed: return code if same_error_class_three_times_in_a_row(error): halt_honestly(error) # stops; tells you it's stuck return None prompt = with_diagnosis(prompt, error) # feed failure back, retryThat's the whole idea. The engineering is in the test battery, the diagnosis prompt geometry, the deadlock detector, and the per-model calibration. The loop itself is small. The full implementation lives in
dante/build/runner.pyanddante/verifiers/.
What Chad isn't
Not a model. Chad is a harness. It uses the model you already have.
Not a cloud service. Chad runs locally. Your code, your prompts, your model weights — none of it crosses the wire.
Not finished. This is v0.4. Things will change. The verifier loop, discovery layer, and planner are stable. The self-repair branch logic is still maturing. The HTTP API surface is functional but not yet feature-complete. Read
CHANGELOG.mdbefore assuming a feature is final.Not magic. Chad helps the model finish what it starts on tasks where running tests gives useful feedback. On tasks where there's no good "test" — fabricating citations, writing prose, recall benchmarks — Chad adds little. Use the right tool for the job.
Honest limits
Per Chad's own self-assessment:
- The verifier loop's leverage depends entirely on the test battery's coverage. Edge cases not in the test set don't get caught.
- The diagnostician model occasionally repeats the same hint when it should escalate to a different strategy. The deadlock detector mitigates but doesn't fully solve this.
- The planner can decompose a task in ways that hide ambiguity (grouping multiple steps into one). When that happens, the verifier can't separate which sub-step actually failed.
- Discovery layer scope is bounded by what the discovery prompt asks. Hidden assumptions outside its scope still slip through.
These are real. We're working on each of them. We're also publishing about them rather than papering over them.
See
ARCHITECTURE.mdfor the deeper design notes.
Hardware
Chad has been developed and benchmarked on consumer hardware. The HumanEval+ run above was produced on:
GPU: NVIDIA RTX 5090 Laptop (24 GB VRAM) RAM: 64 GB DDR5 OS: Pop!_OS 24.04 LTSSmaller GPUs work too. The constraint is whether the model you want to load fits in VRAM. qwen3:14b in 4-bit quant fits comfortably in 12 GB.
License
Apache 2.0. See
LICENSE.You can use Chad commercially, modify it, ship it inside your product, and keep your own changes private. Keep the copyright notice and license text intact when redistributing.
Hello! It looks like you're interested in this conversation, but you don't have an account yet.
Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.
With your input, this post could be even better 💗
Register Login