Chad — local AI coding agent harness with a verifier loop you can't bypass.

    robi wrote:

    Chad

    https://github.com/usechad/chad

    "Honesty doesn't change. Quality is relative."

    Chad is a local AI coding agent with a verifier loop you can't bypass.

    He runs on your machine. He talks to whatever model you have loaded — qwen3:14b, gemma3, codestral, or a frontier API. After every step of the work, he runs the tests. On failure, he feeds the failure back to the model and tries again. When he's stuck, he stops and tells you he's stuck.

    He doesn't lie about progress.


    Install

    macOS / Linux:

    pip install chad-engine
    

    From source:

    git clone https://github.com/usechad/chad.git
    cd chad
    pip install -e .
    

    You'll also need Ollama running locally with a model pulled. Recommended starting point:

    ollama serve            # runs at localhost:11434; leave it running
    ollama pull qwen3:14b   # in a second terminal, once the server is up
    

    Chad's verifier loop works with any Ollama-compatible model. We've benchmarked qwen3:14b heavily; gemma3:27b, qwen2.5-coder:14b/32b, codestral:22b, and ministral-3:14b also work.
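Before starting Chad, you can confirm that Ollama is reachable and the model is actually pulled. The sketch below is not part of Chad. It queries Ollama's `GET /api/tags` endpoint, which lists locally available models. The helper names (`installed_models`, `has_model`, `fetch_tags`) are hypothetical, and the code assumes the default address `localhost:11434`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address of `ollama serve`


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if `wanted` (e.g. "qwen3:14b") is among the pulled models."""
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except OSError as exc:  # connection refused, timeout, DNS failure, ...
        print(f"Ollama not reachable at {OLLAMA_URL}: {exc}")
    else:
        if has_model(tags, "qwen3:14b"):
            print("qwen3:14b is ready")
        else:
            print("Model missing; run: ollama pull qwen3:14b")
```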


    Quickstart

    chad
    

    That opens Chad's interactive shell. Try:

    build a single-page snake game in vanilla HTML/CSS/JS
    

    Chad will:

    1. Ask a few clarifying questions (discovery layer)
    2. Restate what he heard before writing code (design summary)
    3. Decompose the work into phases (planner)
    4. Build each phase, run the tests, retry on failure (verifier loop)
    5. Halt honestly if a phase deadlocks instead of shipping broken code
    6. Emit a plain-English receipt of what worked and what didn't
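Steps 4 to 6 can be illustrated with a small, hypothetical sketch. It assumes steps 1 to 3 have already produced an ordered list of phases. `build_phase` stands in for "build, run the tests, retry" and returns False when a phase deadlocks. The `Receipt` class and its three categories are illustrative only and do not reflect Chad's actual receipt format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Receipt:
    """Plain-English record of what worked, what didn't, and what was skipped."""
    worked: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)

    def render(self) -> str:
        sections = (("Worked", self.worked), ("Failed", self.failed), ("Skipped", self.skipped))
        return "\n".join(f"{label}: {', '.join(items) or 'none'}" for label, items in sections)


def build_phases(phases: list[str], build_phase: Callable[[str], bool]) -> Receipt:
    """Run phases in order and stop at the first one that deadlocks.

    Later phases are recorded as skipped rather than built on top of
    broken code.
    """
    receipt = Receipt()
    for i, phase in enumerate(phases):
        if build_phase(phase):
            receipt.worked.append(phase)
        else:
            receipt.failed.append(phase)
            receipt.skipped.extend(phases[i + 1:])
            break
    return receipt


# Example: the second phase deadlocks, so the third is never attempted.
receipt = build_phases(["board", "snake movement", "scoring"],
                       lambda phase: phase != "snake movement")
print(receipt.render())
```

Halting at the first failed phase is the key design choice here. Later phases usually depend on earlier ones, so continuing past a failure would mostly produce more broken code for the receipt to explain.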

    What makes Chad different

    Six things, none of them taglines:

    1. Verifier loop you can't skip. Every phase runs checks. If checks fail, Chad retries. He doesn't lie about it.
    2. Discovery before code. Chad asks clarifying questions before he plans. Most agents skip this step. Most builds suffer for it.
    3. Design summary checkpoint. Chad restates what he heard before writing a line. You catch misreads in 30 seconds, not 30 minutes.
    4. A receipt with every project. Every Chad project ships with a plain-English file describing what works, what doesn't, and what was skipped. No hidden caveats.
    5. Self-repair on a worktree. Chad can fix his own bugs. He shows you the fix on a separate copy first; you say yes before it goes live.
    6. Calibrated confidence. "I'm 87% on web apps, 34% on Tauri." If Chad doesn't know your domain well, he tells you upfront — before you spend an hour.
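One plausible basis for "calibrated confidence" (item 6) is a per-domain success rate computed from past builds. The sketch below assumes that approach and is not taken from Chad's source. The function names, the 0.5 warning threshold, and the history format are all hypothetical. The fake history is chosen to reproduce the example quote above.

```python
from collections import defaultdict


def calibrate(history: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-domain success rate from (domain, succeeded) records of past builds."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for domain, ok in history:
        totals[domain] += 1
        wins[domain] += int(ok)
    return {d: wins[d] / totals[d] for d in totals}


def confidence_message(rates: dict[str, float], domain: str, warn_below: float = 0.5) -> str:
    """Tell the user up front how well this domain has gone historically."""
    if domain not in rates:
        return f"I have no track record on {domain}; treat my confidence as unknown."
    msg = f"I'm {round(rates[domain] * 100)}% on {domain}."
    if rates[domain] < warn_below:
        msg += " Expect rough edges; consider a smaller first task."
    return msg


history = ([("web apps", True)] * 87 + [("web apps", False)] * 13
           + [("Tauri", True)] * 34 + [("Tauri", False)] * 66)
rates = calibrate(history)
print(confidence_message(rates, "web apps"))
print(confidence_message(rates, "Tauri"))
```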

    Receipts

    We benchmarked Chad's verifier loop against bare models on the standard public coding benchmarks. Reproducible, runtime-graded.

    HumanEval+ (qwen3:14b, 2026-05-08):

    Configuration        Pass rate
    -------------------  ----------------
    bare qwen3:14b       78.66% (129/164)
    Chad w/ qwen3:14b    92.68% (152/164)

    Δ = +14.02 percentage points · 23 problems rescued · 0 regressions · 12 honest halts.
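The headline numbers can be reproduced from the raw counts. Note that the counts alone only give the net gain (152 − 129 = 23). The "23 rescued, 0 regressions" split requires per-problem results, so `compare` shows how those two figures would be computed from the sets of problem IDs each configuration passed. The helper names are illustrative. The actual grading lives in chad-bench.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to two decimals."""
    return round(100 * passed / total, 2)


def compare(bare: set[str], chad: set[str]) -> dict[str, int]:
    """Per-problem comparison of which problems each configuration passed."""
    return {
        "rescued": len(chad - bare),      # failed bare, passed with Chad
        "regressions": len(bare - chad),  # passed bare, failed with Chad
    }


TOTAL = 164
bare_rate = pass_rate(129, TOTAL)         # 78.66
chad_rate = pass_rate(152, TOTAL)         # 92.68
delta = round(chad_rate - bare_rate, 2)   # 14.02 percentage points
net_gain = 152 - 129                      # rescued - regressions = 23
unsolved = TOTAL - 152                    # 12, matching the 12 honest halts

print(f"bare {bare_rate}%, Chad {chad_rate}%, delta {delta} pp, net +{net_gain}")
```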

    A free 14B local model wrapped in Chad's verifier loop crosses into frontier-API territory on HumanEval+. No API quota, no per-call cost, no code leaving the machine.

    Full reproducible benchmarks: usechad/chad-bench
    Site: usechad.dev


    How the verifier loop works (the concept, in pseudocode)

    for attempt in range(MAX_ATTEMPTS):
        code = model.generate(prompt)
        passed, error = run_tests(code)
        if passed:
            return code
        if same_error_class_three_times_in_a_row(error):
            halt_honestly(error)             # stops; tells you it's stuck
            return None
        prompt = with_diagnosis(prompt, error)   # feed failure back, retry
    

    That's the whole idea. The engineering is in the test battery, the diagnosis prompt geometry, the deadlock detector, and the per-model calibration. The loop itself is small. The full implementation lives in dante/build/runner.py and dante/verifiers/.
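Here is a runnable version of the pseudocode, with the model and the test battery passed in as plain functions. It is a sketch, not Chad's implementation (that lives in dante/build/runner.py). Several pieces are assumptions: an "error class" is taken to be the text before the first colon of the error message, the retry prompt is simply the original prompt plus the failure text, and `MAX_ATTEMPTS = 6` is arbitrary.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 6       # assumed budget; not Chad's actual setting
DEADLOCK_REPEATS = 3   # same error class this many times in a row means "stuck"


def error_class(error: str) -> str:
    """Crude classification: the text before the first colon, e.g. 'AssertionError'."""
    return error.split(":", 1)[0].strip()


def verifier_loop(
    prompt: str,
    generate: Callable[[str], str],
    run_tests: Callable[[str], tuple[bool, str]],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    recent: list[str] = []  # error classes of consecutive failures
    for _ in range(max_attempts):
        code = generate(prompt)
        passed, error = run_tests(code)
        if passed:
            return code
        recent.append(error_class(error))
        last = recent[-DEADLOCK_REPEATS:]
        if len(last) == DEADLOCK_REPEATS and len(set(last)) == 1:
            print(f"Halting: stuck on {last[0]} after {len(recent)} attempts")
            return None
        # Feed the failure back so the next attempt can address it.
        prompt = f"{prompt}\n\nYour previous attempt failed:\n{error}\nFix it."
    print(f"Halting: out of attempts ({max_attempts})")
    return None
```

In practice, `generate` would call the loaded model (for example through Ollama), and `run_tests` would execute the test battery, ideally in a sandbox, returning the failure text on error.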


    What Chad isn't

    Not a model. Chad is a harness. It uses the model you already have.

    Not a cloud service. Chad runs locally. With a local model, your code, your prompts, and your model weights never cross the wire. (If you point Chad at a frontier API instead, your prompts go to that provider.)

    Not finished. This is v0.4. Things will change. The verifier loop, discovery layer, and planner are stable. The self-repair branch logic is still maturing. The HTTP API surface is functional but not yet feature-complete. Read CHANGELOG.md before assuming a feature is final.

    Not magic. Chad helps the model finish what it starts on tasks where running tests gives useful feedback. On tasks where there's no good "test" (catching fabricated citations, writing prose, recall benchmarks), Chad adds little. Use the right tool for the job.


    Honest limits

    Per Chad's own self-assessment:

    • The verifier loop's leverage depends entirely on the test battery's coverage. Edge cases not in the test set don't get caught.
    • The diagnostician model occasionally repeats the same hint when it should escalate to a different strategy. The deadlock detector mitigates but doesn't fully solve this.
    • The planner can decompose a task in ways that hide ambiguity (grouping multiple steps into one). When that happens, the verifier can't separate which sub-step actually failed.
    • Discovery layer scope is bounded by what the discovery prompt asks. Hidden assumptions outside its scope still slip through.
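The first limit is easy to demonstrate. In the toy example below, a buggy leap-year function passes a small test battery because none of the tests cover century years. A verifier that only sees this battery would accept it. The function and battery are illustrative only and are not taken from Chad's test suite.

```python
def is_leap_year(year: int) -> bool:
    """Buggy on purpose: ignores the 100-year and 400-year century rules."""
    return year % 4 == 0


# A small test battery that happens to miss the century edge case.
BATTERY = [(2024, True), (2023, False), (1996, True), (2000, True)]


def verifier_passes(fn, battery) -> bool:
    """The verifier can only judge the cases it is given."""
    return all(fn(arg) == expected for arg, expected in battery)


print(verifier_passes(is_leap_year, BATTERY))                    # True: bug slips through
print(verifier_passes(is_leap_year, BATTERY + [(1900, False)]))  # False: edge case exposes it
```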

    These are real. We're working on each of them. We're also publishing about them rather than papering over them.

    See ARCHITECTURE.md for the deeper design notes.


    Hardware

    Chad has been developed and benchmarked on consumer hardware. The HumanEval+ run above was produced on:

    GPU:  NVIDIA RTX 5090 Laptop (24 GB VRAM)
    RAM:  64 GB DDR5
    OS:   Pop!_OS 24.04 LTS
    

    Smaller GPUs work too. The constraint is whether the model you want to load fits in VRAM. qwen3:14b in 4-bit quant fits comfortably in 12 GB.
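A rough rule of thumb for whether a model fits: the weights take about parameters × bits per weight ÷ 8 bytes. The sketch below uses that rule plus an assumed flat 2 GB allowance for the KV cache and runtime buffers. Real 4-bit formats average somewhat more than 4 bits per weight, and the KV cache grows with context length, so treat the result as an estimate rather than a guarantee. Here "GB" means 10^9 bytes.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: parameters x bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """Rough check; overhead_gb is an assumed allowance for KV cache and buffers."""
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


print(weight_memory_gb(14, 4))   # 7.0 GB of weights for a 14B model at 4 bits
print(fits(14, 4.5, 12))         # True: ~7.9 GB weights + 2 GB overhead
print(fits(14, 16, 24))          # False: unquantized 16-bit weights alone are ~28 GB
```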


    License

    Apache 2.0. See LICENSE.

    You can use Chad commercially, modify it, ship it inside your product, and keep your own changes private. When redistributing, keep the copyright notice, the license text, and any NOTICE file intact, and mark the files you have modified.

