agentgateway on Cloudron: a community packages for an open-source gateway for MCP (Model Context Protocol) servers and LLM (Large Language Model) backends,

LoudLemur

What this is

It is the 1.3.1 release candidate with the new GUI.

It is hopefully a solution to agent connectivity.

Thank you to everybody here on Cloudron. You are an inspiration. Thank you also to those who contributed their time and amazing skills to creating agentgateway. We hope an official package is eventually supported.

TL/DR

There are three sections below that might help:

Potential Users
Cloudron packagers/maintainers
agentgateway developers

agentgateway is most valuable as the single AI control point for your whole self-hosted stack. Instead of each app talking directly to Ollama or to a cloud provider, they all talk to the gateway, and the gateway handles auth, backend choice, cost and token visibility, failover, and guardrails in one place. The payoff is that you can swap what is behind it (eg CPU Ollama today, a quantum computer GPU box later, a cloud model when a task needs it) without reconfiguring a single client. That is the reason it exists.

An example of using and verifying agentgateway

Lets say you already have OpenWebUI running. Lowest friction: make the gateway the model provider for OpenWebUI. Connect OpenWebUI to the data plane. Have a real conversation in OpenWebUI that is served through the gateway, and watch the gateway register the traffic. This proves the gateway is in the path doing real work, and from that point OpenWebUI never needs to know or care which backend answers. This is the "it is doing something real" baseline.

agentgateway is an open-source gateway for MCP servers and LLM backends, with an embedded admin UI. This post announces a community Cloudron package for it. The package runs agentgateway on Cloudron with two surfaces: an admin UI protected by Cloudron single sign-on, and an open, key-protected data plane for programmatic clients. It is pinned to a specific stable upstream version, builds from the official upstream binary on top of the Cloudron base image, and keeps its only state in a YAML configuration file under the persistent data directory.

How well it works

It installs and runs cleanly on the current Cloudron base. Both the MCP path and the LLM path were validated end to end on a live box. The data plane serves the MCP endpoint, the server-sent-events endpoint, and an OpenAI-compatible completions endpoint, all behind the application's own API-key authentication. The admin UI sits behind Cloudron single sign-on. Configuration persists across restarts and across a version update.

Two caveats: First, a fresh install shows a warning that the LLM configuration is not initialized; this is expected and harmless, and the reason is explained below. Second, on a processor-only inference backend, the first call to a cold model can be slow enough to hit the reverse-proxy read window; the package documents the mitigations.

The design, in one paragraph

The guiding rule is that the OAuth proxy must never sit in front of programmatic clients, because it would redirect their requests to a login page and break them. So the admin UI lives on the primary domain behind the proxy-auth addon, and the data plane lives on its own subdomain through a plural httpPorts entry, with no proxy-auth, secured by agentgateway's own API key. A request to the data plane without a key is rejected by the application, not redirected to single sign-on. A request to the admin domain without a session is redirected to single sign-on. Both surfaces receive platform TLS.

Working with other Cloudron apps

We deliberately packaged the application with other Cloudron supported applications in mind. We packaged it to coexist cleanly with other Cloudron apps and to be consumed by them, because a gateway is only useful if the things around it can reach it without friction. A few specific choices serve that goal.

The data plane is a standard surface. It speaks the MCP protocol and an OpenAI-compatible completions API on its own subdomain, secured by the application's own key rather than by the OAuth proxy. That single decision is what lets any other Cloudron app consume it. A chat UI, an automation tool, a notebook, or anything that already speaks those protocols can point at the data plane and authenticate with a bearer token, and the request is never redirected to a login page. Putting the OAuth proxy in front of that surface would have broken every one of those callers, which is exactly the trap we avoided.

We proved this with a sibling app, not in theory. A separate chat UI running on the same Cloudron box, itself behind single sign-on, was pointed at the data plane and served completions through it. That is the shape we expect most people to use: a human-facing app behind single sign-on, calling out to the key-protected gateway, which in turn fans out to MCP servers and model backends.

The example configurations target sibling Cloudron services, not only cloud providers. The included examples point at an Ollama app for local models, at S3-compatible storage, and at a vector database, so the package slots into a self-hosted stack rather than assuming an external account. You can wire the gateway to services you already run on the same box.

The two-surface split is the reusable idea. One human surface behind single sign-on and one programmatic surface behind the application's own authentication, on separate subdomains, is what lets a gateway sit safely between Cloudron apps. We think the pattern generalizes to other AI and agent tooling, and we have written it up for the Cloudron team below in the hope it becomes a documented recipe.

Section 1: For potential users

A few tips to make the first hour smooth.

There are two addresses. The admin UI is at your primary domain and you log in through Cloudron single sign-on. The data plane is at the gw-api subdomain, and your clients use the paths under it, for example the MCP endpoint and the completions endpoint at https://gw-api.your-domain/mcp and https://gw-api.your-domain/v1/chat/completions.

Find your data-plane API key before you wire up a client. It is generated on first run and stored in your configuration. You can read it from the admin UI under the data-plane listener's authentication policy, or, as a reliable fallback, retrieve it with a container exec that reads it from the configuration file in the data directory. Clients send it as a bearer token in the authorization header.

The warning that the LLM configuration is not initialized is expected on a fresh install and does not mean anything is broken. The completions endpoint on the data plane is configured separately from the admin model dashboard. The gateway is healthy and MCP works regardless of that message. To add an LLM backend, start from the included Ollama example.

Treat the configuration file as managed by the application. The admin UI rewrites it when you save settings, and it strips comments in the process, so the file in the persistent data directory is the source of truth and your comments will not survive a UI save. Cloudron backs the file up with the app.

When you add MCP backends, note that stdio servers run as local subprocesses inside the container through the Node or Python runners that the image provides. The first call to a server that has not been fetched yet is slow while the runner downloads it. Later calls are fast.

When you add an LLM backend, you can point the completions route at any OpenAI-compatible provider. On a processor-only backend the first call can be slow, because the model has to load. Warm the model, set a keep-alive on the backend, and prefer streaming, because the reverse proxy closes a long request that sends no bytes.

The top-level configuration section needs a restart to take effect; most other changes hot-reload.

To connect a separate chat UI or any OpenAI-compatible client, set the base URL to your data plane and the key to your data-plane API key. A human-facing UI behind single sign-on calling out to the key-protected data plane is exactly the intended shape.

Section 2: For the Cloudron team

We think this is a strong candidate for official packaging. An MCP and LLM gateway is broadly useful, and the application is small to package: a single binary with an embedded UI, a handful of runtime libraries, and a single YAML file as state.

A documentation fix that would save others time: the packaging addon reference shows the proxy-auth addon key in lowercase, while the platform requires it in camelCase. This cost us a failed validation before any build.

The two-surface pattern worked well and is worth documenting as a recommended recipe. One human surface behind single sign-on and one programmatic surface behind the application's own authentication, placed on separate subdomains through the plural httpPorts entry, is increasingly common for AI and agent tooling. A documented blueprint would help packagers avoid the trap of putting the OAuth proxy in front of a data plane, which is the most common way to break programmatic clients.

The constraint that proxy-auth cannot be added after first install is reasonable, but it forces a full reinstall if it is missed. It deserves prominent placement in the packaging documentation.

A per-app reverse-proxy read-timeout setting would help a whole class of applications. LLM inference and other slow-first-response workloads can exceed the current read window on a cold call, and there is no per-app knob today. We worked around it with keep-alive and streaming, but a manifest field would be the clean answer.

GPU support is, in our view, the single biggest enabler for AI applications on Cloudron. The community has shown GPU passthrough working at the host level, but each application still needs the GPU flag injected manually, which is not a manifest field. A manifest-level GPU opt-in that injects the flag when the host supports it and falls back to processor execution gracefully would unblock local inference, image generation, transcription, and more in one step.

TLS passthrough, which is already an open request, is relevant for AI applications that want end-to-end TLS on a data plane. For our case, terminating TLS at the platform on the httpPorts subdomain was the right choice and gave us TLS at no cost.

### Section 3: For the agentgateway developers

Thank you for shipping a single binary with the UI embedded. It is excellent for packaging. A few things would make packaging easier still, and would help on other platforms too.

The admin bind address defaults to localhost and lives only in the configuration file, which the UI rewrites and can drop. For containerized, reverse-proxied deployments this means we have to re-assert the bind address on every boot. An environment-variable override that the binary reads at startup, and that takes precedence over the file, would remove this entirely.

The UI rewriting the configuration file and stripping comments is hostile to file-as-source-of-truth deployments. A layered configuration, with a read-only base plus UI-managed overrides, or comment preservation, or an environment-and-flag layer the UI cannot overwrite, would all help.

The readiness endpoint is on a separate listener from the main port. Reverse proxies map a single port, so we used the UI path as the health check. Exposing readiness on the main listener, or making its port configurable to coincide with the main one, would simplify health checks for every reverse-proxied deployment.

The relationship between a route that serves completions and the top-level model registry that the dashboard reads is confusing. The dashboard, the cost view, and virtual models read the top-level registry and the model catalog, while completions can be served by a route backend with that registry empty. The result is a warning that the LLM configuration is not initialized even while the completions endpoint returns valid responses. Unifying these, or having the UI recognize a route-served model as active, would remove a real source of user confusion.

The virtual-models syntax changed, and the older form is now rejected. A short migration note in the release notes would help packagers ship correct examples.

The binary links a specific glibc version exactly. A future toolchain bump could break older base images silently. Documenting a minimum glibc per release, or offering a statically linked or musl build, would let packagers pin base images with confidence. We made a linker and version check a mandatory step on every version bump as a result.

Finally, the migration subcommand is useful for configuration upgrades. Keeping it idempotent on an already-current configuration lets packagers run it safely at boot, which is what we do.

Closing

The package is open source and installs through a community versions URL.

Repository: https://github.com/OrcVole/agentgateway-cloudron
Image: ghcr.io/orcvole/agentgateway-cloudron (public, pinned by digest)
Install: cloudron install --versions-url https://raw.githubusercontent.com/OrcVole/agentgateway-cloudron/main/CloudronVersions.json

The repository includes example configurations for MCP and LLM backends, several of which point at sibling Cloudron services, a debugging log of the failures we actually hit, an upgrade guide with the release gates we recommend running on every version bump, and a retrospective of the packaging work. It is a community package and is not affiliated with the upstream agentgateway project. Feedback and improvements are welcome.

LoudLemur

This post is deleted!

Cloudron makes it easy to run web apps like WordPress, Nextcloud, GitLab on your server. Find out more or install now.

Cloudron Forum