TEI (Text Embeddings Inference) on Cloudron - Community Package

LoudLemur

Text Embeddings Inference (Hugging Face TEI), packaged for Cloudron

A community package for Text Embeddings Inference (TEI) — Hugging Face's fast Rust server for text
embeddings and reranking. It is the embedding tier of a self-hostable, OpenAI-compatible retrieval
stack: TEI turns text into vectors, a vector database such as Qdrant stores and searches them, and an
LLM answers over what is retrieved.

Package: https://github.com/OrcVole/TEI-Cloudron
Upstream: https://github.com/huggingface/text-embeddings-inference

TL;DR

TEI gives you a private, self-hosted embeddings API (OpenAI-compatible /v1/embeddings, plus a
native /embed). Install it from the versions URL and the API key is generated for you on first run;
point any app that speaks the OpenAI embeddings API at it, and you have on-box embeddings for
retrieval-augmented generation (RAG) and semantic search, with no documents leaving your server. It
pairs directly with the
Qdrant Cloudron package to make a complete pure-Rust retrieval pipeline. It is API-only (no web
UI), amd64-only (the CPU build bundles Intel MKL), and verified working on Cloudron 9.x,
including a live TEI-to-Qdrant round trip. Install:

cloudron install \
  --versions-url https://raw.githubusercontent.com/OrcVole/TEI-Cloudron/main/CloudronVersions.json \
  --location tei.example.com

For potential users

What it is. You send TEI some text and it returns an embedding (a vector of 384 numbers for the
default model). That vector is the "meaning" of your text as coordinates, which a vector database can compare
and search. TEI speaks the OpenAI embeddings API, so anything that already calls OpenAI for
embeddings can call your own box instead.
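As a rough illustration of "call your own box instead", here is a minimal sketch of an OpenAI-style request. It assumes the app lives at https://tei.example.com (the location from the install example) and that the generated key has been exported as TEI_API_KEY; both TEI_URL and TEI_API_KEY are hypothetical variable names, not something the package defines. The escaping helper only handles backslashes and double quotes, so it is meant for single-line text.

```shell
#!/bin/sh
# Sketch: call TEI's OpenAI-compatible /v1/embeddings endpoint.
# TEI_URL and TEI_API_KEY are hypothetical names chosen for this example.
TEI_URL="${TEI_URL:-https://tei.example.com}"

# Escape backslashes and double quotes so the text is a valid JSON string.
# Single-line input only: newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build an OpenAI-style body: $1 is the text, $2 the model name
# (sent because OpenAI clients always include a model field).
openai_embed_body() {
  printf '{"model":"%s","input":"%s"}' "$(json_escape "$2")" "$(json_escape "$1")"
}

# POST the body; the key travels as a standard Bearer token.
tei_openai_embed() {
  curl -sS "$TEI_URL/v1/embeddings" \
    -H "Authorization: Bearer $TEI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(openai_embed_body "$1" "${2:-BAAI/bge-small-en-v1.5}")"
}
```

Calling `tei_openai_embed "What is Cloudron?"` should return an OpenAI-shaped JSON object whose `data[0].embedding` holds the vector; this is the same request an OpenAI client library makes once its base URL points at TEI.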

There is no web page. TEI is a service, not a website — opening its domain shows a blank page, and
that is normal. The only browsable page is the interactive Swagger API docs at /docs (behind your
Cloudron login; the app's "Open" button goes there). You check that it works by calling it: GET /health
returns HTTP 200 OK, and a real embedding request returns a JSON list of 384 decimal numbers. That wall of
numbers is the correct output — it is for machines, not people.
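The two checks above can be scripted. This is a minimal sketch against the native API, again assuming the hypothetical TEI_URL and TEI_API_KEY variables; the dimension counter is deliberately crude (commas plus one) and only makes sense for a single input, where /embed returns one nested list such as [[0.1,-0.2,...]].

```shell
#!/bin/sh
# Sketch: health check plus one real embedding against the native /embed.
# TEI_URL and TEI_API_KEY are hypothetical variable names.
TEI_URL="${TEI_URL:-https://tei.example.com}"

# Count the values in a single-input /embed response: one vector of
# N numbers contains N-1 commas.
count_dims() {
  echo $(( $(tr -cd ',' | wc -c) + 1 ))
}

tei_smoke() {
  # /health is left open for monitoring, so no key is sent; -f makes curl
  # exit non-zero on an HTTP error status.
  curl -fsS -o /dev/null "$TEI_URL/health" || { echo "health check failed" >&2; return 1; }
  # A real embedding request needs the key.
  resp=$(curl -fsS "$TEI_URL/embed" \
    -H "Authorization: Bearer $TEI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"Hello from Cloudron"}') || { echo "embed request failed" >&2; return 1; }
  echo "embedding dimensions: $(printf '%s' "$resp" | count_dims)"
}
```

With the default model, `tei_smoke` should report 384 dimensions; any other number usually means a different model is loaded.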

What you can do with it.

Give OpenWebUI, AnythingLLM, n8n, or rig a private embeddings backend for document chat and
semantic search, instead of sending your files to a third-party API.
Build a RAG pipeline: embed your documents with TEI, store and search them in Qdrant, and let an
LLM (e.g. Ollama) answer questions grounded in your own data.
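Switch the model with one environment variable (TEI_MODEL_ID): a multilingual model, a larger
higher-recall one, or a cross-encoder reranker (for the /rerank endpoint) to sharpen search results.
A different embedding model can change the vector size, so the Qdrant collection has to match it.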
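For the reranker case, a request pairs one query with candidate passages. This sketch assumes a TEI instance whose TEI_MODEL_ID is a cross-encoder such as BAAI/bge-reranker-base; TEI loads one model per instance, so running a reranker next to the default embedder would presumably mean a second TEI install. RERANK_URL, its rerank.example.com default and TEI_API_KEY are hypothetical names, and the escaping covers single-line strings only.

```shell
#!/bin/sh
# Sketch: score candidate passages against a query with TEI's /rerank.
# Assumes TEI_MODEL_ID is a reranker; the default embedding model does not
# serve /rerank. RERANK_URL and TEI_API_KEY are hypothetical names.
RERANK_URL="${RERANK_URL:-https://rerank.example.com}"

# Escape backslashes and double quotes (single-line strings only).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build {"query": ..., "texts": [...]}: $1 is the query, the rest are passages.
rerank_body() {
  query=$1; shift
  texts=""
  for t in "$@"; do
    texts="${texts:+$texts,}\"$(json_escape "$t")\""
  done
  printf '{"query":"%s","texts":[%s]}' "$(json_escape "$query")" "$texts"
}

tei_rerank() {
  curl -sS "$RERANK_URL/rerank" \
    -H "Authorization: Bearer $TEI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(rerank_body "$@")"
}
```

For example, `tei_rerank "what is a vector database" "Qdrant stores vectors" "Ollama runs LLMs"` should return a JSON list of index/score pairs; the index refers back to the position of each passage in the request.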

Synergy with other Cloudron apps. TEI is one tier of a stack you can run entirely on Cloudron:

App	Role
TEI (this)	Text -> embeddings (and reranking)
Qdrant	Stores the vectors, does similarity search
Ollama	The LLM that answers
OpenWebUI	The chat UI; point its embedding engine at TEI for document RAG
agentgateway	One gated front door: routes chat to the LLM and exposes Qdrant as an agent (MCP) tool
n8n	Automates ingestion: pull documents, embed via TEI, upsert to Qdrant

The default model is BAAI/bge-small-en-v1.5 (384-dim), which matches the Qdrant package's examples,
so the pieces line up dimensionally out of the box.
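"Line up dimensionally" means the Qdrant collection's vector size must equal the model's output width. A minimal sketch of creating a matching collection through Qdrant's REST API follows; QDRANT_URL, QDRANT_API_KEY, the qdrant.example.com location and any collection name you pass are hypothetical, and the Qdrant package may expect its key in a different header.

```shell
#!/bin/sh
# Sketch: create a Qdrant collection sized for the default TEI model.
# QDRANT_URL and QDRANT_API_KEY are hypothetical variable names.
QDRANT_URL="${QDRANT_URL:-https://qdrant.example.com}"

# Vector size must equal the embedding width: 384 for BAAI/bge-small-en-v1.5.
# If TEI_MODEL_ID changes, pass the new model's width here as well.
qdrant_collection_body() {
  printf '{"vectors":{"size":%d,"distance":"Cosine"}}' "${1:-384}"
}

# $1 is the collection name, optional $2 the vector size.
create_collection() {
  curl -sS -X PUT "$QDRANT_URL/collections/$1" \
    -H "api-key: $QDRANT_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(qdrant_collection_body "$2")"
}
```

`create_collection docs` uses the 384 default; cosine distance is a common choice for sentence-embedding models like the BGE family.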

Good defaults. Upstream TEI serves every request unless it is started with an API key, so the package
generates a strong key on first run and injects it through the environment; the embedding endpoints
require it, /health stays open for
monitoring, and the Swagger docs sit behind Cloudron single sign-on. The model cache and the key live
under /app/data, so Cloudron's backup covers them and a restore brings the same key back.
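To show why "generated on first run, kept under /app/data" survives a restore, here is a generate-once pattern. It is a sketch, not the package's actual start script: the key file path in the wiring comment is hypothetical, and feeding the key to TEI through the API_KEY environment variable assumes upstream's --api-key option reads that variable.

```shell
#!/bin/sh
# Sketch of generate-once key handling (not the package's own code).

# Create a 256-bit key in file $1 only if the file is missing or empty,
# then print it. Later runs, including after a restore, reuse the same key.
ensure_api_key() {
  keyfile=$1
  if [ ! -s "$keyfile" ]; then
    # 32 random bytes rendered as 64 hex characters; umask keeps it private.
    ( umask 077
      dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n' > "$keyfile" )
  fi
  cat "$keyfile"
}

# Hypothetical wiring in a start script:
#   export API_KEY="$(ensure_api_key /app/data/api_key)"
```

Because the file lives under /app/data, Cloudron's backup captures it, and the second and later calls return the same 64-character key.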

Caveats: no web UI; amd64 only; a large model needs more memory than the 2 GB default and makes the
first boot slower (it downloads once, then is cached); the default model does embeddings only (/rerank needs a
reranker model).

For other packagers

What helped.

A multi-stage Dockerfile onto cloudron/base:5.0.0: pull the runtime out of the official
upstream image in stage one, assemble onto the base in stage two. Building on the server (cloudron
install from source) meant no local Docker was needed while iterating; for local builds, rootless
podman did the job.
Putting all state under /app/data (the model cache and the generated key) made the
localstorage backup just work: a real backup followed by a clone brought the key back byte-for-byte.
A build-time linkage gate (ldd + --version in the Dockerfile) catches a missing library or
glibc symbol at build time instead of at runtime.
The reference packages (agentgateway, and the sibling Qdrant package) set the patterns: the
two-surface auth model, the /run vs /app/data discipline, the release gates.
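The linkage gate mentioned above can be approximated in a few lines. This is a hedged sketch rather than the package's Dockerfile step: the function name is hypothetical, and the binary path in the usage comment is an assumption about the image layout. As noted in the next list, libraries that are dlopened at inference time (the MKL set) slip past any ldd-based check.

```shell
#!/bin/sh
# Sketch of a build-time linkage gate (function name is hypothetical).
linkage_gate() {
  bin=$1
  [ -x "$bin" ] || { echo "linkage gate: $bin is missing or not executable" >&2; return 1; }
  # ldd reports both unresolved libraries and missing glibc symbol
  # versions with the phrase "not found".
  if ldd "$bin" 2>&1 | grep -q 'not found'; then
    ldd "$bin" 2>&1 | grep 'not found' >&2
    return 1
  fi
  # Actually starting the binary proves the loader can resolve its direct
  # dependencies; dlopened libraries are still not exercised here.
  "$bin" --version >/dev/null || { echo "linkage gate: $bin --version failed" >&2; return 1; }
}

# Hypothetical use from a Dockerfile RUN step, with this file copied in:
#   RUN . /tmp/linkage-gate.sh && linkage_gate /usr/local/bin/text-embeddings-router
```

A non-zero return fails the RUN step, which is the point: the build stops instead of shipping an image that crashes on start.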

What was difficult.

The CPU build is not a lone binary — it carries a ~320 MB Intel MKL math runtime. The real trap:
libiomp5.so in the upstream image is a symlink whose internal soname is libomp.so.5, so
ldconfig indexes it under the wrong name and the binary fails to resolve it. Fix: cp -L it to a
concrete libiomp5.so and resolve the MKL libs via LD_LIBRARY_PATH (filename match), not
ldconfig. The linkage gate also does not load the MKL libs (they are dlopened at inference time),
so the only honest test is a runtime /embed smoke test on the assembled image.
The router defaults --hostname to the HOSTNAME env var, which Docker sets to the container
id, so it binds the wrong interface unless you pass --hostname 0.0.0.0 explicitly. The upstream
default port is 80, which the unprivileged cloudron user cannot bind — move it to 8080.
proxyAuth with supportsBearerAuth: true let any request carrying a bearer header skip the SSO wall
on /docs; for a docs-only wall, drop the flag ("proxyAuth": { "path": "/docs" }).
Users expect a GUI and there is none, so set configurePath: /docs and say so plainly in the
post-install notes, including that a successful embedding is a wall of numbers.
Publishing on a podman-only host: cloudron versions add could not see the image through the
podman socket, so I hand-built CloudronVersions.json (inlining the file:// description, changelog,
and post-install message; keeping icon as file://; pinning the digest). It validated on the real
versions-url install.
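The libiomp5 and hostname/port fixes above might look roughly like this in a start script. Everything here is a sketch under stated assumptions: /app/code/lib as the library directory, /app/data/cache as the model cache, and the API_KEY variable are hypothetical, and the router flags are the ones upstream lists in its CLI help (check them against your TEI version).

```shell
#!/bin/sh
# Sketch of the runtime fixes (paths and variable names are assumptions).

# Copy a symlinked library as a real file under its own filename, so
# LD_LIBRARY_PATH lookup by filename finds libiomp5.so regardless of the
# soname declared inside the symlink target.
materialize_lib() {
  src=$1 dest_dir=$2
  mkdir -p "$dest_dir"
  cp -L "$src" "$dest_dir/$(basename "$src")"
}

# Show the internal soname that ldconfig indexes (needs binutils' readelf).
show_soname() {
  readelf -d "$1" | grep SONAME
}

# Launch: bind all interfaces explicitly (Docker sets HOSTNAME to the
# container id) and use an unprivileged port. exec replaces the shell.
start_tei() {
  export LD_LIBRARY_PATH="/app/code/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  exec text-embeddings-router \
    --model-id "${TEI_MODEL_ID:-BAAI/bge-small-en-v1.5}" \
    --hostname 0.0.0.0 \
    --port 8080 \
    --huggingface-hub-cache /app/data/cache \
    --api-key "$API_KEY"
}
```

Running `show_soname` on the upstream libiomp5.so is a quick way to confirm the libomp.so.5 mismatch before deciding to bypass ldconfig.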

How we made it compatible with other Cloudron apps.

OpenAI-compatible endpoint is the key: it lets OpenWebUI, n8n, AnythingLLM, and rig use TEI by
changing one base URL, with no app-specific glue.
API-key auth on the data plane, SSO only on /docs, so a sibling app authenticates with a key
and gets TEI's own 401 (not a login redirect) when the key is wrong — that is what makes
app-to-app calls work.
Dimensional alignment with Qdrant (both default to bge-small-en-v1.5, 384-dim) so vectors are
comparable across the stack.
We wired it live: OpenWebUI's embedding engine pointed at TEI via the package's env.sh, and a
verified TEI-to-Qdrant retrieval round trip. agentgateway already fronts Qdrant (as an MCP tool) and
Ollama, so TEI slots in as the embeddings tier.
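A TEI-to-Qdrant round trip of the kind described can be done with curl and sed alone. This sketch is not the exact test that was run: TEI_URL, QDRANT_URL, both key variables, the qdrant.example.com location and the "docs" collection are hypothetical, the collection must already exist with size 384, and the text argument must be plain text without quotes or backslashes.

```shell
#!/bin/sh
# Sketch of a TEI-to-Qdrant round trip (all names are hypothetical).
TEI_URL="${TEI_URL:-https://tei.example.com}"
QDRANT_URL="${QDRANT_URL:-https://qdrant.example.com}"

# /embed returns [[v1,v2,...]] for one input; dropping the outer brackets
# yields the single vector [v1,v2,...] that Qdrant expects.
first_vector() {
  sed 's/^\[\(.*\)\]$/\1/'
}

# Embed $1 (plain text, no quotes or backslashes in this sketch).
embed() {
  resp=$(curl -fsS "$TEI_URL/embed" \
    -H "Authorization: Bearer $TEI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{\"inputs\":\"$1\"}") || return 1
  printf '%s' "$resp" | first_vector
}

round_trip() {
  vec=$(embed "$1") || return 1
  # Upsert the vector with its text as payload; wait=true returns after indexing.
  curl -fsS -X PUT "$QDRANT_URL/collections/docs/points?wait=true" \
    -H "api-key: $QDRANT_API_KEY" -H 'Content-Type: application/json' \
    -d "{\"points\":[{\"id\":1,\"vector\":$vec,\"payload\":{\"text\":\"$1\"}}]}" >/dev/null || return 1
  # Search with the same vector: point 1 should come back as the top hit.
  curl -fsS -X POST "$QDRANT_URL/collections/docs/points/search" \
    -H "api-key: $QDRANT_API_KEY" -H 'Content-Type: application/json' \
    -d "{\"vector\":$vec,\"limit\":1,\"with_payload\":true}"
}
```

A wrong TEI key makes `embed` fail on TEI's own 401, which curl's -f turns into a non-zero exit, rather than silently following a login redirect.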

For the Cloudron maintainers

Friendly asks from packaging this:

iconUrl couples to the minBoxVersion floor. A --versions-url manifest must include
iconUrl, and iconUrl forces minBoxVersion >= 9.1.0; omitting it fails validation. So every
community-channel app is pinned to box 9.1.0+ purely by the icon requirement, even when it runs on
8.3. Only the real versions-url path surfaces this (on-server cloudron install accepts a lower
floor). Please document or decouple. (This affects the Qdrant package too.)
cloudron versions add needs the image via the Docker API, with no working podman bridge. On a
podman-only host it reported "No docker image found, run cloudron build first" even with
DOCKER_HOST pointed at the podman socket and the image present. A documented podman path, or a
cloudron versions add --image <ref> that takes a registry reference directly, would remove the need
to hand-build the versions file.
The manifestVersion is still 2 and these primitives look stable, which is reassuring for the
Cloudron 10 upgrade — the one thing a packager will need to re-verify on 10 is the cloudron/base
pin (re-run the linkage gate).

For the developers of the program (Hugging Face / TEI)

Friendly asks that would make TEI easier to run in any container:

--hostname defaulting to the HOSTNAME env var is container-hostile (Docker sets HOSTNAME to
the container id, so the server binds the wrong interface). Consider defaulting to 0.0.0.0.
A CPU image that does not bundle Intel MKL (or a clearly separate slim tag) would make a small,
reproducible base copy much simpler. The MKL + libfakeintel + libiomp5 set, and the
libiomp5.so soname mismatch, were the bulk of the packaging effort.
Publish a minimum glibc per release — the binary is dynamically linked, and a slim base needs to
know the floor (a toolchain bump that raises it fails at runtime, not build time).
An arm64 CPU image would let TEI run on arm Cloudron hosts; today the CPU/MKL image is amd64-only.
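Until a minimum glibc is published, a packager can estimate the floor from the binary itself: the newest GLIBC_ symbol version it references is the oldest glibc that can load it. A minimal sketch, assuming GNU tools (grep -o and sort -V are not POSIX) and an assumed binary path in the usage comment:

```shell
#!/bin/sh
# Sketch: report the newest GLIBC_x.y symbol version found in text on
# stdin, typically `objdump -T` output for a dynamically linked binary.
glibc_floor() {
  # Requiring a digit after "GLIBC_" skips entries like GLIBC_PRIVATE;
  # sort -V orders 2.2.5 < 2.17 < 2.34 as versions, not as strings.
  grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Usage (binary path assumed):
#   objdump -T /usr/local/bin/text-embeddings-router | glibc_floor
```

Comparing that result with the base image's glibc before a release catches a toolchain bump at build time instead of at runtime.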

The package tracks upstream TEI releases and keeps the binary unmodified. Hugging Face and the model names are
trademarks of their owners; this is a community package, not affiliated with or endorsed by Hugging
Face. Feedback, issues, and suggestions are very welcome.

#text-embeddings-inference #huggingface #embeddings #rag #cloudron-9.1


Cloudron Forum