TEI (Text Embeddings Inference) on Cloudron - Community Package

Category: Community Apps. Tags: tei, text embedding, inference, ai, qdrant

    LoudLemur wrote:

    Text Embeddings Inference (Hugging Face TEI), packaged for Cloudron

    A community package for Text Embeddings Inference (TEI) — Hugging Face's fast Rust server for text
    embeddings and reranking. It is the embedding tier of a self-hostable, OpenAI-compatible retrieval
    stack: TEI turns text into vectors, a vector database such as Qdrant stores and searches them, and an
    LLM answers over what is retrieved.

    • Package: https://github.com/OrcVole/TEI-Cloudron
    • Upstream: https://github.com/huggingface/text-embeddings-inference

    TL;DR

    TEI gives you a private, self-hosted embeddings API (OpenAI-compatible /v1/embeddings, plus a
    native /embed). Install it from the versions URL (the API key is generated for you on first run),
    point any app that speaks the OpenAI embeddings API at it, and you have on-box embeddings for
    retrieval-augmented generation (RAG) and semantic search, with no documents leaving your server.
    It pairs directly with the Qdrant Cloudron package to make a complete pure-Rust retrieval
    pipeline. It is API-only (no web UI), amd64-only (the CPU build bundles Intel MKL), and verified
    working on Cloudron 9.x, including a live TEI-to-Qdrant round trip. Install:

    cloudron install \
      --versions-url https://raw.githubusercontent.com/OrcVole/TEI-Cloudron/main/CloudronVersions.json \
      --location tei.example.com
    

    For potential users

    What it is. You send TEI text, it returns an embedding (a vector of 384 numbers for the default
    model). That vector is the "meaning" of your text as coordinates, which a vector database can compare
    and search. TEI speaks the OpenAI embeddings API, so anything that already calls OpenAI for
    embeddings can call your own box instead.
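    A minimal sketch of that OpenAI-compatible call using only Python's standard library. It assumes
    the generated key is sent as a Bearer token (the form upstream TEI's API-key option checks), a
    hypothetical app location of https://tei.example.com, and the usual OpenAI response shape
    (data[i].embedding plus an index). The model field value and the helper names are illustrative.

```python
import json
import urllib.request


def build_embeddings_request(base_url: str, api_key: str, texts: list,
                             model: str = "BAAI/bge-small-en-v1.5") -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/embeddings endpoint."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the package's generated key is checked as a Bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


def parse_embeddings(payload: dict) -> list:
    """Return the vectors in input order from an OpenAI-style embeddings response."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(base_url: str, api_key: str, texts: list) -> list:
    """Send the request and decode the JSON response into a list of vectors."""
    request = build_embeddings_request(base_url, api_key, texts)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embeddings(json.load(response))
```

    Usage, against a live install: vectors = embed("https://tei.example.com", "YOUR_KEY", ["hello"]),
    after which len(vectors[0]) should be 384 with the default model. Any app that lets you set an
    OpenAI base URL is doing the equivalent of build_embeddings_request for you.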

    There is no web page. TEI is a service, not a website — opening its domain shows a blank page, and
    that is normal. The only browsable page is the interactive Swagger API docs at /docs (behind your
    Cloudron login; the app's "Open" button goes there). You check it works by calling it: GET /health
    returns OK, and a real embedding request returns a JSON list of decimal numbers (384 of them for
    the default model). That wall of numbers is the correct output: it is for machines, not people.
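    The two checks above can be scripted. This is a sketch under assumptions: /health is reachable
    without a key (as the package describes), the native /embed endpoint takes {"inputs": ...} and
    returns one vector per input (upstream TEI's documented shape), and the key is sent as a Bearer
    token. The function names and the 384 constant for the default model are illustrative.

```python
import json
import math
import urllib.request

EXPECTED_DIM = 384  # BAAI/bge-small-en-v1.5, the package's default model


def health_ok(base_url: str) -> bool:
    """GET /health needs no key; urlopen raises HTTPError on any non-2xx status."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=10) as resp:
        return resp.status == 200


def embed_native(base_url: str, api_key: str, text: str) -> list:
    """POST one text to TEI's native /embed and return its single vector."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/embed",
        data=json.dumps({"inputs": text}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},  # assumed Bearer scheme
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)[0]  # /embed returns a list of vectors, one per input


def looks_like_embedding(vector: list, dim: int = EXPECTED_DIM) -> bool:
    """The 'wall of numbers' is right if it has the expected length and only finite floats."""
    return len(vector) == dim and all(
        isinstance(x, (int, float)) and math.isfinite(x) for x in vector)
```

    Usage: health_ok(url) first, then looks_like_embedding(embed_native(url, key, "hello")). If you
    switch TEI_MODEL_ID to a model with a different output size, pass that size as dim.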

    What you can do with it.

    • Give OpenWebUI, AnythingLLM, n8n, or rig a private embeddings backend for document chat and
      semantic search, instead of sending your files to a third-party API.
    • Build a RAG pipeline: embed your documents with TEI, store and search them in Qdrant, and let an
      LLM (e.g. Ollama) answer questions grounded in your own data.
    • Switch the model with one environment variable (TEI_MODEL_ID): a multilingual model, a larger
      higher-recall one, or a cross-encoder reranker (for the /rerank endpoint) to sharpen search results.
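    Reranking is the step that sharpens retrieval: after Qdrant returns candidates, /rerank scores
    each one against the query. A sketch, assuming TEI_MODEL_ID points at a cross-encoder (for example
    BAAI/bge-reranker-base, chosen here only as an illustration), that /rerank takes {"query",
    "texts"} and returns [{"index", "score"}, ...] as upstream TEI documents, and the same Bearer key.
    Helper names are hypothetical.

```python
import json
import urllib.request


def rerank_request(base_url: str, api_key: str, query: str, texts: list) -> urllib.request.Request:
    """Build (but do not send) a POST to TEI's /rerank endpoint."""
    body = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/rerank",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},  # assumed Bearer scheme
    )


def order_by_score(texts: list, results: list) -> list:
    """Reorder the candidate texts best-first using the index/score pairs TEI returns."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [texts[r["index"]] for r in ranked]
```

    Sending uses the same urllib.request.urlopen pattern as the embedding call; json.load on the
    response gives the results list for order_by_score. With the default embeddings-only model this
    endpoint is not available.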

    Synergy with other Cloudron apps. TEI is one tier of a stack you can run entirely on Cloudron:

    | App          | Role                                                                          |
    |--------------|-------------------------------------------------------------------------------|
    | TEI (this)   | Text -> embeddings (and reranking)                                            |
    | Qdrant       | Stores the vectors, does similarity search                                    |
    | Ollama       | The LLM that answers                                                          |
    | OpenWebUI    | The chat UI; point its embedding engine at TEI for document RAG               |
    | agentgateway | One gated front door: routes chat to the LLM and exposes Qdrant as an agent (MCP) tool |
    | n8n          | Automates ingestion: pull documents, embed via TEI, upsert to Qdrant          |

    The default model is BAAI/bge-small-en-v1.5 (384-dim), which matches the Qdrant package's examples,
    so the pieces line up dimensionally out of the box.
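    The n8n row and the dimension note describe one ingestion step: embed with TEI, then upsert into
    Qdrant. This sketch only builds the two Qdrant REST bodies (PUT /collections/{name} to create a
    collection, PUT /collections/{name}/points to upsert) and refuses any vector whose size does not
    match the collection. The integer point ids and the "text" payload field are hypothetical choices,
    and Qdrant's own API-key handling is left out.

```python
DIM = 384  # bge-small-en-v1.5; must equal the Qdrant collection's vector size


def create_collection_body(dim: int = DIM) -> dict:
    """Body for Qdrant's PUT /collections/{name}: fixed-size vectors compared by cosine."""
    return {"vectors": {"size": dim, "distance": "Cosine"}}


def upsert_body(ids: list, texts: list, vectors: list, dim: int = DIM) -> dict:
    """Body for Qdrant's PUT /collections/{name}/points; rejects wrong-sized vectors early."""
    if not (len(ids) == len(texts) == len(vectors)):
        raise ValueError("ids, texts and vectors must have the same length")
    points = []
    for point_id, text, vector in zip(ids, texts, vectors):
        if len(vector) != dim:
            raise ValueError(f"point {point_id}: {len(vector)} dims, collection expects {dim}")
        # Keep the source text as payload so search hits can be shown to the LLM.
        points.append({"id": point_id, "vector": vector, "payload": {"text": text}})
    return {"points": points}
```

    Checking the size before the upsert turns a model swap on one side of the stack into a clear
    local error instead of a rejected request from Qdrant.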

    Good defaults. TEI accepts unauthenticated requests unless it is given an API key, so the package
    generates a strong key on first run and injects it through the environment; the embedding
    endpoints require it, /health stays open for monitoring, and the Swagger docs sit behind Cloudron
    single sign-on. The model cache and the key live under /app/data, so Cloudron's backup covers them
    and a restore brings the same key back.

    Caveats: no web UI; amd64 only; a large model needs more than the 2 GB default memory limit and
    a slower first boot (it downloads once, then is cached); the default model does embeddings only
    (/rerank needs a reranker model).


    For other packagers

    What helped.

    • A multi-stage Dockerfile onto cloudron/base:5.0.0: pull the runtime out of the official
      upstream image in stage one, assemble onto the base in stage two. Building on the server
      (cloudron install from source) meant no local Docker was needed while iterating; rootless
      podman handled the local builds.
    • Putting all state under /app/data (the model cache and the generated key) made the
      localstorage backup just work: a real backup, followed by a clone, brought the key back
      byte-for-byte.
    • A build-time linkage gate (ldd + --version in the Dockerfile) catches a missing library or
      glibc symbol at build time instead of at runtime.
    • The reference packages (agentgateway, and the sibling Qdrant package) set the patterns: the
      two-surface auth model, the /run vs /app/data discipline, the release gates.
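    The linkage gate from the list above is a couple of shell lines in the Dockerfile; here is the same
    idea as a small Python sketch, in case a packager prefers a script. It assumes ldd is present in the
    image and that unresolved libraries or versioned symbols show up as lines containing "not found".
    The binary path in the usage note is hypothetical. As the post notes, this does not cover libraries
    that are dlopened later (the MKL set), so it complements a runtime /embed smoke test.

```python
import subprocess
import sys


def missing_libraries(ldd_output: str) -> list:
    """Lines where ldd could not resolve a library or a versioned (e.g. glibc) symbol."""
    return [line.strip() for line in ldd_output.splitlines() if "not found" in line]


def linkage_gate(binary: str) -> None:
    """Exit non-zero (failing the build step) on unresolved links or a binary that will not start."""
    ldd = subprocess.run(["ldd", binary], capture_output=True, text=True)
    problems = missing_libraries(ldd.stdout + ldd.stderr)
    if problems:
        sys.exit("unresolved linkage:\n" + "\n".join(problems))
    # Actually executing the binary proves the dynamic loader can start it.
    subprocess.run([binary, "--version"], check=True)
```

    Usage in a build step: linkage_gate("/usr/local/bin/text-embeddings-router").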

    What was difficult.

    • The CPU build is not a lone binary — it carries a ~320 MB Intel MKL math runtime. The real trap:
      libiomp5.so in the upstream image is a symlink whose internal soname is libomp.so.5, so
      ldconfig indexes it under the wrong name and the binary fails to resolve it. Fix: cp -L it to a
      concrete libiomp5.so and resolve the MKL libs via LD_LIBRARY_PATH (filename match), not
      ldconfig. And the linkage gate does not load the MKL libs (they are dlopened at inference time),
      so the only honest test is a runtime /embed smoke on the assembled image.
    • The router defaults --hostname to the HOSTNAME env var, which Docker sets to the container id,
      so it binds the wrong interface unless you pass --hostname 0.0.0.0 explicitly. The upstream
      default port is 80, which the unprivileged cloudron user cannot bind, so move it to 8080.
    • proxyAuth with supportsBearerAuth: true let any bearer header skip the SSO wall on /docs;
      for a docs-only wall, drop the flag ("proxyAuth": { "path": "/docs" }).
    • Users expect a GUI and there is none, so set configurePath: /docs and say so plainly in the
      post-install notes, including that a successful embedding is a wall of numbers.
    • Publishing on a podman-only host: cloudron versions add could not see the image through the
      podman socket, so I hand-built CloudronVersions.json (inlining the file:// description, changelog,
      and post-install message; keeping icon as file://; pinning the digest). It validated on the real
      versions-url install.
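    The libiomp5 trap in the list above comes from a file name that disagrees with the library's
    internal SONAME (ldconfig indexes by SONAME, while an LD_LIBRARY_PATH search matches the file
    name). A sketch that detects that mismatch before it bites, assuming readelf from binutils is
    available in the build stage; the function names and the example path are hypothetical.

```python
import os
import re
import subprocess
from typing import Optional, Tuple

SONAME_RE = re.compile(r"Library soname: \[([^\]]+)\]")


def soname_from_readelf(readelf_output: str) -> Optional[str]:
    """Extract the SONAME from `readelf -d` output, or None if there is no SONAME entry."""
    match = SONAME_RE.search(readelf_output)
    return match.group(1) if match else None


def soname_mismatch(path: str) -> Optional[Tuple[str, str]]:
    """Return (file name, SONAME) when they differ, else None."""
    real = os.path.realpath(path)  # follow symlinks such as libiomp5.so -> ...
    out = subprocess.run(["readelf", "-d", real],
                         capture_output=True, text=True, check=True).stdout
    soname = soname_from_readelf(out)
    name = os.path.basename(path)
    if soname is not None and soname != name:
        return name, soname
    return None
```

    For example, soname_mismatch("/opt/tei/lib/libiomp5.so") returning ("libiomp5.so", "libomp.so.5")
    is the situation described above, where ldconfig alone will not make libiomp5.so resolvable.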

    How we made it compatible with other Cloudron apps.

    • The OpenAI-compatible endpoint is the key: it lets OpenWebUI, n8n, AnythingLLM, and rig use TEI
      by changing one base URL, with no app-specific glue.
    • API-key auth on the data plane, SSO only on /docs, so a sibling app authenticates with a key
      and gets TEI's own 401 (not a login redirect) when the key is wrong — that is what makes
      app-to-app calls work.
    • Dimensional alignment with Qdrant (both default to bge-small-en-v1.5, 384-dim) so vectors are
      comparable across the stack.
    • We wired it live: OpenWebUI's embedding engine pointed at TEI via the package's env.sh, and a
      verified TEI-to-Qdrant retrieval round trip. agentgateway already fronts Qdrant (as an MCP tool) and
      Ollama, so TEI slots in as the embeddings tier.
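    The key-versus-SSO split above is easiest to see from the calling side. This sketch sends one
    embedding request without following redirects (http.client never follows them) and classifies the
    outcome. It assumes the Cloudron login wall answers with an HTTP redirect to a login page, and the
    category names are illustrative, not part of TEI or Cloudron.

```python
import http.client
import json
from typing import Optional
from urllib.parse import urlsplit


def classify_response(status: int, location: Optional[str] = None) -> str:
    """Tell a bad API key apart from a request that hit a login wall."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "bad-or-missing-key"  # TEI's own rejection: fix the key
    if status in (301, 302, 303, 307, 308) and location:
        return "sso-redirect"  # a login page, not TEI: wrong path or auth layer
    return "other-error"


def probe(base_url: str, api_key: str) -> str:
    """POST one tiny embedding request and classify the raw (unfollowed) response."""
    parts = urlsplit(base_url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        conn.request("POST", "/v1/embeddings", body=json.dumps({"input": ["ping"]}),
                     headers={"Content-Type": "application/json",
                              "Authorization": f"Bearer {api_key}"})
        resp = conn.getresponse()
        return classify_response(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

    A sibling app configured with a wrong key should land in "bad-or-missing-key"; landing in
    "sso-redirect" would mean the data plane is behind the login wall, which is what breaks
    app-to-app calls.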

    For the Cloudron maintainers

    Friendly asks from packaging this:

    • iconUrl couples to the minBoxVersion floor. A --versions-url manifest must include
      iconUrl, and iconUrl forces minBoxVersion >= 9.1.0; omitting it fails validation. So every
      community-channel app is pinned to box 9.1.0+ purely by the icon requirement, even when it runs on
      8.3. Only the real versions-url path surfaces this (on-server cloudron install accepts a lower
      floor). Please document or decouple. (This affects the Qdrant package too.)
    • cloudron versions add needs the image via the Docker API, with no working podman bridge. On a
      podman-only host it reported "No docker image found, run cloudron build first" even with
      DOCKER_HOST pointed at the podman socket and the image present. A documented podman path, or a
      cloudron versions add --image <ref> that takes a registry reference directly, would remove the need
      to hand-build the versions file.
    • The manifestVersion is still 2 and these primitives look stable, which is reassuring for the
      Cloudron 10 upgrade — the one thing a packager will need to re-verify on 10 is the cloudron/base
      pin (re-run the linkage gate).
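    Until the iconUrl/minBoxVersion coupling is documented, a packager can catch it before publishing.
    A pre-flight sketch that encodes the rule as observed in this packaging work (not from a published
    spec): a versions-url manifest needs iconUrl, and iconUrl needs minBoxVersion of at least 9.1.0.
    The function names and the plain dict input are hypothetical.

```python
ICON_FLOOR = (9, 1, 0)  # observed while packaging, not taken from Cloudron documentation


def version_tuple(version: str) -> tuple:
    """'9.1.0' -> (9, 1, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def check_versions_manifest(manifest: dict) -> list:
    """Return human-readable problems; an empty list means the observed rules are met."""
    problems = []
    if "iconUrl" not in manifest:
        problems.append("iconUrl missing: versions-url validation rejected manifests without it")
        return problems
    floor = manifest.get("minBoxVersion")
    if floor is None or version_tuple(floor) < ICON_FLOOR:
        problems.append(f"minBoxVersion {floor} is below 9.1.0, which iconUrl required")
    return problems
```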

    For the developers of the program (Hugging Face / TEI)

    Friendly asks that would make TEI easier to run in any container:

    • --hostname defaulting to the HOSTNAME env var is container-hostile (Docker sets HOSTNAME to
      the container id, so the server binds the wrong interface). Consider defaulting to 0.0.0.0.
    • A CPU image that does not bundle Intel MKL (or a clearly separate slim tag) would make a small,
      reproducible base copy much simpler. The MKL + libfakeintel + libiomp5 set, and the
      libiomp5.so soname mismatch, were the bulk of the packaging effort.
    • Publish a minimum glibc per release — the binary is dynamically linked, and a slim base needs to
      know the floor (a toolchain bump that raises it fails at runtime, not build time).
    • An arm64 CPU image would let TEI run on arm Cloudron hosts; today the CPU/MKL image is amd64-only.
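    Until a minimum glibc is published per release, a packager can compute it from the binary: the
    highest GLIBC_x.y symbol version it references is the floor the base image must meet. A sketch,
    assuming objdump from binutils is available; the function names are hypothetical, and
    GLIBC_PRIVATE and GLIBCXX_* (libstdc++) entries are deliberately ignored by the pattern.

```python
import re
import subprocess
from typing import Optional, Tuple

GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def glibc_floor(objdump_output: str) -> Optional[Tuple[int, ...]]:
    """Highest GLIBC_ version referenced in `objdump -T` output, or None if there are none."""
    versions = {tuple(int(p) for p in m.group(1).split("."))
                for m in GLIBC_RE.finditer(objdump_output)}
    return max(versions) if versions else None


def binary_glibc_floor(path: str) -> Optional[Tuple[int, ...]]:
    """Run objdump on the dynamically linked binary and compute its glibc floor."""
    out = subprocess.run(["objdump", "-T", path],
                         capture_output=True, text=True, check=True).stdout
    return glibc_floor(out)
```

    Comparing the result with the base image's glibc (ldd --version inside cloudron/base) turns a
    runtime loader failure into a build-time check.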

    It tracks upstream TEI releases and keeps the binary unmodified. Hugging Face and the model names are
    trademarks of their owners; this is a community package, not affiliated with or endorsed by Hugging
    Face. Feedback, issues, and suggestions are very welcome.

    #text-embeddings-inference #huggingface #embeddings #rag #cloudron-9.1

    1 Reply Last reply
    1

    Hello! It looks like you're interested in this conversation, but you don't have an account yet.

    Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.

    With your input, this post could be even better 💗

    Register Login
    Reply
    • Reply as topic
    Log in to reply
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes


    • Login

    • Don't have an account? Register

    • Login or register to search.
    • First post
      Last post
    0
    • Categories
    • Recent
    • Tags
    • Popular
    • Bookmarks
    • Search