Ok some further updates...
TL;DR - We still need help getting Cloudron's OpenWebUI container to start with GPU enabled. This is our bottleneck. Everything else works.
We've figured out how to make our Dell servers and VMware hypervisor* reliably support GPU all the way through to virtual machines, but we've failed to get Cloudron apps working with GPU, so we have been looking for low cost 'hosting ready' virtual server alternatives for OpenWebUI. (Windows Server hosting, which we like for a lot of other workloads, is not a preferred option in this case.) At the VM level we've now got an Ubuntu/Caddy/Webmin (or Cockpit)/Docker/NVIDIA+CUDA stack fully operational with OpenWebUI/Ollama, and it's arguably a commercially viable solution.
And I must say, now that it's running in a 'hosting ready' environment with a software stack that's very similar to what Cloudron offers, even with our older-generation GPU test platform (Tesla P40s), the speed results from tests in OpenWebUI are extremely pleasing. I don't have tokens-per-second stats yet, but I can report that one query that took 3.75 minutes using CPU only took 13 seconds on the same host hardware with a single Tesla P40 GPU behind it, and left room in the GPU's VRAM for other concurrent queries.
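To put a number on that single data point, here is a rough back-of-envelope calculation. It only uses the two timings quoted above; the function name `speedup` is just made up for illustration, and one query is obviously not a benchmark.

```shell
# Speed-up = CPU time / GPU time, with the CPU time converted to seconds first.
speedup() {
  # $1 = CPU-only time in minutes, $2 = GPU time in seconds
  awk -v cpu_min="$1" -v gpu_s="$2" 'BEGIN { printf "%.1f\n", (cpu_min * 60) / gpu_s }'
}

# 3.75 minutes = 225 seconds; 225 / 13 is roughly 17.3
speedup 3.75 13
```

So for that one query the P40 was roughly 17x faster than CPU only.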
But our stack still doesn't do all the nice stuff that Cloudron does without a lot of extra work - mail, backups, user directory, easy multi-tenanting, app-level resource limiting, automatic DNS, automatic updates, super easy installation (our current virtual server installation guide is still ~70 active configuration steps which can't be fully automated), and more.
Finally realising that we could run a vanilla Docker test on the Cloudron host without breaking Cloudron (duh!), we ran the NVIDIA sample workload on our Cloudron Ubuntu host. It works. So we know our server is ready.
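For anyone wanting to repeat that check, the sketch below prints the sample workload command as I understand it from NVIDIA's Container Toolkit documentation. Treat the exact flags and the `ubuntu` image as assumptions rather than a record of precisely what we typed. The helper name `gpu_smoke_test_cmd` and the `RUN` switch are made up for this example.

```shell
#!/usr/bin/env bash
# Build the GPU smoke-test command.
# Assumption: plain "ubuntu" image, as in NVIDIA's sample; pass another image as $1.
gpu_smoke_test_cmd() {
  image="${1:-ubuntu}"
  printf 'docker run --rm --runtime=nvidia --gpus all %s nvidia-smi\n' "$image"
}

# Dry run by default: only print the command.
# Set RUN=1 to execute it (needs Docker and the NVIDIA Container Toolkit on the host).
cmd="$(gpu_smoke_test_cmd)"
echo "$cmd"
if [ "${RUN:-0}" = "1" ]; then
  eval "$cmd"
fi
```

If the host is set up correctly, running it for real should print the usual nvidia-smi table listing the GPU(s) from inside the container.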
Having initially avoided running standalone Docker containers on our Cloudron Ubuntu host (because we didn't want to upset Cloudron), running the sample workload made us realise we could also test OpenWebUI itself using vanilla Docker... It also works.
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
time=2024-11-17T11:40:46.328Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu]"
time=2024-11-17T11:40:46.328Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-11-17T11:40:47.791Z level=INFO source=types.go:123 msg="inference compute" id=GPU-2f9f15c7-39ba-5118-38fa-07ec8a1fa088 library=cuda variant=v12 compute=6.1 driver=12.7 name="Tesla P40" total="23.9 GiB" available="23.7 GiB"
INFO : Started server process [1]
INFO : Waiting for application startup.
INFO : Application startup complete.
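The line that matters in that output is the "inference compute" one with library=cuda. A minimal sketch for checking it without reading the whole log, assuming the log format stays as shown above (the helper name `has_cuda_gpu` is made up for this example):

```shell
# Succeeds if the log text on stdin contains Ollama's "inference compute"
# line reporting a CUDA device, as in the output above.
has_cuda_gpu() {
  grep -q 'msg="inference compute".*library=cuda'
}

# Usage against the container started above (name from --name open-webui):
#   docker logs open-webui 2>&1 | has_cuda_gpu && echo "GPU detected"
```

If the container falls back to CPU, I'd expect that check to fail, but I have only confirmed the GPU case shown here.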
So I'm now quite certain our only hurdle is figuring out how to make Cloudron's OpenWebUI start with GPU support. But as Cloudron+Docker learners, we can't for the life of us figure it out, even for a non-persistent test run. Modifying run.sh didn't help, and even while running in recovery mode started from the Cloudron CLI we can't see any way to make it work with modifications to run-compose.sh or /app/pkg/start.sh or the Dockerfile or anything else.
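One diagnostic that might help whoever picks this up: as far as I understand, the --gpus flag is applied when a container is created and shows up under HostConfig.DeviceRequests in docker inspect, which would explain why edits to scripts inside the image make no difference. The sketch below compares containers on that basis. The helper name `has_gpu_request` is made up, and the JSON shape in the comment is my assumption about what Docker reports.

```shell
# Succeeds if the `docker inspect` JSON on stdin contains a GPU device request.
# Assumption: a container started with --gpus shows a "gpu" capability there,
# and one started without it shows null.
has_gpu_request() {
  grep -q '"gpu"'
}

# Usage (open-webui is the vanilla test container from above; for the
# Cloudron-managed container, look up its real name with `docker ps` first):
#   docker inspect --format '{{json .HostConfig.DeviceRequests}}' open-webui \
#     | has_gpu_request && echo "created with GPU access"
```

If the vanilla container passes and the Cloudron one doesn't, the change presumably has to happen wherever Cloudron creates the container, not in the app package.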
What can we do?
Please, can I repeat my offer to provide support (if needed) to get this done? At the very least, we could offer some $$ (and I hope the community might pitch in per my original post if more was needed), testing, notes from our own installation/test challenges, and an NVIDIA GPU-enabled virtual dev/test machine if required.
*Side note: I think VMware used to offer a great hypervisor even for small scale, but now it's terrible for smaller customers (in my opinion). There are real alternatives these days, but hardly any that offer point-and-click container management. So we think container management tools at the virtual server layer that are as nice as Cloudron still have relevance beyond single-server and home lab use cases. Have I mentioned that we love Cloudron?