Cloudron Forum

robw

Hiya!

This is a question for the Cloudron team directly, and I'm happy to get a direct response if the forum isn't the best place for it, but I'll post it publicly in case anyone else may wish to contribute.

In this thread @girish mentioned that GPU support in Cloudron will take 'some time' - https://forum.cloudron.io/topic/11312/the-models-load-in-the-void/6.

I understand completely that such things are difficult to quantify, and I don't know your current development roadmap. For instance, I doubt GPU support is very useful for most of the apps on Cloudron, and I'm sure that supporting all the apps rather than just one is your core business.

My questions are:

Approximately how long is 'some time'? (It's okay if you can't answer that accurately or at all, I understand.)
Is there anything we can do to contribute to speeding it up? This has real commercial value to us, and if $$ can help then that is one thing we may be able to contribute if we can understand how many $$ might help.
For the community - Would it be useful enough for anyone else to help crowd-fund?

Bringing OpenWebUI to Cloudron - and perhaps soon other AI tools - is very exciting for us and multiple of our clients. AI tools are not easy to host privately in a cost effective, secure and commercial way. (I mean not just the pseudo-privacy of a 'private' ChatGPT space where we send our most sensitive data outside our firewall and even worse outside our own international jurisdiction, and then rely on an international corporate's legal T&Cs, or otherwise invest in AWS or Azure infrastructure for at least twice the price of doing it ourselves.)

Cloudron gives us solutions and approaches to many of the moving parts needed to provide truly private AI access to different clients and business units that we have to handle individually otherwise - e.g. easy deployment, built in no-stress backups, simple multi-tenanting, point-and-click resource management, flexible DNS, mail, etc. (It would be even nicer if we got end-to-end encrypted CI/CD pipelines across multiple networks and other related fancy stuff out of the box, but that is a dream only imagined because Cloudron is already great.) But I don't need to sell Cloudron's benefits to this group, I'm sure, just noting they're great in relation to OpenWebUI. We are big fans of Cloudron, by the way!

So... Since GPU support is a very real dependency for a properly usable OpenWebUI, we'd like to help everyone get it ASAP.

robw

Thinking further on this... Perhaps it's not genuine Cloudron GPU support that we need.

Reading my original question about GPU support in a Cloudron context, I suppose it's easy to assume that there's an expectation of containerized resource management just like we get in all our Cloudron apps - the ability to segregate and limit GPU and attached VRAM just like we can for CPU, RAM, etc. While that would certainly be wonderful, it's not actually what we need for our business case. We just need the ability for OpenWebUI to draw on the hardware GPU resources we attach to its server (which is a virtual machine in our case), and run separate OpenWebUI apps easily with separate logins and datastores (which you've already given us).

Which OpenWebUI already appears to do, I suppose. If the container is started with the GPU usage switch, my understanding is that's all that's needed in the basic case. Please correct me if I'm wrong:
E.g. docker run -d -p 3000:8080 --gpus all etc...
https://docs.openwebui.com/getting-started/

So my new questions are these (without having tried anything out yet):

If we installed the NVIDIA Ubuntu GPU drivers in the Cloudron OS and started the OpenWebUI container with the GPU switch, would it just work?
Does installing the GPU drivers interfere with Cloudron's upgrade processes? (In our case, we wouldn't mind having to manage the GPU drivers separately, we don't expect Cloudron to manage non-native additions.)
If we got past those first two questions, could we get a simple switch on the OpenWebUI Cloudron container to enable GPU support rather than having to set up our own custom container?

Cloudron isn't our virtualisation layer, we use VMWare in our case. So we can target our GPU usage to a Cloudron installation, we don't need to go further and manage it at the app level. (Actually making GPUs work in VMWare is a lot harder than it sounds, we found, but that's a separate problem.)

Of course I know nothing of how Cloudron is managing and segregating resources across its docker containers. Perhaps it is wishful thinking. But to be clear, we don't need fancy app level virtual GPU handling, we just want to use our GPU in one (or any) of the running apps. So I'm crossing fingers for good answers to the above questions.

(On a separate but interesting note, we've been hoping to do some Cloudron experimentation on this front. They're not even hard experiments, and we could do them on a bare metal box, but we've been trying to do it in our data centre hosting environment on our existing systems as a proper proof of concept... However, it's been a helluva job spinning up a pilot project with some older Dell servers we have running in a data centre with VMWare just to the point of being able to run GPUs at all, let alone getting as far as experimenting with Cloudron containers.

We got some Tesla P40s running in some Dell r720xd's for our pilot project. Sourcing the right hardware including cards and power cables was hard enough.
Then we didn't know what would fit where in which servers because none of it is officially supported or clearly documented.
Then we found we had to upgrade our power supplies even though we thought the standard ones worked on paper.
Then the hoops you have to jump through with both server firmware and BIOS/boot settings, and virtual machine BIOS and boot settings, and then VMWare updates and driver installations are CRAZY...
We've finally got as far as making these things operational only to find that our VMWare Essentials licence doesn't provide virtual GPU support. This fact doesn't seem to be clearly written anywhere in the 1,000 documents we've read lately until you know exactly what to look for, and since Broadcom's recent purchase of VMWare the affordable licences we need seem to have disappeared.

So we still haven't booted a VMWare based Cloudron virtual machine in a data centre with an enterprise GPU in it yet... Our theory that low cost, flexibly hosted, easily backed up, privately hosted AI agents and learning data repositories should be available to smaller enterprises who care about privacy/security and cost enough to avoid the public clouds has encountered many challenges so far. However I think we're not far off - though for licensing reasons I think we might need to re-learn everything we know using a different hypervisor before the end - and I'd like to report back about how this all goes running on Cloudron in the very near future.)

robw

@Lanhild said in ETA for GPU support? Could we contribute to help it along?:

A lot of companies that might deploy Cloudron for its ease of life features don't necessarily have a VPS with a GPU.

Also, (might help you to deepen your Cloudron knowledge) Cloudron packages usually are only one component/application.

Moreover, OpenWebUI is "just" a UI that supports connections to Ollama and isn't affiliated with it. Meaning that Ollama isn't a dependency of it at all.

Excellent points @Lanhild - you've convinced me.

And there are benefits on the Ollama side too. I would appreciate the benefit in using Cloudron to keep our Ollama installation automatically up to date on its own, for instance.

In fact, given our remaining inability to modify the existing Cloudron OpenWebUI app to run with our GPUs, for our small clients we are now thinking this way - I.e. using Cloudron just for the OpenWebUI component and letting them connect to our separately hosted Ollama. It's a bit less convenient than we were hoping, but at least we'll still have segregated data and user management for each client in OpenWebUI.

So now, I also want a Cloudron OpenWebUI app that does not come with bundled Ollama, so that I can be sure these customers don't hammer our CPUs and get frustrated by a slow user experience.

robw

We're attempting to migrate our Rocket.Chat instance from an Ubuntu Snap installation onto our Cloudron instance. Following the instructions at https://cloudron.io/documentation/guides/import-mongodb/ (we used the backupdb command from https://docs.rocket.chat/installation/snaps to get our Mongo DB backup as it uses mongodump under the hood) almost works, but we ran into some roadblocks:

The first time we attempted a restore it resulted in an error at the end of the import. Sorry I don't have the exact error as it disappeared from the terminal window before I copied it. It was something similar to "failed: Could not import index bio_1 because it already exists with different options".

Turning the app back on, it spins up correctly and we are able to login. Everything seems to work except that the most recent chat messages are from over a year ago. The chat channels are correct, users are correct, everything else seems to be correct - although we didn't run extensive tests.

Due to the previous error, we tried running the mongorestore command using the "--drop" option. We thought that might get around the previous error. It seemed to work, but now we receive this error instead:

2020-08-07T01:23:03.863+0000 Failed: d1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry: error creating indexes for d1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry: cannot restore index with namespace 'd1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry.$queueOrder_1_estimatedWaitingTimeQueue_1_estimatedServiceTimeAt_1': namespace is too long (max size is 127 bytes)

After turning the app back on, we get a similar result to above. Everything seems to work except that the most recent messages are from over a year ago (but a different date this time).

Doing some research, it appears that Mongo DB 3.6.3 has a hard limit on the index namespace size that we can't get around:

https://docs.mongodb.com/manual/reference/limits/
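To see why the second restore fails, the index namespace from the error above can be measured against the 127-byte limit that the error reports. This is only a minimal sketch: the namespace string is copied from that error message, and the variable names are illustrative.

```shell
#!/bin/sh
# Index namespace copied from the mongorestore error.
# Its shape is <database>.<collection>.$<index name>.
ns='d1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry.$queueOrder_1_estimatedWaitingTimeQueue_1_estimatedServiceTimeAt_1'
limit=127   # maximum namespace size quoted in the error message

# Every character here is ASCII, so the character count equals the byte count.
ns_len=${#ns}

if [ "$ns_len" -gt "$limit" ]; then
  echo "namespace is $ns_len bytes, over the $limit-byte limit"
else
  echo "namespace is $ns_len bytes, within the $limit-byte limit"
fi
```

The database name alone is 36 characters of that total, so the same index on a database with a shorter name might fit. That would explain why the index could exist in the source installation, although we haven't confirmed it.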

Perhaps the BSON file size limit is also affecting us here. Our rocketchat_uploads.chunks.bson file in the Mongo backup is > 400 MB. But we don't know.

So my question is... Is there a way to upgrade the Mongo DB instance in the Cloudron Addon to a newer version (4.2+) that doesn't have some of these limits? Or can you suggest another way to import our data?

It's important to some of our stakeholders that we maintain the chat history in our migration.

robw

@d19dotca @RazielKanos Matomo has heatmaps and session recording but it is a paid extra plugin if you're self hosting (e.g. Cloudron). It's included in the cloud hosted version.
https://plugins.matomo.org/HeatmapSessionRecording

As a long time self hosted Matomo user, I can report that as an open source system, it is an outstanding enterprise platform with barely any other competing options in its class. A true Google Analytics replacement option. In fact you can't even compare it directly to GA if you look at the extended plugins: it's more feature rich. Matomo's APIs are fantastic if you want to customize your tracking capabilities: you can do a lot.

But for advanced self hosted use, it's not for the light hearted. Running anything other than a standard installation with a few small to medium sites (e.g. to guess, up to 1 million monthly visits/actions across the platform) takes real effort. Optimized management, stability, and speed incurs a learning curve and significant time. You need more than basic compute resources. Updates need direct attention. And there is non-trivial cost involved if you want the advanced features. (These aren't Matomo weaknesses, just side effects of running an advanced system that collects huge amounts of data. The effort is worthwhile IMO.)

For simple self hosted out-of-the-box use with medium or small sized sites, it's quick and easy, yet still very feature rich. And you probably won't run into the management overheads I outlined above.

Re: self hosted versus cloud hosted, there are important advantages to self hosting including direct data access, complete data sovereignty (it's all yours, on your server), host in your own region, no user or website or feature limits, white labelling, complete flexibility, host the way you want (even on Windows servers) and manage your own security, and more. Perhaps these advantages might mean more to enterprise customers and agencies than smaller businesses. Oh, then there's the big one: it's free (the base system)!

I admire Matomo for servicing enterprise customers very well while still coming up with a very competitive pricing model for smaller businesses (cloud hosting) and yet still catering to the free / open source world without compromising their core system in any way. They behave exactly like an open source company should!

By the way, there's a premium bundle that includes all the advanced features and is affordable for serious smaller businesses or agencies who self host. We haven't found anything else that matches this advanced level of capability anywhere near this price range. The link isn't very obvious on the website: https://plugins.matomo.org/PremiumBundle

Matomo's inclusion in Cloudron is fantastic. Not only does the combination offer hassle-free management, but simple and free / low cost development/staging instances for testing and integration projects too: something we struggled to achieve before.

From limited previous (possibly outdated) experience, I believe Open Web Analytics is comparable to base Matomo but without the extensibility options. I believe Umami is simpler / more basic by comparison, but easy and quite beautiful, so that might suit some people well.

This is all IMHO based on my own small agency experience of course. YMMV. I don't work for Matomo, just a long time user (since Piwik days).

robw

I might see if I can get an official comment from the support team first... Ad hoc experimentation with live servers that diverges from official documentation isn't really something we can do.

I love Cloudron's close coupling with the public facing DNS and abstraction of it into a point-and-click GUI, because it makes day to day operation very smooth and easy. It just works, which is the beauty of the whole platform. But it does mean we can't simply clone the server to create a fully loaded sandbox for experimentation with real configuration data. Well, not unless we want to either turn the production server off while we're experimenting (not really an option), or configure a separate set of DNS entries and reconfigure the clone, by which time it's already a lot of work and it's arguably custom enough that it's not a reliable experiment for things like this anyway. (By the way, we'd like to set up cloned test environments for each CR server we run, but we're not resourced for that yet.)

We could spin up a new Cloudron on this version and play with that, Cloudron is also great for that because it's so easy and open and free. But that's not really a proper test for this case either. If the support team want that -d option for future compatibility of some kind, all I proved with my test is that I can break my server in the future without knowing about it now by not following the documentation.

So if we need to proceed without a proper test environment, ideally I'd like to do it according to the documentation and formal support advice or at least someone else's experience with logic to back it up.

Thanks for the suggestion though, appreciate it. It just triggered a few thoughts that I typed out.

robw

Per this thread, I think it's possible to add Nvidia GPU support to a Cloudron server without impacting Cloudron (or at least not breaking it).

So I'd like to be able to launch OpenWebUI with GPU support.

I think that adding the --gpus=all switch to the startup options and the :cuda tag to the image name might be all that's required. (Assuming the Ubuntu host has the Nvidia driver and CUDA and the Docker CUDA toolkit properly installed, I think it'll work, and if not, I believe it'll fall back to using CPU.)

Hard coding this would be enough for us, and I don't think it would break for anyone who doesn't have a GPU. The only people it might not suit are people who have a GPU running but don't want to use it. So adding a configuration switch of some kind to turn it on or off would be even nicer.
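As a rough illustration of what such a switch could look like, here is a sketch in plain shell. ENABLE_GPU is a made-up variable name for this example, not an existing Cloudron or Open WebUI setting, and this is not how Cloudron actually launches its containers. The script only prints the docker command it would use, so it is safe to run anywhere.

```shell
#!/bin/sh
# Hypothetical toggle: ENABLE_GPU may be "on", "off", or unset/"auto".
# This variable name is an assumption for illustration only.
gpu_args() {
  case "${ENABLE_GPU:-auto}" in
    off)
      # GPU present but the admin does not want the app to use it.
      return 0
      ;;
    on)
      printf '%s' '--gpus=all'
      ;;
    *)
      # Auto: only request the GPU if the host driver can list at least one device.
      if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        printf '%s' '--gpus=all'
      fi
      ;;
  esac
}

# Print the command instead of running it, so the result can be inspected first.
echo docker run -d -p 3000:8080 $(gpu_args) \
  -v open-webui:/app/backend/data --name open-webui \
  ghcr.io/open-webui/open-webui:cuda
```

With ENABLE_GPU=off the printed command has no --gpus flag, which covers people who have a GPU but don't want this app to use it. In auto mode, a host without a working Nvidia driver gets the same result.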

robw

@LoudLemur said in ETA for GPU support? Could we contribute to help it along?:

@girish said in ETA for GPU support? Could we contribute to help it along?:

I think the hard part is the GPU support in docker is varied.

From Arya:

"As of 2023, GPU support in Docker, particularly for AI applications, has made significant strides but still faces challenges. The main issue is that Docker was originally designed for CPU-based applications and struggled to efficiently utilize GPU resources. ...

Thanks for that info, I wasn't aware of that challenge but it certainly makes sense.

The Open WebUI installation page talks about running the Docker image with GPU support but doesn't mention those problems: https://docs.openwebui.com/getting-started/

robw

I am not a docker or Linux expert (only operational knowledge), but my GPU is running on the Cloudron server (a virtual machine in my case). Happy to share insights outside of this thread if you'd like (although not sure I can help), but I don't believe this is related to the feature request so to avoid confusing everyone, I don't think we should discuss it here.

robw

There is progress to report...

@Lanhild said in ETA for GPU support? Could we contribute to help it along?:

@robw

If we installed the NVIDIA Ubuntu GPU drivers in the Cloudron OS and started the OpenWebUI container with the GPU switch, would it just work?

Not necessarily, it depends on the GPU.

Per above, we have some Nvidia Tesla P40s in our proof of concept environment.

The use case introduced in this thread is based on a desire to make low cost, truly private AI workloads accessible to small and medium business, hopefully using Cloudron as a management tool for OpenWebUI containers in particular, because Cloudron is so easy and nice. (An easy way to run AI RAG and even just vanilla inference even on low parameter LLMs will be invaluable to many businesses.)

In this use case, I don't believe there is a need to support consumer GPUs - I understand that would be an endless tail-chasing exercise.

Thanks to Nvidia's current near-monopoly in the server GPU space, I believe there is only a fairly small number of enterprise grade GPUs that are likely to be used in a lot of real world scenarios. Though I don't claim to be an expert in this area or have any quotable evidence, my understanding from everything (a lot) that we've read and tested ourselves is that for the Nvidia GPUs, these all run with the same core Nvidia drivers and CUDA toolkit. If other hosting providers or businesses are anything like us, I guess they'll avoid non-standard hardware and software environments and frameworks as much as possible. By which I mean to suggest, officially supporting only a few general/wide/mainstream/standard conditions is likely to be very useful to a significant number of Cloudron users, even if we can't support everyone.

While it may seem like a good idea, results will be very random. Also, nouveau (or whatever they're called now) drivers are the worst available out there. I've only had good results with nvidia official drivers.

Cloudron installs in a 'fresh Ubuntu' server installation, so it appears the nouveau drivers are not installed, so there's no need to install them or worry about them in our case, or a general Nvidia support case I think.

Does installing the GPU drivers interfere with Cloudron's upgrade processes? (In our case, we wouldn't mind having to manage the GPU drivers separately, we don't expect Cloudron to manage non-native additions.)

Yes. Nvidia drivers are a pain to manage and often need debugging.

We are not Linux experts and YMMV of course, but we have got our GPU up and running inside our Cloudron/Ubuntu virtual machine along with CUDA toolkit installed, and ultimately we didn't find this very difficult once we understood what to do. In the end we only ran a few standard installation commands.

There was an apt update command in the middle of the process that I suppose is going to cause some grief for Cloudron. (Since we're not Linux experts we didn't know how to only update the components needed for our drivers and not everything else. But I also note, we did not run apt upgrade.) But otherwise from what we can tell, the Nvidia drivers + CUDA software combination appears to be quite independent of anything connected to Cloudron. (For the moment we've disabled Cloudron automatic updates.)

The much harder part was making everything work at the server hardware (Dell servers) and hypervisor (VMWare) level. I'm happy to say that we now have this working and know how to make it work again. Although we are not able to fully virtualise our GPUs with our current VMWare licence, we don't really want to do this anyway, and we've got the GPU running at the VM level using PCI Passthrough. There were plenty of high hurdles to jump to get there, so if anyone needs any pointers on that front, feel free to reach out (although this part is highly dependent on the hardware and hypervisor combination).

So, we are not quite up to testing OpenWebUI on Cloudron VM with a GPU running yet... Now we need to figure out how to start OpenWebUI inside Cloudron with GPU support.

robw

Ok some further updates...

TL;DR - We still need help getting Cloudron's OpenWebUI container to start with GPU enabled. This is our bottleneck. Otherwise everything else works.

Having figured out how to make our Dell servers and VMWare hypervisor* reliably support GPU all the way through to virtual machines, but not how to get Cloudron apps working with GPU, we have been looking for low cost 'hosting ready' virtual server alternatives for OpenWebUI. (Windows Server hosting that we like for a lot of other workloads is not a preferred option in this case.) At the VM level we've now got an Ubuntu/Caddy/Webmin (or Cockpit)/Docker/NVIDIA+CUDA stack fully operational with OpenWebUI/Ollama, and it's arguably a commercially viable solution.

And I must say, now that it's running in a 'hosting ready' environment with a software stack that's very similar to what Cloudron offers, even with our older-generation GPU test platform (Tesla P40s), the speed results from tests in OpenWebUI are extremely pleasing. I don't have tokens-per-second stats yet, but I can report one query that took 3.75 minutes using CPU only on the same host hardware, took 13 seconds with a single Tesla P40 GPU behind it, and left room on the GPU's VRAM for other concurrent queries.
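For a rough sense of scale, that single data point works out as follows. It is one query on one host, so it's an indication rather than a benchmark.

```shell
#!/bin/sh
# Timings quoted above: 3.75 minutes on CPU only, 13 seconds with one Tesla P40.
cpu_seconds=$(awk 'BEGIN { print 3.75 * 60 }')
gpu_seconds=13

# Speedup = CPU time / GPU time, rounded to one decimal place.
speedup=$(awk -v cpu="$cpu_seconds" -v gpu="$gpu_seconds" 'BEGIN { printf "%.1f", cpu / gpu }')

echo "CPU: ${cpu_seconds}s, GPU: ${gpu_seconds}s, speedup: ${speedup}x"
```

That is 225 seconds against 13, or roughly a 17x improvement for this one query.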

But our stack still doesn't do all the nice stuff that Cloudron does without a lot of extra work - mail, backups, user directory, easy multi-tenanting, app-level resource limiting, automatic DNS, automatic updates, super easy installation (our current virtual server installation guide is still ~70 active configuration steps which can't be fully automated), and more.

Finally realising that we could run a vanilla Docker test on the Cloudron host without breaking Cloudron (duh!), we ran the Nvidia sample workload from our Cloudron Ubuntu host. It works. So we know our server is ready.

[Screenshot: Cloudron host running the nvidia-smi container test]

After initially avoiding running standalone Docker containers on our Cloudron Ubuntu host (because we didn't want to upset Cloudron), running the sample app made us realise we could run a test of OpenWebUI using vanilla Docker to test our system too... It also works.

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

time=2024-11-17T11:40:46.328Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu]"
time=2024-11-17T11:40:46.328Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-11-17T11:40:47.791Z level=INFO source=types.go:123 msg="inference compute" id=GPU-2f9f15c7-39ba-5118-38fa-07ec8a1fa088 library=cuda variant=v12 compute=6.1 driver=12.7 name="Tesla P40" total="23.9 GiB" available="23.7 GiB"
INFO :     Started server process [1]
INFO :     Waiting for application startup.
INFO :     Application startup complete.
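The key line in that output is the one with msg="inference compute", which names the library and the device that were picked up. Here is a small sketch for pulling those fields out. The sample line is copied from the log above, and the field helper is just an illustrative name.

```shell
#!/bin/sh
# Sample line copied from the container log above.
log_line='time=2024-11-17T11:40:47.791Z level=INFO source=types.go:123 msg="inference compute" id=GPU-2f9f15c7-39ba-5118-38fa-07ec8a1fa088 library=cuda variant=v12 compute=6.1 driver=12.7 name="Tesla P40" total="23.9 GiB" available="23.7 GiB"'

# Extract the value of an unquoted key=value field, e.g. library=cuda.
field() {
  printf '%s\n' "$1" | sed -n "s/.* $2=\([^ ]*\).*/\1/p"
}

library=$(field "$log_line" library)
# The device name is quoted and contains a space, so it needs its own pattern.
gpu_name=$(printf '%s\n' "$log_line" | sed -n 's/.* name="\([^"]*\)".*/\1/p')

if [ "$library" = "cuda" ]; then
  echo "GPU detected: $gpu_name (library=$library)"
else
  echo "No CUDA device reported; inference is probably running on CPU"
fi
```

On a live host the same check could be fed from the container itself, for example docker logs open-webui 2>&1 | grep 'msg="inference compute"', where open-webui is the name given with --name in the command above.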

So I'm now quite certain our only hurdle is figuring out how to make Cloudron's OpenWebUI start with GPU support. But for the life of me, as Cloudron+Docker learners, we can't figure it out, even for a non-persistent test run. Modifying run.sh didn't help, and even while running in recovery mode started from the Cloudron CLI we can't see any way to make it work with modifications to run-compose.sh or /app/pkg/start.sh or the Dockerfile or anything else.

What can we do?

Please, can I repeat my offer to provide support (if needed) to get this done? At the very least, we could offer some $$ (and I hope the community might pitch in per my original post if more was needed), testing, notes from our own installation/test challenges, and an Nvidia GPU enabled virtual dev/test machine if required.

*Side note: I think VMWare used to offer a great hypervisor even for small scale, but now it's terrible for smaller customers (in my opinion). There are real alternatives these days, but hardly any that offer point-and-click container management. So virtual server layer container management tools that are as nice as Cloudron still have relevance beyond single server and home lab use cases, we think. Have I mentioned that we love Cloudron?

robw

The file is created if it doesn't exist already, I think that command is safe to run:

robw

Hi all,

This is a Rocket.Chat question rather than a Cloudron question, so apologies for coming here, but the Rocket.Chat forum seems to be about as useful as a chocolate teapot.

Our rocketchat_apps_logs.bson file within our docker container is huge: ~266 GB. So it's hogging a lot of useful space on our Cloudron server. I've no idea what's stored in here, but it feels like it might not be critical data. Does anyone know if there's a way to reduce or truncate it, or if it's safe to remove?

Cheers,
Rob
