Cloudron Forum

robw (@robw), 50 posts, 9 topics

Posts

  • Real-world minimum server specs for OpenWebUI
    R robw

    Sorry, I didn't answer your original question directly...

    Real-world server specs for OpenWebUI itself are very low. My Cloudron OpenWebUI app instance fits into a few GB of storage, barely uses any CPU on its own, and runs in well under 1 GB of RAM.

    But if you want to use the embedded Ollama system to interact with locally hosted LLMs, your server needs to support the actual LLMs as well as OpenWebUI. So you need all of this:

    • Enough disk storage for all the models you want to use.
      • Individual models you can typically run locally for a reasonable cost range from 2-3 GB (e.g. for a 3B model) up to 40-50 GB (e.g. for a 70B model). You might want to store multiple models.
    • Enough RAM (or VRAM) to fully load the model you want to use into memory, separately for each concurrent chat.
      • As a rough calculation, you need the size of the model file plus some headroom for chat context, depending on how much you want it to know/remember during chats, e.g. 3-6 GB per chat for 3-8B models, more for the bigger ones.
    • Enough CPU (or GPU) compute power to run the model fast.
      • For tiny (3-8B) models, expect 1-2 minutes per chat response on a typical CPU+RAM system (and don't imagine you can use bigger models at all), versus seconds per chat response using GPU+VRAM. (Note: you might do better than that on the very latest CPUs, but GPU+VRAM is still going to be hundreds of times faster.)
    • If you're using CPU+RAM (as opposed to GPU+VRAM):
      • You'll find that your disk I/O will be hammered (particularly during model loading) too, so you'll want very fast SSDs.
      • Expect your CPU and your RAM to be fully consumed during inference (chats), so don't expect to be running other apps on your server at the same time.
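    To put the sizing rules above into a rough formula (a sketch using only the ballpark figures from this post, not benchmarks — real usage varies with quantisation, runtime, and settings):

```python
def estimate_ram_gb(model_file_gb, context_gb_per_chat, concurrent_chats):
    """Ballpark RAM/VRAM for serving local LLM chats: each concurrent chat
    needs the full model in memory plus context headroom (per the rule of
    thumb above)."""
    return concurrent_chats * (model_file_gb + context_gb_per_chat)

# e.g. a ~4 GB 7B model file with ~2 GB context headroom, 2 concurrent chats:
print(estimate_ram_gb(4.0, 2.0, 2))  # -> 12.0
```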

    In short, I'm not sure that a VPS-hosted OpenWebUI instance running only on CPU+RAM is ever going to be useful for self-hosted LLMs.

    Unfortunately, even if you have a GPU on your virtual server, even if you get under the hood and install your GPU drivers on the Ubuntu operating system, currently Cloudron's OpenWebUI app installation won't use your GPU. So on Cloudron you're stuck with CPU+RAM.

    But that is not as gloomy as it sounds... To answer your next question...

    @timconsidine said in Real-world minimum server specs for OpenWebUI:

    Using "out the box" with local default model.
    Is there any point to the app to use with a publicly hosted model?

    Yes, there is a point. Your use cases for handling data privately are more limited, certainly, but there are some outstanding advantages to doing this, particularly on Cloudron.

    • You're storing your data (including chats and RAG data) on a system you control.
      • Although you're still sending your chats and data within them to the public model, you at least control what you can do with the storage of your chats and data.
    • You can download, backup, and always access your chats, or move them to a different OpenWebUI server, even if your connection to the public model is severed.
    • You can interact with multiple public and private models via a single interface, even within each chat. None of the public platforms let you talk to the others.
      • E.g. OpenWebUI has some pretty cool features to let you split chat threads among different models, and let models "compete" with each other using "arena" chats. We've found this to be invaluable in our business because a lot of optimizing AI usage is about experimentation and finding the best tool for the task at hand.
    • You can install and manage your own prompt libraries, system prompts, workspaces (like "GPTs" in ChatGPT), coded tools and functions (OpenWebUI has some cool integrated Python coding capabilities in this area), in a standard way across every LLM that you interact with, and without storing your code and extended data in the public cloud.
    • You can brand your chat UI according to your company or client, and modify/integrate it in other ways. OpenWebUI is flexible and open source.
    • You can centrally connect to other apps that you self-host for various workloads including data access and agent/workflow automation without needing to upload and manage all that stuff in public systems.
      • E.g. some apps running on Cloudron that can give your AI interactions super powers include:
        • N8N for workflow automation
        • Nextcloud for data storage and management
        • Chat and notification apps
        • BI apps like Baserow and Grafana
    • You can manage and segregate branded multi-user access to different chats and different AIs, either in a single OpenWebUI instance or, since app management on Cloudron is so bloody easy, in different instances on different URLs.
    • In the future, when you switch to self-hosted LLMs or other integrations, there's little to no migration. You just switch off the public API connectors and redirect them to your own models and tools, because you managed your data and chats and code integrations locally from the outset.
    • And more. I'm sure I didn't think of everything. 🙂

    By the way, plenty of these advantages are either enabled or enhanced by running on Cloudron. Cloudron is great. 🙂

    I haven't tried DeepSeek locally but might be worth a shot for privacy. I wouldn't use it otherwise.

    I agree with that decision wholeheartedly. Well, unless you're talking with DeepSeek about stuff that you want the whole world to know and learn from. Then, go nuts.

    OpenWebUI

  • ETA for GPU support? Could we contribute to help it along?
    R robw

    Hiya!

    This is a question for the Cloudron team directly, and I'm happy to get a direct response if the forum isn't the best place for it, but I'll post it publicly in case anyone else may wish to contribute.

    In this thread @girish mentioned that GPU support in Cloudron will take 'some time' - https://forum.cloudron.io/topic/11312/the-models-load-in-the-void/6.

    I understand completely that such things are difficult to quantify, and I don't know your current development roadmap. For instance, I doubt GPU support is very useful for most of the apps on Cloudron, and I'm sure that supporting all the apps rather than just one is your core business.

    My questions are:

    • Approximately how long is 'some time'? (It's okay if you can't answer that accurately or at all, I understand.)
    • Is there anything we can do to contribute to speeding it up? This has real commercial value to us, and if $$ can help, then that is one thing we may be able to contribute, if we can understand how many $$ might help. 🙂
    • For the community - Would it be useful enough for anyone else to help crowd-fund?

    Bringing OpenWebUI to Cloudron - and perhaps soon other AI tools - is very exciting for us and several of our clients. AI tools are not easy to host privately in a cost-effective, secure and commercial way. (I mean not just the pseudo-privacy of a 'private' ChatGPT space where we send our most sensitive data outside our firewall, and even worse outside our own international jurisdiction, and then rely on an international corporation's legal T&Cs; or otherwise invest in AWS or Azure infrastructure at at least twice the price of doing it ourselves.)

    Cloudron gives us solutions and approaches to many of the moving parts needed to provide truly private AI access to different clients and business units that we'd otherwise have to handle individually - e.g. easy deployment, built-in no-stress backups, simple multi-tenanting, point-and-click resource management, flexible DNS, mail, etc. (It would be even nicer if we got end-to-end encrypted CI/CD pipelines across multiple networks and other related fancy stuff out of the box, but that is a dream only imagined because Cloudron is already great. 😉 ) But I don't need to sell Cloudron's benefits to this group, I'm sure; just noting they're great in relation to OpenWebUI. We are big fans of Cloudron, by the way!

    So... Since GPU support is a very real dependency for a properly usable OpenWebUI, we'd like to help everyone get it ASAP.

    OpenWebUI

  • ETA for GPU support? Could we contribute to help it along?
    R robw

    @Lanhild said in ETA for GPU support? Could we contribute to help it along?:

    A lot of companies that might deploy Cloudron for its ease of life features don't necessarily have a VPS with a GPU.

    Also, (might help you to deepen your Cloudron knowledge) Cloudron packages usually are only one component/application.

    Moreover, OpenWebUI is "just" a UI that supports connections to Ollama and isn't affiliated with it. Meaning that Ollama isn't a dependency of it at all.

    Excellent points @Lanhild - you've convinced me. 🙂

    And there are benefits on the Ollama side too. For instance, I would appreciate using Cloudron to keep our Ollama installation automatically up to date on its own.

    In fact, given our remaining inability to modify the existing Cloudron OpenWebUI app to run with our GPUs, for our small clients we are now thinking this way - i.e. using Cloudron just for the OpenWebUI component and letting them connect to our separately hosted Ollama. It's a bit less convenient than we were hoping, but at least we'll still have segregated data and user management for each client in OpenWebUI.

    So now, I also want a Cloudron OpenWebUI app that does not come with bundled Ollama, so that I can be sure these customers don't hammer our CPUs and get frustrated by a slow user experience. 🙂

    OpenWebUI

  • ETA for GPU support? Could we contribute to help it along?
    R robw

    Thinking further on this... Perhaps it's not genuine Cloudron GPU support that we need.

    Reading my original question about GPU support in a Cloudron context, I suppose it's easy to assume that there's an expectation of containerized resource management just like we get in all our Cloudron apps - the ability to segregate and limit GPU and attached VRAM just like we can for CPU, RAM, etc. While that would certainly be wonderful, it's not actually what we need for our business case. We just need the ability for OpenWebUI to draw on the hardware GPU resources we attach to its server (which is a virtual machine in our case), and run separate OpenWebUI apps easily with separate logins and datastores (which you've already given us).

    Which OpenWebUI already appears to do, I suppose. If the container is started with the GPU usage switch, my understanding is that's all that's needed in the basic case. Please correct me if I'm wrong:
    E.g. docker run -d -p 3000:8080 --gpus all etc...
    https://docs.openwebui.com/getting-started/

    So my new questions are these (without having tried anything out yet):

    • If we installed the NVIDIA Ubuntu GPU drivers in the Cloudron OS and started the OpenWebUI container with the GPU switch, would it just work?

    • Does installing the GPU drivers interfere with Cloudron's upgrade processes? (In our case, we wouldn't mind having to manage the GPU drivers separately, we don't expect Cloudron to manage non-native additions.)

    • If we got past those first two questions, could we get a simple switch on the OpenWebUI Cloudron container to enable GPU support rather than having to set up our own custom container?

    Cloudron isn't our virtualisation layer; we use VMware in our case. So we can target our GPU usage to a Cloudron installation - we don't need to go further and manage it at the app level. (Actually making GPUs work in VMware is a lot harder than it sounds, we found, but that's a separate problem.)

    Of course, I know nothing of how Cloudron manages and segregates resources across its Docker containers. Perhaps it is wishful thinking. But to be clear, we don't need fancy app-level virtual GPU handling; we just want to use our GPU in one (or any) of the running apps. So I'm crossing fingers for good answers to the above questions. 🙂

    (On a separate but interesting note, we've been hoping to do some Cloudron experimentation on this front. And they're not even hard experiments - we could do them on a bare metal box - but we've been trying to do it in our data centre hosting environment on our existing systems as a proper proof of concept... However, it's been a helluva job spinning up a pilot project with some older Dell servers running in a data centre with VMware just to the point of being able to run GPUs at all, let alone getting as far as experimenting with Cloudron containers.

    • We got some Tesla P40s running in some Dell r720xd's for our pilot project. Sourcing the right hardware including cards and power cables was hard enough.
    • Then we didn't know what would fit where in which servers because none of it is officially supported or clearly documented.
    • Then we found we had to upgrade our power supplies even though we thought the standard ones worked on paper.
    • Then the hoops you have to jump through with server firmware and BIOS/boot settings, virtual machine BIOS and boot settings, and then VMware updates and driver installations are CRAZY...
    • We've finally got as far as making these things operational, only to find that our VMware Essentials licence doesn't provide virtual GPU support. This fact doesn't seem to be clearly stated anywhere in the 1,000 documents we've read lately (until you know exactly what to look for), and since Broadcom's recent purchase of VMware the affordable licences we need seem to have disappeared.

    So we still haven't booted a VMware-based Cloudron virtual machine in a data centre with an enterprise GPU in it. Our theory - that low-cost, flexibly hosted, easily backed up, privately hosted AI agents and learning data repositories should be available to smaller enterprises who care enough about privacy/security and cost to avoid the public clouds - has encountered many challenges so far. However, I think we're not far off (though for licensing reasons we might need to re-learn everything we know using a different hypervisor before the end), and I'd like to report back about how this all goes running on Cloudron in the very near future.)

    OpenWebUI

  • Real-world minimum server specs for OpenWebUI
    R robw

    I forgot an important advantage: You're supporting open source.

    OpenWebUI

  • Trouble importing mongodb data to Rocket.Chat
    R robw

    We're attempting to migrate our Rocket.Chat instance from an Ubuntu Snap installation onto our Cloudron instance. Following the instructions at https://cloudron.io/documentation/guides/import-mongodb/ (we used the backupdb command from https://docs.rocket.chat/installation/snaps to get our MongoDB backup, as it uses mongodump under the hood) almost works, but we ran into some roadblocks:

    1. The first time we attempted a restore it resulted in an error at the end of the import. Sorry I don't have the exact error as it disappeared from the terminal window before I copied it. It was something similar to "failed: Could not import index bio_1 because it already exists with different options".

    Turning the app back on, it spins up correctly and we are able to login. Everything seems to work except that the most recent chat messages are from over a year ago. The chat channels are correct, users are correct, everything else seems to be correct - although we didn't run extensive tests.

    2. Due to the previous error, we tried running the mongorestore command using the "--drop" option. We thought that might get around the previous error. It seemed to work, but now we receive this error instead:

    2020-08-07T01:23:03.863+0000 Failed: d1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry: error creating indexes for d1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry: cannot restore index with namespace 'd1c4b380-6f69-46f9-8517-609de87b8407.rocketchat_livechat_inquiry.$queueOrder_1_estimatedWaitingTimeQueue_1_estimatedServiceTimeAt_1': namespace is too long (max size is 127 bytes)

    After turning the app back on, we get a similar result to above. Everything seems to work except that the most recent messages are from over a year ago (but a different date this time).

    Doing some research, it appears that MongoDB 3.6.3 has a hard limit on the index namespace size that we can't get around:

    https://docs.mongodb.com/manual/reference/limits/

    Perhaps the BSON file size limit is also affecting us here. Our rocketchat_uploads.chunks.bson file in the Mongo backup is over 400 MB. But we don't know.
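    One avenue we haven't tried yet: mongorestore has a --noIndexRestore flag that skips index creation entirely, which might sidestep both the duplicate-index and namespace-length errors. A hedged, untested sketch only; the database and dump names below are placeholders, and indexes would need to be rebuilt afterwards:

```shell
# Untested sketch: restore the dump without recreating any indexes, so
# neither the namespace-length nor the duplicate-index error can fire.
# <cloudron-db-name> and <dump-dir> are placeholders for your own values.
mongorestore --drop --noIndexRestore -d <cloudron-db-name> <dump-dir>
```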

    So my question is... Is there a way to upgrade the MongoDB instance in the Cloudron addon to a newer version (4.2+) that doesn't have some of these limits? Or can you suggest another way to import our data?

    It's important to some of our stakeholders that we maintain the chat history in our migration.

    Rocket.Chat mongodb import

  • Hosting this is insanely expensive
    R robw

    @d19dotca @RazielKanos Matomo has heatmaps and session recording but it is a paid extra plugin if you're self hosting (e.g. Cloudron). It's included in the cloud hosted version.
    https://plugins.matomo.org/HeatmapSessionRecording

    As a long-time self-hosted Matomo user, I can report that as an open source system it is an outstanding enterprise platform with barely any competing options in its class. A true Google Analytics replacement. In fact, you can't even compare it directly to GA if you look at the extended plugins: it's more feature-rich. Matomo's APIs are fantastic if you want to customize your tracking capabilities: you can do a lot.

    But for advanced self-hosted use, it's not for the faint-hearted. Running anything other than a standard installation with a few small-to-medium sites (at a guess, up to 1 million monthly visits/actions across the platform) takes real effort. Achieving optimized management, stability, and speed involves a learning curve and significant time. You need more than basic compute resources. Updates need direct attention. And there is non-trivial cost involved if you want the advanced features. (These aren't Matomo weaknesses, just side effects of running an advanced system that collects huge amounts of data. The effort is worthwhile IMO.)

    For simple self-hosted out-of-the-box use with medium or small sized sites, it's quick and easy, yet still very feature-rich. And you probably won't run into the management overheads I outlined above.

    Re: self-hosted versus cloud-hosted, there are important advantages to self-hosting, including direct data access, complete data sovereignty (it's all yours, on your server), hosting in your own region, no user or website or feature limits, white labelling, complete flexibility, hosting the way you want (even on Windows servers), managing your own security, and more. Perhaps these advantages mean more to enterprise customers and agencies than to smaller businesses. Oh, and then there's the big one: the base system is free! 🙂

    I admire Matomo for servicing enterprise customers very well while coming up with a very competitive pricing model for smaller businesses (cloud hosting), and still catering to the free/open source world without compromising their core system in any way. They behave exactly like an open source company should!

    By the way, there's a premium bundle that includes all the advanced features and is affordable for serious smaller businesses or agencies who self host. We haven't found anything else that matches this advanced level of capability anywhere near this price range. The link isn't very obvious on the website: https://plugins.matomo.org/PremiumBundle

    Matomo's inclusion in Cloudron is fantastic. Not only does the combination offer hassle-free management, but also simple and free/low-cost development/staging instances for testing and integration projects: something we struggled to achieve before.

    From limited previous (possibly outdated) experience, I believe Open Web Analytics is comparable to base Matomo but without the extensibility options. I believe Umami is simpler / more basic by comparison, but easy and quite beautiful, so that might suit some people well.

    This is all IMHO based on my own small agency experience of course. YMMV. I don't work for Matomo, just a long time user (since Piwik days).

    Matomo

  • The rocketchat_apps_logs.bson collection is huge, can we reduce or remove it?
    R robw

    Thanks @joseph - I will try that!

    Meanwhile, a small update: The disk filling process appears to have magically stopped. I don't know when. So our Rocket.Chat appears to be back to normal operation. I don't see any obvious clues in the package updates, although there are updates to the apps engine, maybe it was that. Our instance is backing up and updating happily again.

    It wasn't me! 🙂

    Rocket.Chat

  • Launch OpenWebUI with GPU support
    R robw

    Per this thread, I think it's possible to add Nvidia GPU support to a Cloudron server without impacting Cloudron (or at least not breaking it).

    So I'd like to be able to launch OpenWebUI with GPU support.

    I think that adding the --gpus=all switch and the :cuda tag on the image name to the startup options might be all that's required. (Assuming the Ubuntu host has the Nvidia driver, CUDA, and the Docker CUDA toolkit properly installed, I think it'll work; and if not, I believe it'll fall back to using CPU.)
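    For reference, my understanding of that startup change as a plain-Docker equivalent, based on the Open WebUI docs (the volume name here is an assumption for illustration, not Cloudron's actual configuration):

```shell
# Sketch: GPU switch plus the :cuda image tag. Assumes the host already has
# the NVIDIA driver and the NVIDIA Container Toolkit installed; without a
# GPU it should fall back to CPU.
docker run -d -p 3000:8080 --gpus=all \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:cuda
```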

    Hard coding this would be enough for us, and I don't think it would break for anyone who doesn't have a GPU. The only people it might not suit are people who have a GPU running but don't want to use it. So adding a configuration switch of some kind to turn it on or off would be even nicer.

    Feature Requests

  • Can't start upgrade from Ubuntu 20.04 to Ubuntu 22.0x because of "no development version of an LTS available" error
    R robw

    I might see if I can get an official comment from the support team first... Ad hoc experimentation with live servers that diverges from official documentation isn't really something we can do. 🙂

    I love Cloudron's close coupling with the public facing DNS and abstraction of it into a point-and-click GUI, because it makes day to day operation very smooth and easy. It just works, which is the beauty of the whole platform. But it does mean we can't simply clone the server to create a fully loaded sandbox for experimentation with real configuration data. Well, not unless we want to either turn the production server off while we're experimenting (not really an option), or configure a separate set of DNS entries and reconfigure the clone, by which time it's already a lot of work and it's arguably custom enough that it's not a reliable experiment for things like this anyway. (By the way, we'd like to set up cloned test environments for each CR server we run, but we're not resourced for that yet.)

    We could spin up a new Cloudron on this version and play with that; Cloudron is also great for that because it's so easy and open and free. But that's not really a proper test for this case either. If the support team want that -d option for future compatibility of some kind, all I proved with my test is that I can break my server in the future without knowing about it now by not following the documentation. 🙂

    So if we need to proceed without a proper test environment, ideally I'd like to do it according to the documentation and formal support advice or at least someone else's experience with logic to back it up.

    Thanks for the suggestion though, appreciate it. It just triggered a few thoughts that I typed out. 🙂

    Support upgrade ubuntu

  • ETA for GPU support? Could we contribute to help it along?
    R robw

    @LoudLemur said in ETA for GPU support? Could we contribute to help it along?:

    @girish said in ETA for GPU support? Could we contribute to help it along?:

    I think the hard part is the GPU support in docker is varied.

    From Arya:

    "As of 2023, GPU support in Docker, particularly for AI applications, has made significant strides but still faces challenges. The main issue is that Docker was originally designed for CPU-based applications and struggled to efficiently utilize GPU resources. ...

    Thanks for that info, I wasn't aware of that challenge but it certainly makes sense.

    The Open WebUI installation page talks about running the Docker image with GPU support but doesn't mention those problems: https://docs.openwebui.com/getting-started/

    OpenWebUI

  • CUDA not permitted
    R robw

    The file is created if it doesn't already exist, so I think that command is safe to run:

    image.png

    Jellyfin

  • Launch OpenWebUI with GPU support
    R robw

    I am not a docker or Linux expert (only operational knowledge), but my GPU is running on the Cloudron server (a virtual machine in my case). Happy to share insights outside of this thread if you'd like (although not sure I can help), but I don't believe this is related to the feature request so to avoid confusing everyone, I don't think we should discuss it here.

    Feature Requests

  • ETA for GPU support? Could we contribute to help it along?
    R robw

    There is progress to report...

    @Lanhild said in ETA for GPU support? Could we contribute to help it along?:

    @robw

    If we installed the NVIDIA Ubuntu GPU drivers in the Cloudron OS and started the OpenWebUI container with the GPU switch, would it just work?

    Not necessarily, it depends on the GPU.

    Per above, we have some Nvidia Tesla P40s in our proof of concept environment.

    The use case introduced in this thread is based on a desire to make low-cost, truly private AI workloads accessible to small and medium businesses, hopefully using Cloudron as a management tool for OpenWebUI containers in particular, because Cloudron is so easy and nice. (An easy way to run AI RAG, or even just vanilla inference on low-parameter LLMs, will be invaluable to many businesses.)

    In this use case, I don't believe there is a need to support consumer GPUs - I understand that would be an endless tail-chasing exercise.

    Thanks to Nvidia's current near-monopoly in the server GPU space, I believe there is only a fairly small number of enterprise-grade GPUs that are likely to be used in most real-world scenarios. Though I don't claim to be an expert in this area or have any quotable evidence, my understanding from everything (a lot) that we've read and tested ourselves is that these Nvidia GPUs all run with the same core Nvidia drivers and CUDA toolkit. If other hosting providers or businesses are anything like us, I guess they'll avoid non-standard hardware, software environments, and frameworks as much as possible. By which I mean: officially supporting only a few general/mainstream/standard configurations is likely to be very useful to a significant number of Cloudron users, even if we can't support everyone.

    While it may seem like a good idea, results will be very random. Also, nouveau (or whatever they're called now) drivers are the worst available out there. I've only had good results with nvidia official drivers.

    Cloudron installs on a 'fresh Ubuntu' server installation, where it appears the nouveau drivers are not present, so there's no need to install or worry about them in our case, or I think in a general Nvidia support case.

    Does installing the GPU drivers interfere with Cloudron's upgrade processes? (In our case, we wouldn't mind having to manage the GPU drivers separately, we don't expect Cloudron to manage non-native additions.)

    Yes. Nvidia drivers are a pain to manage and often need debugging.

    We are not Linux experts and YMMV of course, but we have got our GPU up and running inside our Cloudron/Ubuntu virtual machine, along with the CUDA toolkit installed, and ultimately we didn't find this very difficult once we understood what to do. In the end we only ran a few standard installation commands.

    fe71d18e-1a9a-46e2-896d-dcbf7eaf3d54-image.png

    There was an apt update command in the middle of the process that I suppose is going to cause some grief for Cloudron. (Since we're not Linux experts we didn't know how to only update the components needed for our drivers and not everything else. But I also note, we did not run apt upgrade.) But otherwise from what we can tell, the Nvidia drivers + CUDA software combination appears to be quite independent of anything connected to Cloudron. (For the moment we've disabled Cloudron automatic updates.)
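    For anyone following along, the "few standard installation commands" were roughly of this shape. A sketch from memory; the exact package names and driver version are assumptions and should be checked against NVIDIA's current documentation for your Ubuntu release:

```shell
# Hedged sketch of a standard NVIDIA driver + CUDA install on Ubuntu.
sudo apt update
sudo apt install -y nvidia-driver-535      # proprietary driver (version varies)
sudo apt install -y nvidia-cuda-toolkit    # CUDA toolkit from the Ubuntu repos
sudo reboot
nvidia-smi                                 # after reboot: verify GPU is visible
```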

    The much harder part was making everything work at the server hardware (Dell servers) and hypervisor (VMware) level. I'm happy to say that we now have this working and know how to make it work again. Although we are not able to fully virtualise our GPUs with our current VMware licence, we don't really want to do that anyway, and we've got the GPU running at the VM level using PCI passthrough. There were plenty of high hurdles to jump to get there, so if anyone needs pointers on that front, feel free to reach out (although this part is highly dependent on the hardware and hypervisor combination).

    So, we are not quite up to testing OpenWebUI on a Cloudron VM with a GPU yet... Now we need to figure out how to start OpenWebUI inside Cloudron with GPU support.

    OpenWebUI

  • ETA for GPU support? Could we contribute to help it along?
    R robw

    Ok some further updates...

    TL;DR - We still need help getting Cloudron's OpenWebUI container to start with GPU enabled. This is our bottleneck. Otherwise, everything else works.

    Now that we've figured out how to make our Dell servers and VMware hypervisor reliably support GPUs all the way through to virtual machines, but having failed to get Cloudron apps working with a GPU, we have been looking for low-cost 'hosting ready' virtual server alternatives for OpenWebUI. (Windows Server hosting, which we like for a lot of other workloads, is not a preferred option in this case.) At the VM level we've now got an Ubuntu/Caddy/Webmin (or Cockpit)/Docker/NVIDIA+CUDA stack fully operational with OpenWebUI/Ollama, and it's arguably a commercially viable solution.

    And I must say, now that it's running in a 'hosting ready' environment with a software stack that's very similar to what Cloudron offers, even with our older-generation GPU test platform (Tesla P40s), the speed results from tests in OpenWebUI are extremely pleasing. I don't have tokens-per-second stats yet, but I can report one query that took 3.75 minutes using CPU only on the same host hardware, took 13 seconds with a single Tesla P40 GPU behind it, and left room on the GPU's VRAM for other concurrent queries.
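    For context, that single data point works out to roughly a 17x speedup:

```python
# Sanity-check the speedup quoted above: 3.75 minutes on CPU-only vs 13 s
# on a single Tesla P40 for the same query.
cpu_seconds = 3.75 * 60          # 225 seconds
gpu_seconds = 13
speedup = cpu_seconds / gpu_seconds
print(f"roughly {speedup:.0f}x faster")  # roughly 17x faster
```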

    But our stack still doesn't do all the nice stuff that Cloudron does without a lot of extra work - mail, backups, user directory, easy multi-tenanting, app-level resource limiting, automatic DNS, automatic updates, super easy installation (our current virtual server installation guide is still ~70 active configuration steps which can't be fully automated), and more.

    Finally realising that we could run a vanilla Docker test on the Cloudron host without breaking Cloudron (duh!), we ran the NVIDIA sample workload from our Cloudron Ubuntu host. It works. So we know our server is ready.

    d2b3aea2-214f-4726-a718-731727b54ba7-cloudron-host-running-nvdia-smi-container-test.png

    After initially avoiding standalone Docker containers on our Cloudron Ubuntu host (because we didn't want to upset Cloudron), running the sample app made us realise we could also test OpenWebUI itself with vanilla Docker... It works too.

    docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

    time=2024-11-17T11:40:46.328Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu]"
    time=2024-11-17T11:40:46.328Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
    time=2024-11-17T11:40:47.791Z level=INFO source=types.go:123 msg="inference compute" id=GPU-2f9f15c7-39ba-5118-38fa-07ec8a1fa088 library=cuda variant=v12 compute=6.1 driver=12.7 name="Tesla P40" total="23.9 GiB" available="23.7 GiB"
    INFO :     Started server process [1]
    INFO :     Waiting for application startup.
    INFO :     Application startup complete.
    

    So I'm now quite certain our only hurdle is figuring out how to make Cloudron's OpenWebUI start with GPU support. But for the life of us, as Cloudron+Docker learners, we can't figure it out, even for a non-persistent test run. Modifying run.sh didn't help, and even while running in recovery mode started from the Cloudron CLI, we can't see any way to make it work with modifications to run-compose.sh or /app/pkg/start.sh or the Dockerfile or anything else.
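    For anyone comparing notes: in plain Docker Compose, the stock way to hand an NVIDIA GPU to a service is a device reservation under `deploy.resources`. Whether Cloudron's generated compose/start scripts would honour anything equivalent is exactly what we can't work out, so treat this purely as a sketch of standard Docker syntax, not a Cloudron recipe (the service name here is illustrative):

    ```yaml
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:ollama
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all          # or an integer, e.g. 1
                  capabilities: [gpu]
    ```

    The equivalent `docker run` flag is `--gpus=all`, which is what made the standalone test above work.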

    What can we do?

    Please, can I repeat my offer to provide support (if needed) to get this done? At the very least, we could offer some $$ (and I hope the community might pitch in per my original post if more was needed), testing, notes from our own installation/test challenges, and an Nvidia GPU enabled virtual dev/test machine if required.

    *Side note: I think VMWare used to offer a great hypervisor even for small scale, but now it's terrible for smaller customers (in my opinion). There are real alternatives these days, but hardly any that offer point-and-click container management. So virtual server layer container management tools that are as nice as Cloudron still have relevance beyond single server and home lab use cases, we think. Have I mentioned that we love Cloudron? πŸ™‚

    OpenWebUI

  • LiteLLM removed from OpenWebUI, requires own separate container
    R robw

    This would certainly be a very valuable addition to Cloudron. I have upvoted. That said, I guess that the configuration process could be challenging since there doesn't appear to be any point-and-click UI for that.

    OpenWebUI

  • The rocketchat_apps_logs.bson collection is huge, can we reduce or remove it?
    R robw

    Hi all,

    This is a Rocket.Chat question rather than a Cloudron question, so apologies for coming here, but the Rocket.Chat forum seems to be about as useful as a chocolate teapot.

    Our rocketchat_apps_logs.bson file within our docker container is huge: ~266Gb. So it's hogging a lot of useful space on our Cloudron server. I've no idea what's stored in here, but it feels like it might not be critical data. Does anyone know if there's a way to reduce or truncate it, or if it's safe to remove?
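    In case it helps anyone with the same problem: assuming you can open a mongosh shell against the Rocket.Chat database in the app's MongoDB addon, the collection behind that file should be inspectable directly. These are stock MongoDB commands; the `_createdAt` field name is an assumption based on Rocket.Chat's usual conventions, so confirm it with `findOne()` before deleting anything:

    ```javascript
    // In mongosh, connected to the Rocket.Chat database.
    // Inspect one document to confirm the timestamp field name:
    db.rocketchat_apps_logs.findOne()

    // Report the collection's uncompressed size in bytes:
    db.rocketchat_apps_logs.stats().size

    // App logs are diagnostic rather than user data, so a cautious
    // first step is deleting entries older than, say, 30 days:
    const cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)
    db.rocketchat_apps_logs.deleteMany({ _createdAt: { $lt: cutoff } })
    ```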

    Cheers,
    Rob

    Rocket.Chat

  • The rocketchat_apps_logs.bson collection is huge, can we reduce or remove it?
    R robw

    Oh, the problem is not gone.

    image.png

    image.png

    I'm not sure when that happened; Rocket.Chat had stopped working again this morning when I logged in, so that's when I checked. So I assume it's Rocket.Chat again. Since the app wasn't working, the logs are full of errors, so I couldn't easily see anything useful there. Since the Rocket.Chat container's file system changed in the last update, I can't seem to see the .bson files any more, so I can't confirm that was the problem and I'm not sure what else to check.

    There were some other automatic app updates over the last two nights (though not Rocket.Chat because it's up to date), so I suppose it's possibly not Rocket.Chat this time, I'm just assuming.

    I can't see that data anywhere on the Cloudron disk from the server console. A du -h --max-depth=1 at the root level on the server shows Cloudron using the expected amount of space (matching the apps in the coloured parts of the bar above), even though df says the disk is full. The 'everything else' data doesn't seem to exist, so I can't tell what it is. It appears to live in the overlay filesystem until a Cloudron reboot, at which point it disappears. (Sorry, I didn't take a screenshot.) I know nothing of these virtual file systems, so apologies if I'm explaining that wrong; I'm just reporting what I can see.
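    For anyone wanting to dig into where overlay space goes, the standard commands for attributing it (assuming a default Docker install, run as root on the host) would be something like this. I'm noting them here as a pointer, not as something I've fully verified on this box:

    ```shell
    # Per-image/container/volume usage as Docker itself accounts for it:
    docker system df -v

    # Raw on-disk usage of each container's writable overlay layer,
    # biggest last:
    du -sh /var/lib/docker/overlay2/* | sort -h | tail -20
    ```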

    So after a reboot it's all good again. After 30 minutes or so, there's no evidence yet that the disk is filling up or that anything is changing on the disk:

    99f01949-d7cc-412f-8394-57c3c2240aeb-image.png
    1bb11ae3-c03b-49ec-a52b-3cc2d49146e0-image.png

    Rocket.Chat

  • The rocketchat_apps_logs.bson collection is huge, can we reduce or remove it?
    R robw

    @stoccafisso I can confirm that for me this is Rocket.Chat flakiness, not Cloudron. I fear your restoration process won't help if Rocket.Chat is just going to kick off its log filling process again.

    @girish I haven't yet tried moving logs to the filesystem, because (very strangely) my Rocket.Chat settings page currently appears to be empty, just a search box:

    6a705bac-8cc0-4a93-9388-5336b1f80132-image.png

    That's a Rocket.Chat mystery I haven't figured out yet. πŸ™‚

    I did try the scheduled mongodb log clearing idea directly in the app container, but it didn't seem to help. Because I can't see what's actually happening in the database, I don't know if that's because the data is filling up faster than the query runs, or because of database locking because the database is so big (100s of Gb), or something to do with virtual container file system strangeness, or something else.

    At this stage my server seems to have fallen into this pattern:

    • Rocket.Chat fills the disk
    • Rocket.Chat attempts an update (I have auto updates on)
    • Update fails because the backup fails (no disk space to prepare the backup, even though my backups go offsite to Backblaze)
    • Rocket.Chat restarts, at which point the "filled" disk space moves from within the Rocket.Chat container to the virtual file system (reported as "everything else" on the Cloudron System Info page)
    • A reboot releases the disk space from the virtual file system
    • Start over...

    The disk filling doesn't seem to start immediately any more. I don't know what triggers it.

    So that means with a reboot every day or so, the server is more or less operational for general usage, except for a while right at the end when the disk is full.

    I'm hoping a Rocket.Chat update will arrive soon that makes this go away.

    Rocket.Chat

  • The rocketchat_apps_logs.bson collection is huge, can we reduce or remove it?
    R robw

    Quick update: Upgrade to package 2.55.0 with Rocket.Chat server 7.3.0 did not fix the problem (though I didn't think it would based on the RC release notes).

    Attempts to get help on the Rocket.Chat community forum appear to have failed because I encountered the rudest and most self centred and unhelpful "Community Liaison Officer" I've ever had the misfortune to meet, a genuinely toxic individual. However, my glass is half full: the silver lining is that it reinforced my sincere appreciation of the community in this forum and the Cloudron team. πŸ™‚

    Rocket.Chat