<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Real-world minimum server specs for OpenWebUI]]></title><description><![CDATA[<p dir="auto">Q for those running OpenWebUI on a VPS with Cloudron.<br />
<strong>Just wondering what server specs your VPS has.</strong></p>
<p dir="auto">I have tried OpenWebUI twice on my Cloudron VPS which is a dedicated Hetzner box, 62Gb RAM, 1TB disk only 30% used.  But OpenWebUI runs soooo slooow.  Unusable frankly.</p>
<p dir="auto">Would love to have self-hosted AI but not currently viable for me, using other AI chat systems for now.</p>
]]></description><link>https://forum.cloudron.io/topic/13202/real-world-minimum-server-specs-for-openwebui</link><generator>RSS for Node</generator><lastBuildDate>Sun, 08 Mar 2026 14:02:45 GMT</lastBuildDate><atom:link href="https://forum.cloudron.io/topic/13202.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 29 Jan 2025 13:05:07 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Sat, 01 Feb 2025 17:45:40 GMT]]></title><description><![CDATA[<p dir="auto">I forgot an important advantage: You're supporting open source.</p>
]]></description><link>https://forum.cloudron.io/post/101065</link><guid isPermaLink="true">https://forum.cloudron.io/post/101065</guid><dc:creator><![CDATA[robw]]></dc:creator><pubDate>Sat, 01 Feb 2025 17:45:40 GMT</pubDate></item><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Mon, 03 Feb 2025 11:09:46 GMT]]></title><description><![CDATA[<p dir="auto">Sorry, I didn't answer your original question directly...</p>
<p dir="auto">Real world server specs for OpenWebUI itself are very low. My Cloudron OpenWebUI app instance fits into a few Gb of storage, barely uses any CPU on its own, and runs in well under 1 Gb of RAM.</p>
<p dir="auto">But if you want to use the embedded ollama system to interact with locally hosted LLMs, your server needs to support the actual LLMs as well as OpenWebUI. So you need all of this:</p>
<ul>
<li>Enough disk storage for all the models you want to use.
<ul>
<li>Individual models you can typically run locally for a reasonable cost range from 2-3 Gb (e.g. for a 3B model) up to 40-50Gb (e.g. for a 70B model). You might want to store multiple models.</li>
</ul>
</li>
<li>Enough RAM (or VRAM) to fully load the model you want to use into memory, separately for each concurrent chat.
<ul>
<li>As a rough calculation, you need the size of the model file plus some room for chat context, depending on how much you want it to know/remember during chats, e.g. 3-6Gb per chat for 3-8B models, more for the bigger ones.</li>
</ul>
</li>
<li>Enough CPU (or GPU) compute power to run the model fast.
<ul>
<li>For tiny (3-8B) models, expect 1-2 minutes per chat response on a typical CPU+RAM system (and don't imagine you can use bigger models at all), versus seconds per chat response using GPU+VRAM. (Note: You might do better than that on the very latest CPUs, but GPU+VRAM is still going to be hundreds of times faster.)</li>
</ul>
</li>
<li>If you're using CPU+RAM (as opposed to GPU+VRAM):
<ul>
<li>You'll find that your disk I/O will be hammered (particularly during model loading) too, so you'll want very fast SSDs.</li>
<li>Expect your CPU and your RAM to be fully consumed during inference (chats), so don't expect to be running other apps on your server at the same time.</li>
</ul>
</li>
</ul>
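<p dir="auto">As a back-of-the-envelope sketch of the memory point above (following the per-chat figures in this post, which are rough approximations, not measurements):</p>

```python
# Rough rule-of-thumb RAM/VRAM estimate for running a local LLM, following the
# per-chat figures above: each concurrent chat needs the model file in memory
# plus some room for chat context. Numbers are illustrative approximations.

def estimate_ram_gb(model_file_gb: float, context_gb: float, chats: int = 1) -> float:
    """Approximate memory needed for `chats` concurrent chats."""
    return (model_file_gb + context_gb) * chats

# e.g. an 8B model (~5 Gb file) with ~1 Gb of context, two concurrent chats:
print(estimate_ram_gb(5.0, 1.0, 2))  # 12.0
```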
<p dir="auto">In short, I'm not sure that a VPS-hosted OpenWebUI instance running only on CPU+RAM is ever going to be useful for self-hosted LLMs.</p>
<p dir="auto">Unfortunately, even if you have a GPU on your virtual server, even if you get under the hood and install your GPU drivers on the Ubuntu operating system, currently <a href="https://forum.cloudron.io/topic/12401/eta-for-gpu-support-could-we-contribute-to-help-it-along/22">Cloudron's OpenWebUI app installation won't use your GPU</a>. So on Cloudron you're stuck with CPU+RAM.</p>
<p dir="auto">But that is not as gloomy as it sounds... To answer your next question...</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/timconsidine" aria-label="Profile: timconsidine">@<bdi>timconsidine</bdi></a> said in <a href="/post/101059">Real-world minimum server specs for OpenWebUI</a>:</p>
<blockquote>
<p dir="auto">Using “out the box” with local default model.<br />
Is there any point to the app to use with publicly hosted model ?</p>
</blockquote>
<p dir="auto">Yes, there is a point. Your use cases for handling data privately are more limited, certainly, but there are some outstanding advantages to doing this, particularly on Cloudron.</p>
<ul>
<li>You're storing your data (including chats and RAG data) on a system you control.
<ul>
<li>Although you're still <em>sending your chats and data within them to the public model</em>, you at least control what you can do with the storage of your chats and data.</li>
</ul>
</li>
<li>You can download, backup, and always access your chats, or move them to a different OpenWebUI server, even if your connection to the public model is severed.</li>
<li>You can interact with multiple public and private models via a single interface, even within each chat. None of the public platforms let you talk to the others.
<ul>
<li>E.g. OpenWebUI has some pretty cool features to let you split chat threads among different models, and let models "compete" with each other using "arena" chats. We've found this to be invaluable in our business because a lot of optimizing AI usage is about experimentation and finding the best tool for the task at hand.</li>
</ul>
</li>
<li>You can install and manage your own prompt libraries, system prompts, workspaces (like "GPTs" in ChatGPT), coded tools and functions (OpenWebUI has some cool integrated Python coding capabilities in this area), in a standard way across every LLM that you interact with, and without storing your code and extended data in the public cloud.</li>
<li>You can brand your chat UI according to your company or client, and modify/integrate it in other ways. OpenWebUI is flexible and open source.</li>
<li>You can centrally connect to other apps that you self-host for various workloads including data access and agent/workflow automation without needing to upload and manage all that stuff in public systems.
<ul>
<li>E.g. some apps running on Cloudron that can give your AI interactions super powers include:
<ul>
<li>N8N for workflow automation</li>
<li>Nextcloud for data storage and management</li>
<li>Chat and notification apps</li>
<li>BI apps like Baserow and Grafana</li>
</ul>
</li>
</ul>
</li>
<li>You can manage and segregate branded multi-user access to different chats and different AIs, either in a single OpenWebUI instance, or (since app management on Cloudron is so bloody easy), different instances on different URLs.</li>
<li>In the future, when you switch to self-hosted LLMs or other integrations, there's little to no migration. You just switch off the public API connectors and redirect them to your own models and tools, because you managed your data, chats, and code integrations locally from the outset.</li>
<li>And more. I'm sure I didn't think of everything. <img src="https://forum.cloudron.io/assets/plugins/nodebb-plugin-emoji/emoji/android/1f642.png?v=c3aa4c12b7e" class="not-responsive emoji emoji-android emoji--slightly_smiling_face" style="height:23px;width:auto;vertical-align:middle" title=":)" alt="🙂" /></li>
</ul>
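<p dir="auto">To illustrate the coded tools point above: OpenWebUI custom tools are plain Python classes whose typed, docstring-annotated methods can be exposed to the model. This is only a rough sketch of the general shape (the method name and behavior are made up for illustration; check the current OpenWebUI docs for the exact interface):</p>

```python
# Rough sketch of an OpenWebUI-style custom tool: a plain Python class whose
# typed, documented methods the model can call. The shape shown here is
# illustrative; the exact interface may differ between OpenWebUI versions.

class Tools:
    def word_count(self, text: str) -> str:
        """Count the words in a piece of text.
        :param text: the text to count words in
        """
        return f"{len(text.split())} words"
```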
<p dir="auto">By the way, plenty of these advantages are either because of or enhanced by running on Cloudron. Cloudron is great. <img src="https://forum.cloudron.io/assets/plugins/nodebb-plugin-emoji/emoji/android/1f642.png?v=c3aa4c12b7e" class="not-responsive emoji emoji-android emoji--slightly_smiling_face" style="height:23px;width:auto;vertical-align:middle" title=":)" alt="🙂" /></p>
<blockquote>
<p dir="auto">I haven’t tried DeepSeek locally but might be worth a shot for privacy. I wouldn’t use it otherwise.</p>
</blockquote>
<p dir="auto">I agree with that decision wholeheartedly. Well, unless you're talking with DeepSeek about stuff that you want the whole world to know and learn from. Then, go nuts.</p>
]]></description><link>https://forum.cloudron.io/post/101064</link><guid isPermaLink="true">https://forum.cloudron.io/post/101064</guid><dc:creator><![CDATA[robw]]></dc:creator><pubDate>Mon, 03 Feb 2025 11:09:46 GMT</pubDate></item><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Sat, 01 Feb 2025 16:28:26 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/robw" aria-label="Profile: robw">@<bdi>robw</bdi></a> thank you</p>
<p dir="auto">Using “out the box” with local default model.<br />
Is there any point to the app to use with publicly hosted model ?</p>
<p dir="auto">I haven’t tried DeepSeek locally but might be worth a shot for privacy. I wouldn’t use it otherwise.</p>
]]></description><link>https://forum.cloudron.io/post/101059</link><guid isPermaLink="true">https://forum.cloudron.io/post/101059</guid><dc:creator><![CDATA[timconsidine]]></dc:creator><pubDate>Sat, 01 Feb 2025 16:28:26 GMT</pubDate></item><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Sat, 01 Feb 2025 16:23:44 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/timconsidine" aria-label="Profile: timconsidine">@<bdi>timconsidine</bdi></a> Are you trying to use locally hosted ollama models, or have you wired up API keys for the public cloud models like ChatGPT or DeepSeek(*) in your OpenWebUI instance?</p>
<p dir="auto">If you're experiencing unusable slowness for locally hosted models, it might be because OpenWebUI on Cloudron out of the box is running with CPU+RAM only (not GPU+VRAM). Even for tiny models, that's going to be very slow even with very fast CPUs.</p>
<p dir="auto">I'd be surprised if you're finding OpenWebUI to be slow with the public cloud models. There will be some latency through API calls between your Cloudron server and the online model, but I'd be surprised if you didn't find it to be nearly as fast as using the online hosted versions directly.</p>
<p dir="auto">(*) By the way, if you're using DeepSeek online and not self-hosted, please assume every interaction is being read at the other end. There are no privacy controls. And even with ChatGPT and the others, I'd suggest reading the terms and conditions of your API usage carefully and considering which jurisdiction you're sending your data and chats to.</p>
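<p dir="auto">For reference, the API connectors mentioned above speak the standard OpenAI-compatible chat API. Here's a minimal sketch of the request body involved (the model name is a placeholder; in practice you just paste your API key into OpenWebUI's admin settings and it makes these calls for you):</p>

```python
# Sketch of an OpenAI-compatible /v1/chat/completions request body, the kind
# of call an OpenWebUI API connector makes to a public model. The model name
# below is a placeholder; no network call is made in this sketch.

def build_chat_request(model: str, user_message: str) -> dict:
    """Build the JSON body for a chat completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

print(build_chat_request("gpt-4o-mini", "Hello")["messages"][0]["role"])  # user
```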
]]></description><link>https://forum.cloudron.io/post/101058</link><guid isPermaLink="true">https://forum.cloudron.io/post/101058</guid><dc:creator><![CDATA[robw]]></dc:creator><pubDate>Sat, 01 Feb 2025 16:23:44 GMT</pubDate></item><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Thu, 30 Jan 2025 05:32:22 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/jdaviescoates" aria-label="Profile: jdaviescoates">@<bdi>jdaviescoates</bdi></a> There has been success using the DeepSeek reasoning engine as a controller for Qwen and other models, which optimizes the prompting via its thought process. This one-two punch makes even lesser models provide better output.</p>
]]></description><link>https://forum.cloudron.io/post/100899</link><guid isPermaLink="true">https://forum.cloudron.io/post/100899</guid><dc:creator><![CDATA[robi]]></dc:creator><pubDate>Thu, 30 Jan 2025 05:32:22 GMT</pubDate></item><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Wed, 29 Jan 2025 14:12:26 GMT]]></title><description><![CDATA[<p dir="auto">This is kinda very hard to assess. It depends on the expected outcome (quality and speed of produced answers), as well as the system's ability to augment with extra (and more up-to-date) sources via RAG, and of course which pre-trained model and flavor of it is used.</p>
]]></description><link>https://forum.cloudron.io/post/100857</link><guid isPermaLink="true">https://forum.cloudron.io/post/100857</guid><dc:creator><![CDATA[nebulon]]></dc:creator><pubDate>Wed, 29 Jan 2025 14:12:26 GMT</pubDate></item><item><title><![CDATA[Reply to Real-world minimum server specs for OpenWebUI on Wed, 29 Jan 2025 14:10:42 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/timconsidine" aria-label="Profile: timconsidine">@<bdi>timconsidine</bdi></a> said in <a href="/post/100850">Real-world minimum server specs for OpenWebUI</a>:</p>
<blockquote>
<p dir="auto">I have tried OpenWebUI twice on my Cloudron VPS which is a dedicated Hetzner box, 62Gb RAM, 1TB disk only 30% used. But OpenWebUI runs soooo slooow. Unusable frankly.</p>
</blockquote>
<p dir="auto">I'm not using it at present but did play with it on my dedicated Hetzner server with similar specs and agree that it was pretty slow. There was quite a bit of variation in the speeds of different models though.</p>
<p dir="auto">I recently stumbled across this via a post on Mastodon, where someone is testing various smaller models on a Raspberry Pi: <a href="https://itsfoss.com/llms-for-raspberry-pi/" target="_blank" rel="noopener noreferrer nofollow ugc">https://itsfoss.com/llms-for-raspberry-pi/</a>. It sounds like Qwen2.5 (3b) might be worth a look as something that apparently works quite well and quickly.</p>
]]></description><link>https://forum.cloudron.io/post/100856</link><guid isPermaLink="true">https://forum.cloudron.io/post/100856</guid><dc:creator><![CDATA[jdaviescoates]]></dc:creator><pubDate>Wed, 29 Jan 2025 14:10:42 GMT</pubDate></item></channel></rss>