Old CPU / No GPU / Ollama Language Model?
-
gemma3:1b
This works. We gave it a whopping 32GB of RAM. You might be able to get it to run with less.
qwen3:4b was too slow and hit the proxy timeout.
In the Ollama terminal you can set some environment variables to help too:
export OLLAMA_KEEP_ALIVE=24h
export OLLAMA_FLASH_ATTENTION=false
OLLAMA_KEEP_ALIVE = 24h: keeps the model loaded in RAM (prevents reloading every request)
OLLAMA_FLASH_ATTENTION = false: more stable on older CPUs
After you have Ollama running on Cloudron and have its API key, you can go into the Ollama terminal and:
ollama pull gemma3:1b
Then, using your own URL and your own API token, you can run this from your local machine to get gemma to tell you a joke and see if it is working:
curl -X POST "https://YOUR_REAL_OLLAMA_URL/api/chat" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "model": "gemma3:1b",
    "messages": [{"role": "user", "content": "Hello! Tell me a short joke."}],
    "stream": false,
    "options": { "num_ctx": 1024, "num_thread": 6 }
  }' | jq
You will hopefully see a joke in the output and maybe some smilies laughing!
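If you only want the joke and a rough speed figure instead of the whole JSON, something like the sketch below might help. It assumes the non-streaming /api/chat response carries the reply in message.content and reports eval_count (tokens generated) and eval_duration (in nanoseconds), as the Ollama API docs describe. The function names tokens_per_second and summarize_reply are made up for illustration, and jq needs to be installed for the second one.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Ollama itself.

# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# Prints generation speed with one decimal place, or "n/a" when the
# duration is missing or zero. eval_duration is assumed to be nanoseconds.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if ((ns + 0) <= 0) { print "n/a"; exit }
    printf "%.1f\n", n * 1e9 / ns
  }'
}

# summarize_reply: reads one /api/chat JSON response on stdin and prints
# the reply text followed by the generation speed.
summarize_reply() {
  response=$(cat)
  printf '%s\n' "$response" | jq -r '.message.content'
  count=$(printf '%s\n' "$response" | jq -r '.eval_count')
  duration=$(printf '%s\n' "$response" | jq -r '.eval_duration')
  printf 'speed: %s tok/s\n' "$(tokens_per_second "$count" "$duration")"
}
```

To use it, replace the trailing "| jq" in the curl command with "| summarize_reply" after sourcing the functions in your local shell.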

-
Sadly there's currently no substitute for RAM or VRAM.
My Apple silicon Mac laptop does an OK-ish job with 24GB RAM (integrated CPU/GPU memory model).
But mostly I just accept defeat and use Ollama Cloud models (or Venice T2EE cloud models).
-
I tried something like this on my 14-year-old CPU.
It's still writing out the joke. I also have only 16GB RAM to give.
-
@robi Can it tell jokes?
I had to find out!
You be the judge:
Tell us a quick joke about a chicken
Why don't chicken birds fly?

Because they are too small.
-
@LoudLemur of course it can and at 200 tok/sec no less.
It just makes a lot of mistakes. It had trouble with tool calling and web access.
-
@joseph Thanks for this story. We asked a smallish model (Qwen 9B) running on a lot of VRAM to tell us a joke.
Qwen didn't tell us a joke, it just started thinking about which chicken joke to tell us.

We looked at its thinking and it had created and considered over 200 chicken jokes before we decided the best thing to do was ... not wait for it!
We wish we had left it running to find out which joke it would have eventually chosen for us!
-
Why did the AI engine search for a chicken joke?
Because it was looking for poultry in motion!

-
@andreasdueren Thank you for Hermes! It is a great choice for us and it also tells a funny chicken joke!
hermes-4.3-36b
Sure! Here's a clucktastic one:
Why did the chicken join a band?
To learn how to make some "eggcellent" beats!


(If you want more, just say the word!)