let's collect some metrics
-
@luckow interesting how much slower your machine is even when allocating more RAM than @micmc did. And interesting that at the same time the graphs don't seem to show much of anything
I've got a dedicated Hetzner server with a little more power still:
As described by my Cloudron dashboard at #/system:
CPU: 12 Core "AMD Ryzen 5 3600 6-Core Processor"
Memory: 67.33 GB RAM & 34.33 GB Swap
I'll join in this game when I get the chance and report back...
-
@jdaviescoates Yeah, I'm also testing this on a dedicated Hetzner server. That reminds me of another factor: on Hetzner we're also on 1 Gbps pipes, which is not the case with many providers, so that could also play a role, I guess.
@luckow are your VPSs on a 1 Gbps connection to the outside?
-
@micmc said in let's collect some metrics:
DEDI 8 Core "Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz"
Memory 67.2 GB RAM & 34.33 GB Swap
APP RAM: 15.75 GiB
Model: llama2:latest (3.6GB)
PROMPT: Hello llama. How are you?
COMPL: blinks Hello there! scratches nose I'm feeling a bit woolly today, thank you for asking!
TIME: 22 sec
I tried to make my test as similar to this as possible, so:
Hetzner DEDI 12 Core "AMD Ryzen 5 3600 6-Core Processor"
Memory: 67.33 GB RAM & 34.33 GB Swap
APP RAM: 15.75 GiB
Model: llama2:latest (3.6GB)
PROMPT: Hello llama. How are you?
COMPL: chomps on some grass Ummm, hello there! gives a curious glance I'm doing great, thanks for asking! stretches It's always nice to meet someone new... or in this case, someone who is talking to me. winks How about you? What brings you to this neck of the woods? Or should I say, this mountain range? Hehe!
TIME: 5 sec (for it to begin writing); didn't actually time how long it took to finish...
Same prompt, answered:
chomps on some grass Ummm, hello there! gives a curious glance I'm doing great, thanks for asking! stretches It's always nice to meet someone new... or in this case, someone who is talking to me. winks How about you? What brings you to this neck of the woods? Or should I say, this mountain range? Hehe!
This time it took maybe 10 seconds before it started writing and it finished after about 20 seconds.
Same everything but new prompt:
Tell me a random fun fact about the Roman Empire
Answer: Ooh, that's a great question! grins Did you know that during the height of the Roman Empire (around 100-200 AD), there was a gladiator named "Draco" who was so popular that he had his own comic book series?! 🤯 It's true! The Roman Empire really knew how to entertain their citizens, huh? chuckles
TIME: started writing it after about 3 seconds, finished after about 14. Not bad!
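If we want these numbers to be comparable rather than eyeballed, here's a rough Python sketch that times a prompt against a local Ollama instance via its streaming `/api/generate` endpoint (the default port 11434 and the endpoint are Ollama's documented defaults; the model name is just the one from this thread). Treat it as a starting point, not a polished benchmark:

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address

def summarize_stream(events):
    """Given (elapsed_seconds, parsed_json) pairs from Ollama's streaming
    response, return (time_to_first_token, total_time, full_text)."""
    first = None
    total = 0.0
    parts = []
    for elapsed, obj in events:
        if obj.get("response") and first is None:
            first = elapsed          # first non-empty chunk = first token
        parts.append(obj.get("response", ""))
        total = elapsed              # last event's timestamp = total time
    return first, total, "".join(parts)

def time_prompt(model, prompt):
    """Stream a completion from a local Ollama instance and time it."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    events = []
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # Ollama streams one JSON object per line
            events.append((time.monotonic() - start, json.loads(line)))
    return summarize_stream(events)

if __name__ == "__main__":
    ttft, total, text = time_prompt("llama2:latest", "Hello llama. How are you?")
    print(f"first token after {ttft:.1f}s, finished in {total:.1f}s")
```

That would give everyone the same two figures (time to first token and total time) instead of stopwatch estimates.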
-
@luckow I don't know enough about the hardware requirements for Ollama, but maybe it is slow because your VPS doesn't have a GPU?
I just tested it on a server with the following specs:
CPU: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
GPU: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB]
RAM: 64 GB
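One quick way to check whether Ollama is actually using the GPU is its `/api/ps` endpoint, which (in recent Ollama versions, if I'm reading the docs right) reports a `size_vram` field per loaded model. A small sketch that turns that into a "how much of the model is in VRAM" figure:

```python
import json
import urllib.request

PS_URL = "http://localhost:11434/api/ps"  # Ollama's default address

def gpu_fraction(ps_json):
    """From Ollama's /api/ps payload, report what fraction of each loaded
    model sits in VRAM (0.0 = pure CPU, 1.0 = fully on the GPU)."""
    out = {}
    for m in ps_json.get("models", []):
        size = m.get("size", 0)
        out[m["name"]] = (m.get("size_vram", 0) / size) if size else 0.0
    return out

if __name__ == "__main__":
    with urllib.request.urlopen(PS_URL) as resp:
        print(gpu_fraction(json.load(resp)))
```

Anything close to 0.0 would explain CPU-only (i.e. slow) generation on a VPS without a GPU.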
-
Where did you get those handy figures from @Lanhild ?
-
@Lanhild That's interesting! Of course, to run our own models on-premise, GPU power is required, and we've indeed started to see AI-oriented hardware server offers lately.
And, for the sake of transparency, I must say that connecting to OpenAI with an API key works pretty well afaics. With the same machine and config, it's as fast as being on ChatGPT.
-
Simply hover the icon
-
64 GB RAM, 12-core Xeon
Hello, how are you doing? 8 seconds: mistral-openorca 7b.
Hello there! I'm doing well, thanks for asking. How about yourself?
Dolphin-Mixtral 47b 23 seconds:
Hello there! I am functioning perfectly and ready to assist with any questions or tasks you may have. Please let me know how I can help you today.
-
One thing I would like to have as an option is a bell sound when the generation has completed. It helps me be productive elsewhere instead of waiting.
Oh, I would suggest overriding the initial memory allocation and ramping it up to as much RAM as you can spare.
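The bell idea doesn't need app support, by the way: until it's an option, a tiny wrapper can ring the terminal bell (ASCII BEL) after a long generation finishes. This assumes your terminal emulator has the audible bell enabled, and `run_my_generation` is just a stand-in for whatever blocking call you're waiting on:

```python
import sys

def notify_done(stream=sys.stdout):
    """Ring the terminal bell (ASCII BEL, '\\a') to signal completion.
    Only audible if the terminal's bell is enabled."""
    stream.write("\a")
    stream.flush()

# Usage (hypothetical long-running call):
#   result = run_my_generation()
#   notify_done()
```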