How to use local GPU with remote LibreChat?
-
Many of us run Cloudron on a remote VPS (Virtual Private Server) without a GPU (Graphics Processing Unit) and then deploy applications like LibreChat there.
How could we easily make use of our local hardware, which might include a graphics card, to help with inference for LibreChat?
Rathole has been requested for Cloudron, but there are other applications which might help, too:
https://github.com/rapiz1/rathole#rathole
How about zrok or FRP?
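For anyone wondering what that might look like in practice, here is a rough sketch of a rathole setup that forwards a local Ollama instance to the VPS. This is only an illustration: the hostname, port, and token are placeholders, and it assumes Ollama is listening on its default local port 11434.

```toml
# Sketch only: rathole client config on the local GPU machine (client.toml).
# "vps.example.com", the port, and the token are placeholders.
[client]
remote_addr = "vps.example.com:2333"   # rathole server running on the VPS

[client.services.ollama]
token = "use_a_long_random_secret"
local_addr = "127.0.0.1:11434"         # Ollama's default local port

# Corresponding rathole server config on the VPS (server.toml):
# [server]
# bind_addr = "0.0.0.0:2333"
#
# [server.services.ollama]
# token = "use_a_long_random_secret"
# bind_addr = "127.0.0.1:11434"        # LibreChat on the VPS then talks to localhost:11434
```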
-
Imo, just serve Ollama on the machine that has the GPU, either locally (bare metal) or with the GPU resource-mapped (Proxmox/virtualized). Make sure to set the server address flag in the systemd unit / start script (it's in the docs; sorry, on mobile), connect both machines to a Tailscale tailnet, then edit the two LibreChat config files so the Ollama settings point to your GPU machine's tailnet IP or hostname. I have found this pathway to be pretty robust: I haven't noticed any real slowdown, and my VPS and homelab are over 4,000 miles apart.
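If it helps anyone, the flag referred to above is, I believe, the OLLAMA_HOST environment variable, set in the systemd override / start script on the GPU machine so Ollama listens on more than loopback (e.g. OLLAMA_HOST=0.0.0.0:11434). On the LibreChat side, a minimal sketch of the custom-endpoint part of librechat.yaml could look like the following; "gpu-box" and the model name are placeholders, and field names may vary slightly between LibreChat releases:

```yaml
# librechat.yaml (sketch): custom endpoint pointing at Ollama over the tailnet.
# "gpu-box" stands in for your GPU machine's Tailscale hostname or IP.
version: 1.1.4                          # use the config version matching your LibreChat release
endpoints:
  custom:
    - name: "Ollama"
      apiKey: "ollama"                  # Ollama ignores the key, but LibreChat expects one
      baseURL: "http://gpu-box:11434/v1/"   # Ollama's OpenAI-compatible endpoint
      models:
        default: ["llama3"]             # placeholder model name
        fetch: true                     # ask Ollama for its installed models
      titleConvo: true
      titleModel: "current_model"
      modelDisplayLabel: "Ollama"
```

The second config file mentioned above is presumably .env, where the custom endpoint may also need to be enabled depending on your LibreChat version.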
-
@LoudLemur Have you checked out Kimi.com yet?
Although I guess you don't have a terabyte of VRAM, I thought you might like this post:
-
@marcusquinn Wow! That is amazing. Thank you.