Cloudron Forum

The models load in the void

OpenWebUI

Dont-Worry
#1

I don't know if this is a real bug or not; when in doubt, I prefer to bring it up here. I downloaded three different models and this problem occurs with all of them. I have 100GB of RAM allocated to OpenWebUI and my infrastructure has 12 cores. Even after restarting the application, it sometimes works, but very often the AI loads into the void, leaving me on this page indefinitely:

(screenshot attached: 3c957fb8-29f8-46f4-b664-a7e8856d8b89-image.png)

I don't know if this is normal, but OpenWebUI doesn't seem to use any RAM at all; it only uses my CPU. According to htop, the application uses less than 1GB of RAM while it is working.

Another thing, again according to htop: even when I leave the application loading indefinitely and eventually close the chat or the tab, the cores continue to be heavily used.

At one point I wanted to try the Llama 2 70B model and let it run for about 40 or even 50 minutes. Then I closed the page or refreshed it, I don't remember which, and the answer appeared as if by magic. I never saw the model write anything, as if the interface had stopped updating and kept showing me the loading state, but the answer had already been written and appeared when I refreshed.

The problem is that Llama 2 70B was probably too big for my infrastructure, but the 7B models... I don't understand why they take so long to load, especially since they sometimes load almost instantly.
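
A rough way to cross-check the htop readings is to watch the resident memory and CPU of the backend processes while a chat sits on the loading screen. A minimal sketch, assuming psutil is available and that the Cloudron package runs processes whose names contain "ollama" or "open-webui" (adjust to whatever htop actually shows):

```python
# Minimal sketch: poll CPU and resident memory of any process whose name mentions
# "ollama" or "open-webui" (assumed process names for this package), to see whether
# a model is actually resident in RAM while a chat appears to load forever.
import time
import psutil

WATCH = ("ollama", "open-webui", "open_webui")  # assumed names, adjust as needed

def snapshot():
    rows = []
    for proc in psutil.process_iter(["pid", "name", "cpu_percent", "memory_info"]):
        name = (proc.info["name"] or "").lower()
        if any(w in name for w in WATCH):
            rss_gib = proc.info["memory_info"].rss / 1024 ** 3
            rows.append((proc.info["pid"], name, proc.info["cpu_percent"], rss_gib))
    return rows

if __name__ == "__main__":
    while True:
        for pid, name, cpu, rss in snapshot():
            print(f"pid={pid} {name}: cpu={cpu:.0f}% rss={rss:.2f} GiB")
        print("---")
        time.sleep(5)
```

If the resident memory stays under a gigabyte while a 7B model is supposedly generating, the model is likely not (or no longer) loaded, which would fit the indefinite loading described above.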

girish (Staff)
#2

Good questions; we don't know how these models perform on these servers either. I am still hunting for a server where they perform reasonably and will report back if I find one. I think GPU support in Cloudron will also help, but that will take a while.

Dont-Worry
#3

Okay @girish, thanks for your answer. If you find a server where they perform well, I would be very interested.

JLX89
#4

Out of curiosity, based on current usage: I know that storing the models locally takes up a lot of space, and that can be overcome by using volumes. I'm having the same issue where responses load like I'm back on dial-up internet. There doesn't seem to be enough logging or metrics, so we're limited in trying to see what the cause is. My assumption is that I should be looking at CPU usage?
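
One way to get some numbers despite the missing metrics is to bypass the web UI and stream a reply straight from the Ollama API that backs it. A minimal sketch, assuming the bundled Ollama is reachable on its default port 11434 from inside the container and that a model tag like llama2:7b is installed (both assumptions, adjust to your setup):

```python
# Minimal sketch: stream a reply directly from the Ollama API (assumed backend at
# http://localhost:11434) and print rough tokens/sec from the final status chunk,
# to separate backend inference speed from UI rendering lag.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed host/port
payload = {"model": "llama2:7b", "prompt": "Say hello in one sentence.", "stream": True}

with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=600) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)  # partial text as it arrives
        if chunk.get("done"):
            evals = chunk.get("eval_count", 0)
            secs = chunk.get("eval_duration", 0) / 1e9  # reported in nanoseconds
            if secs > 0:
                print(f"\n~{evals / secs:.1f} tokens/sec")
```

If tokens trickle in here at the same slow rate, the bottleneck is CPU inference rather than the UI; if they arrive quickly, the problem is more likely in the frontend, as the refresh behaviour described above suggests.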

jdaviescoates
#5

            @JLX89 @Dont-Worry have a look at this thread

            https://forum.cloudron.io/post/85791

            I use Cloudron with Gandi & Hetzner

girish (Staff)
#6

              Just saw this post - LLaMA Now Goes Faster on CPUs
