How to run local Llama 3.1 405b with RAG with Cloudron?

Discuss · Tags: ai, rag, llama 3.1 · 6 Posts · 4 Posters · 1.3k Views
LoudLemur wrote (#1, last edited by LoudLemur):

Llama 3.1 405b is a recently released, free-software large language model that is highly capable and setting the standard on leaderboards. RAG (Retrieval-Augmented Generation) is a technique where relevant, current documents are retrieved from an external source (a document index, a database, or the web) and supplied to the language model as context, which it then uses to formulate a response to a query.

      Because Llama 3.1 405b is very large, expensive hardware is needed to generate timely responses to queries.

Does anybody have suggestions about how to set up and run a Llama 3.1 405b model with RAG, perhaps using Cloudron? Where would you host it? If it were for private use and you were not expecting to run queries 24/7, where would you run it?

      https://blog.runpod.io/run-llama-3-1-405b-with-ollama-a-step-by-step-guide/
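For anyone wondering what the RAG part looks like in practice, here is a minimal sketch against a local Ollama server (the setup the RunPod guide above uses). The tiny keyword "retriever" and the DOCS list are hypothetical stand-ins for a real vector store; the model tag assumes you have already pulled the model with `ollama pull`:

```python
# Minimal RAG sketch against a local Ollama server (default port 11434).
# Assumes the model was pulled beforehand, e.g. `ollama pull llama3.1:405b`.
# The keyword retriever and DOCS corpus are toy stand-ins for a real
# vector store (Chroma, pgvector, etc.).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:405b"  # swap in a smaller tag (e.g. llama3.1:8b) for testing

DOCS = [
    "Cloudron apps run as Docker containers managed by the Cloudron platform.",
    "Llama 3.1 was released by Meta in July 2024 in 8B, 70B and 405B sizes.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def ask(query: str) -> str:
    """Inject retrieved context into the prompt, then call Ollama."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": MODEL, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask("What sizes does Llama 3.1 come in?"))
```

The same pattern works unchanged with any model tag, so you can prototype the retrieval side on a small model before paying for hardware that can hold the 405b weights.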

crazybrad wrote (#2):

@LoudLemur Of course, the answer depends on how much money you are willing to spend per month. I would probably wait until someone creates a 7b-parameter model with only slightly less performance than competitive solutions. Then you have many more options, including self-hosting on your own hardware. I don't think Cloudron is the right platform to use in this case, solely because all applications are Docker-based and some of the components required to run these models (CUDA libraries, Torch, etc.) do not play well in a Docker environment. [Note: this was recently shared by an AI engineer during our conversation, but like everything in AI, it is almost obsolete as soon as it is published. So take this advice with a grain of salt and double-check its accuracy, especially over time.]

Dont-Worry wrote (#3):

@LoudLemur We planned to buy servers specifically for AI for our company's use. This configuration is for enterprise use, so I don't know if it will help you, but to run Llama 3.1 405b we planned on using something like this:

[Image attachment: proposed server configuration]

With 384 GB of RAM minimum, maybe more; we have to run tests to see whether the models we want to use run in RAM or only on the GPUs.
We plan to place the order at BinaryRacks.
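A rough back-of-the-envelope may help with the RAM-vs-GPU question before the tests: weight memory is approximately parameter count × bytes per parameter, ignoring KV cache and runtime overhead, so treat the numbers below as lower bounds:

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache and runtime overhead, so these are lower bounds.
PARAMS = {"llama3.1:8b": 8e9, "llama3.1:70b": 70e9, "llama3.1:405b": 405e9}
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}  # common quantizations

for model, n in PARAMS.items():
    for quant, b in BYTES_PER_PARAM.items():
        print(f"{model:>15} @ {quant}: ~{n * b / 2**30:,.0f} GiB")
```

By this estimate, even a 4-bit 405b needs roughly 190 GiB for weights alone (around 750 GiB at fp16), so 384 GB of system RAM covers a quantized CPU run, while fitting it entirely in VRAM would take many GPUs.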

crazybrad wrote (#4):

@Dont-Worry Be careful about the GPU RAM (16 GB in your case). More GPU RAM will allow larger models to run more efficiently. CPU cores + CPU RAM were much slower (~8x) in our tests with a 7b model vs. running the same load across GPUs.
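One way to sanity-check the fit before committing to hardware: compare free VRAM against the weight estimate from the earlier post. The nvidia-smi query flags below are standard; the NEEDED_MIB threshold is just an illustrative assumption for a 4-bit 7b model:

```python
# Query per-GPU memory with nvidia-smi and compare against a model's
# estimated weight footprint (see the estimate in the earlier post).
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,memory.free",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

NEEDED_MIB = 4_700  # assumed: ~7b params at 4-bit, plus some headroom
for line in out.strip().splitlines():
    name, total, free = (f.strip() for f in line.split(","))
    fits = "fits" if int(free) >= NEEDED_MIB else "does NOT fit"
    print(f"{name}: {free}/{total} MiB free -> model {fits}")
```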

humptydumpty wrote (#5):

Off-topic question, but what do you use these AI models for? I use ChatGPT for random questions (replacing the need to google things), but I still have no idea how to effectively integrate AI into my professional workflow. What am I missing?

crazybrad wrote (#6):

@humptydumpty Let me suggest a great book to read: Co-Intelligence, by Ethan Mollick. I have read 70% of the book and am coming around to the same opinion as the author: "invite AI to your work table all the time". Whether it's writing an article, a marketing study, a presentation, or a report, it can help create the first draft (edit and polish it yourself). It can generate example code (fix, check, and correct it yourself). I find AI helps when you are staring at a blank screen wondering where to start. It can generate many ideas, but it is best when you evaluate and refine them.

I also found the author presented some great examples of using "prompt engineering" and "personas" to get better results. But there are also subtle reminders in the book about training bias, how AI tries to generate answers that "please us humans", its inherent limitations, and why the same prompt will generate inconsistent responses. Like any tool, it needs to be wielded by a trained craftsman. It can help you get work done faster, but it can also help you get to the wrong answer very quickly.
