Cloudron Forum


Gotenberg Permission Issue and large log volume

Solved | Paperless-ngx
14 Posts 4 Posters 498 Views 4 Watching
  • nebulon (Staff) wrote (#5):

    latest package release has unoconv fixed for the .eml use-case

    • In reply to luckow:

      @lukasgabriel Apart from the error (maybe a package error), let's share some numbers: my Paperless has a total of 815 documents. Clicking on "Download Full logs" generates a 708 MB log file.

      BTW: the same error on my side (tested with an .eml file)

      Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
      For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
      
      lukasgabriel wrote (#6), last edited by lukasgabriel:

      @luckow I'm looking at a ~2GB log file across a span of 5 days. But I turned off my Paperless instance for most of that time, so it's more like 2 days of uptime at most.
      I don't think the number of documents matters much, unless you're reprocessing them, but I also have around the same number as you.

      So I'd say I am solidly out of the normal 😡

      • In reply to nebulon:

        I can reproduce the .eml file issue. No solution so far, but I will keep this thread updated.

        For the .html file, I get a normal error stating that the text/html MIME type is not supported. Is this supposed to work in the first place?

        For the logging, do you happen to have enabled debug logging by any chance, i.e. set PAPERLESS_DEBUG in the config file? Otherwise it will log everything from INFO upwards. According to this issue there seems to be no way to change the log level though, so you may have to raise this again upstream.

        lukasgabriel wrote (#7):

        @nebulon I think HTML conversion is supported by Gotenberg: https://gotenberg.dev/docs/6.x/html - so for sanity-checking whether Gotenberg can run, it should work.
        But Paperless doesn't support it: https://github.com/paperless-ngx/paperless-ngx/discussions/6700
        So you can disregard that point of my post 🤓
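
For anyone who wants to run that sanity check against Gotenberg directly, bypassing Paperless, here is a minimal sketch using only the Python standard library. The URL is copied from the error message earlier in this thread. The form field name `files` and the filename `index.html` follow Gotenberg's documented convention for its Chromium HTML route as far as I know, so please verify them against the docs for your Gotenberg version. The function names `build_multipart` and `convert_html` are made up for this example.

```python
import urllib.request
import uuid

# Endpoint copied from the error message earlier in this thread; adjust it if
# your Gotenberg instance listens somewhere else.
GOTENBERG_URL = "http://localhost:3000/forms/chromium/convert/html"


def build_multipart(html: str, boundary: str) -> bytes:
    """Encode one HTML document as a multipart/form-data body.

    Assumption: the route expects the upload in a form field called "files"
    with the filename "index.html".
    """
    lines = [
        f"--{boundary}",
        'Content-Disposition: form-data; name="files"; filename="index.html"',
        "Content-Type: text/html",
        "",
        html,
        f"--{boundary}--",
        "",
    ]
    return "\r\n".join(lines).encode("utf-8")


def convert_html(html: str, url: str = GOTENBERG_URL, timeout: float = 30.0) -> bytes:
    """POST the HTML to Gotenberg and return the raw response body."""
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        url,
        data=build_multipart(html, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx answers, such as the
    # 500 Internal Server Error reported above.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `convert_html("<html><body><h1>test</h1></body></html>")` from a place where `localhost:3000` actually reaches Gotenberg should return bytes starting with `%PDF` if the conversion works. An `HTTPError` with status 500 would point at Gotenberg itself rather than at Paperless.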

        • luckow (translator) wrote (#8):

          I see a difference between the Cloudron log file https://my.example.org/logs.html?appId=123456-1234-1234-1234-123455 and the logs in Paperless. (constantly repeated same errors and a lot of visual distraction in the Cloudron logs compared to a quiet log from the app https://paperless.example.org/logs). But at the moment it is not possible for me to investigate further.

          Pronouns: he/him | Primary language: German


            lukasgabriel wrote (#9):

            @nebulon The latest patch related to unoconv has resolved the Gotenberg/Tika issues for me! I can now upload EMLs again, and I also tested successfully with XLSX and some others. Great work! 😊

            The log spam still persists though. @luckow: paperless.log is normal for me as well, the problem is the celery warnings/errors from the container log, which you get by pressing the "Logs" button in the Cloudron panel. PAPERLESS_DEBUG is false.
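
To see which messages actually dominate a multi-gigabyte container log, a rough tally can help before reporting upstream. The sketch below is only an illustration: it assumes each line starts with an ISO-like timestamp followed by the message, which may not match the file you download from the Cloudron panel, and the names `normalize` and `top_messages` are invented here.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line layout: an ISO-like timestamp first, then the message.
# Adjust the pattern if your downloaded log looks different.
TIMESTAMP = re.compile(r"^\S*\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*\s*")
NUMBER = re.compile(r"\d+")


def normalize(line: str) -> str:
    """Strip the leading timestamp and mask numbers so repeats collapse together."""
    without_time = TIMESTAMP.sub("", line.strip())
    return NUMBER.sub("#", without_time)


def top_messages(lines: Iterable[str], limit: int = 10) -> list[tuple[str, int]]:
    """Return the `limit` most frequent normalized messages with their counts."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(limit)
```

Passing an open file handle to `top_messages` streams the log line by line, so the whole file never has to fit in memory, although the counter still grows with the number of distinct messages. If a handful of celery warnings account for most of the lines, that list is a compact thing to attach to an upstream issue.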

            • nebulon (Staff) wrote (#10):

              Seems like the logging situation has to be taken to Paperless upstream development; maybe you can create an issue there about this.

              • nebulon marked this topic as a question
              • nebulon marked this topic as solved
              • lukasgabriel wrote (#11):

                I managed to solve the problem by deleting /app/data/data/celerybeat-schedule.db.db
                It should have been named /app/data/data/celerybeat-schedule.db. I'm not sure where the extra suffix came from, but the insane log volume and CPU usage are now fixed and everything runs smoothly again.

                • lukasgabriel wrote (#12):

                  @nebulon The problem with the .db.db happened again not even 24h later.
                  This GitHub Issue suggests it might be a Cloudron problem after all: https://github.com/paperless-ngx/paperless-ngx/discussions/7440#discussioncomment-10324614

                  Any tips? I thought about setting up a cron job to check if the file has the duplicate file extension and rename/remove if it does, but an app package upgrade might remove that job, right? Should I set up the job inside the container, or on the Cloudron host?
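
The check described here could look roughly like the sketch below. It only covers the "remove" variant, since deleting the file is what helped in the earlier post. The path is the one from that post; the function names `has_doubled_suffix` and `remove_if_doubled` are hypothetical, and whether the app needs a restart afterwards is not something this thread establishes.

```python
from pathlib import Path

# Path taken from the earlier post; the doubled suffix is the symptom
# described there.
STRAY_FILE = Path("/app/data/data/celerybeat-schedule.db.db")


def has_doubled_suffix(path: Path) -> bool:
    """True when the last two suffixes are identical, e.g. 'x.db.db'."""
    suffixes = path.suffixes
    return len(suffixes) >= 2 and suffixes[-1] == suffixes[-2]


def remove_if_doubled(path: Path = STRAY_FILE) -> bool:
    """Delete `path` if it exists and carries a doubled suffix.

    Returns True when a file was removed, False otherwise.
    """
    if has_doubled_suffix(path) and path.is_file():
        path.unlink()
        return True
    return False
```

Running `remove_if_doubled()` on a schedule is a workaround rather than a fix: it clears the symptom each time it reappears. Where to schedule it (inside the container or on the Cloudron host) remains the open question asked above.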

                  • nebulon (Staff) wrote (#13):

                    As the upstream author indicates, the .db.db is something Python-related (?). Sadly, since we didn't find a way to validate the .db file on startup ourselves, there is little we can do for now to clear it when needed. My comments there, aimed at even being able to reproduce the problem, didn't move things forward. I guess in the end we would have to debug this and hopefully come up with an MR to fix it, but to be honest we aren't Python developers and have little insight into the Paperless code itself.
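
As one possible starting point for a startup check, Python's standard library can at least report whether a file looks like a dbm database. The sketch below assumes the celery beat schedule file is a `shelve`/`dbm` database, which this thread does not confirm, and `probe_schedule` is a made-up helper name. Note that `dbm.whichdb()` only inspects file names and headers, so a file can pass this probe and still be corrupt.

```python
import dbm
from pathlib import Path


def probe_schedule(path: Path) -> str:
    """Classify a schedule file with the standard library's dbm.whichdb().

    Returns "missing", "unrecognized", or the dbm module name Python reports.
    """
    kind = dbm.whichdb(str(path))
    if kind is None:
        return "missing"       # file absent or unreadable
    if kind == "":
        return "unrecognized"  # file exists, but no dbm backend claims it
    return kind                # e.g. "dbm.gnu" or "dbm.dumb"
```

A start script could call `probe_schedule(Path("/app/data/data/celerybeat-schedule.db"))` and log or move the file aside when the answer is `"unrecognized"`. Whether that would catch the failure described in this thread is untested.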


                      robi wrote (#14):

                      @nebulon perhaps pasting the code into one of the AIs can help spot the problem

                      Conscious tech
