Gotenberg Permission Issue and large log volume

14 Posts 4 Posters 499 Views 4 Watching
  • nebulon (Staff), post #3

    I can reproduce the .eml file issue. No solution so far, but I will keep this thread updated.

    For the .html file, I get a normal error stating that the text/html MIME type is not supported. Is this supposed to work in the first place?

    For the logging, have you enabled debug logging by any chance, that is, set PAPERLESS_DEBUG in the config file? Otherwise it will log everything from INFO upwards. According to this issue, there seems to be no way to change the log level though, so you may have to ask about this upstream.

    • nebulon (Staff), post #4 (edited)

      Further update on the .eml file handling: after giving proper access to .local/share/applications/mimeapps.list, that file gets created, although it remains empty. The .eml processing no longer crashes, but it also does not proceed any further, and no errors are shown.

      Update: After figuring out how to set debug mode for chromium, the following errors show up, so there is at least something to work with:

      17:56:45 - {"level":"debug","ts":1743094605.3818536,"logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"start unix process: /usr/bin/unoconverter --no-launch --format pdf --port 42231 -vvv --export ExportFormFields=true --export AllowDuplicateFieldNames=false --export ExportBookmarks=true --export ExportBookmarks=true --export ExportBookmarksToPDFDestination=false --export ExportPlaceholders=false --export ExportNotes=false --export ExportNotesPages=false --export ExportOnlyNotesPages=false --export ExportNotesInMargin=false --export ConvertOOoTargetToPDFTarget=false --export ExportLinksRelativeFsys=false --export ExportHiddenSlides=false --export IsSkipEmptyPages=false --export IsAddStream=false --export SinglePageSheets=false --export UseLosslessCompression=false --export Quality=90 --export ReduceImageResolution=false --export MaxImageResolution=300 --export SelectPdfVersion=2 --export PDFUACompliance=false --export UseTaggedPDF=false --export EnableTextAccessForAccessibilityTools=false --output /tmp/c14f54a1-26d4-41c1-8481-cc7decf80ae7/ce5a2b61-75ee-4e1e-ad9a-993395b81e83/f09408c7-8088-4402-b126-775865290d5e.pdf /tmp/c14f54a1-26d4-41c1-8481-cc7decf80ae7/ce5a2b61-75ee-4e1e-ad9a-993395b81e83/b9e9e260-67db-497c-b6ee-04adbf8c43d6.pdf","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
      17:56:45 - {"level":"debug","ts":1743094605.410921,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"Traceback (most recent call last):","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
      17:56:45 - {"level":"debug","ts":1743094605.4109411,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"  File \"/usr/bin/unoconverter\", line 19, in <module>","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
      17:56:45 - {"level":"debug","ts":1743094605.4110937,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"    from distutils.version import LooseVersion","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
      17:56:45 - {"level":"debug","ts":1743094605.4110994,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"ModuleNotFoundError: No module named 'distutils'","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
      17:56:45 - {"level":"debug","ts":1743094605.4153426,"logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"unix process already killed","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
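
To pull just the stderr traceback out of debug output like the lines above, a small Python sketch along these lines may help. It assumes every line has the `HH:MM:SS - {json}` shape seen here and that stderr records carry a logger name ending in `.stderr`. The function name `extract_stderr` is illustrative only, and the sample lines are shortened copies of the log (the `ts` and `trace` fields and the `File ...` frame are dropped).

```python
import json


def extract_stderr(log_lines):
    """Return the 'msg' of every record whose logger name ends in '.stderr', in order."""
    messages = []
    for line in log_lines:
        # Split off the "HH:MM:SS" prefix; the JSON payload follows " - ".
        _, sep, payload = line.partition(" - ")
        if not sep:
            continue  # not in the expected "time - json" shape
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(record, dict):
            continue
        if str(record.get("logger", "")).endswith(".stderr"):
            messages.append(record.get("msg", ""))
    return messages


# Shortened sample lines (assumption: same shape as the log above).
SAMPLE = r"""
17:56:45 - {"level":"debug","logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"Traceback (most recent call last):"}
17:56:45 - {"level":"debug","logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"    from distutils.version import LooseVersion"}
17:56:45 - {"level":"debug","logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"ModuleNotFoundError: No module named 'distutils'"}
17:56:45 - {"level":"debug","logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"unix process already killed"}
"""

if __name__ == "__main__":
    print("\n".join(extract_stderr(SAMPLE.splitlines())))
```

As a side note, `distutils` was removed from the Python standard library in Python 3.12, which would be consistent with this error. The Python version inside the container is not stated in this thread, so treat that as a likely explanation rather than a confirmed one.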
      
      • nebulon (Staff), post #5

        The latest package release has unoconv fixed for the .eml use case.

        • luckow wrote:

          @lukasgabriel Apart from the error (maybe a package error), let's share some numbers: my Paperless has a total of 815 documents. Clicking on "Download Full logs" generates a 708MB log file.

          BTW: the same error on my site (tested with an eml file)

          Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
          For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
          
          lukasgabriel, post #6 (edited)

          @luckow I'm looking at a ~2GB log file across a span of 5 days. But I turned off my Paperless instance for most of that time, so it's more like 2 days of uptime at most.
          I don't think the number of documents matters much, unless you're reprocessing them, but I also have around the same number as you.

          So I'd say I am solidly out of the normal 😵


          • lukasgabriel, post #7

            @nebulon I think HTML conversion is supported by Gotenberg: https://gotenberg.dev/docs/6.x/html - so for sanity-checking whether Gotenberg can run, it should work.
            But Paperless doesn't support it: https://github.com/paperless-ngx/paperless-ngx/discussions/6700
            So you can disregard that point of my post 🤓

            • luckow (translator), post #8

              I see a difference between the Cloudron log file https://my.example.org/logs.html?appId=123456-1234-1234-1234-123455 and the logs in Paperless: the Cloudron logs constantly repeat the same errors and contain a lot of visual distraction, whereas the app's own log at https://paperless.example.org/logs is quiet. At the moment I can't investigate further, though.



              • lukasgabriel, post #9

                @nebulon The latest patch related to unoconv has resolved the Gotenberg/Tika issues for me! I can now upload EMLs again, and I also tested successfully with XLSX and some others. Great work! 😊

                The log spam still persists though. @luckow: paperless.log is normal for me as well, the problem is the celery warnings/errors from the container log, which you get by pressing the "Logs" button in the Cloudron panel. PAPERLESS_DEBUG is false.

                • nebulon (Staff), post #10

                  It seems the logging situation has to be taken to paperless upstream development. Maybe you can create an issue there about this.

                  • nebulon marked this topic as a question
                  • nebulon marked this topic as solved
                  • lukasgabriel, post #11

                    I managed to solve the problem by deleting /app/data/data/celerybeat-schedule.db.db
                    It should have been named /app/data/data/celerybeat-schedule.db - not sure where it got the extra suffix from, but now the insane log volume and CPU usage are fixed and everything runs smoothly again.

                    • lukasgabriel, post #12

                      @nebulon The problem with the .db.db happened again not even 24h later.
                      This GitHub Issue suggests it might be a Cloudron problem after all: https://github.com/paperless-ngx/paperless-ngx/discussions/7440#discussioncomment-10324614

                      Any tips? I thought about setting up a cron job to check if the file has the duplicate file extension and rename/remove if it does, but an app package upgrade might remove that job, right? Should I set up the job inside the container, or on the Cloudron host?
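
The check described here could be sketched in Python roughly as follows. The path is the one reported earlier in this thread. The function name `remove_stray_schedule` is hypothetical. It is also an assumption that removing the file while the app is running is safe: the thread only reports that deleting it stopped the log spam. The sketch does not answer where such a job should live (container or host).

```python
from pathlib import Path

# Path as reported in this thread; adjust if your data directory differs.
STRAY = Path("/app/data/data/celerybeat-schedule.db.db")


def remove_stray_schedule(stray: Path = STRAY) -> bool:
    """Delete the doubled-suffix schedule file if it exists.

    Returns True if a file was removed, False otherwise. Files that do not
    end in '.db.db' are never touched.
    """
    if stray.name.endswith(".db.db") and stray.is_file():
        stray.unlink()
        return True
    return False


if __name__ == "__main__":
    if remove_stray_schedule():
        print(f"removed {STRAY}")
```

Run periodically (for example from cron), this only ever deletes a file whose name ends in `.db.db`, so a correctly named `celerybeat-schedule.db` is left alone.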

                      • nebulon (Staff), post #13

                        As the upstream author indicates, the .db.db is something Python-related (?). Sadly, since we didn't find a way to validate the .db file on startup ourselves, there is little we can do for now to clear it when needed. My comments there, asking how to even reproduce this, didn't move things forward either. I guess in the end we would have to debug this and hopefully come up with an MR to fix it, but to be honest we aren't Python developers and have little insight into the paperless code itself.


                        • robi, post #14

                          @nebulon perhaps pasting the code into one of the AIs can help spot the problem

