Gotenberg Permission Issue and large log volume

Solved | Paperless-ngx
14 Posts, 4 Posters, 498 Views
  • lukasgabriel wrote (last edited by lukasgabriel), #1

    Hi,

    I am having two major issues with the Paperless-ngx app.

    The first one deals with Gotenberg. Processing of - in my example - .eml and .html files fails with 500 Internal Server Error.
    On closer inspection of the logs, I can see this error relating to permissions:

    2025-03-26T03:15:17Z [2025-03-26 04:15:17,718] [ERROR] [paperless.consumer] Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
    
    2025-03-26T03:15:17Z [2025-03-26 04:15:17,728] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
    
    2025-03-26T03:15:17Z [2025-03-26 04:15:17,824] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[1368eaec-b063-4883-8ed2-1a11d2a46474] raised unexpected: ConsumerError("Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500")
    
    2025-03-26T03:15:17Z documents.consumer.ConsumerError: Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
    
    2025-03-26T03:15:17Z documents.consumer.ConsumerError: Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
    
    2025-03-26T03:15:17Z {"level":"error","ts":1742958917.707243,"logger":"api","msg":"convert HTML to PDF: convert to PDF: process first start: start process: run exec allocator: chrome failed to start:\nmkdir: cannot create directory ‘/root’: Permission denied\ntouch: cannot touch '/root/.local/share/applications/mimeapps.list': Permission denied\nchrome_crashpad_handler: --database is required\nTry 'chrome_crashpad_handler --help' for more information.\n[2868:2868:0326/031517.474572:ERROR:socket.cc(120)] recvmsg: Connection reset by peer (104)\n","log_type":"access","trace":"8ba08943-1dec-432b-8fdc-8b375407d48f","remote_ip":"::1","host":"localhost:3000","uri":"/forms/chromium/convert/html","method":"POST","path":"/forms/chromium/convert/html","referer":"","user_agent":"gotenberg-client/0.9.0","status":500,"latency":380718662,"latency_human":"380.718662ms","bytes_in":14690,"bytes_out":21}
    

    It seems the chromium binary wants to write to /root/.local/share/applications where it has no permissions?
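
The Gotenberg entry above is a single JSON object whose `msg` field carries Chromium's startup output. As a minimal sketch, the snippet below pulls the "Permission denied" lines out of such an entry. `SAMPLE` is a shortened copy of the quoted log line with plain ASCII quotes, and `permission_errors` is a hypothetical helper name, not part of Gotenberg or Paperless-ngx.

```python
import json

# Shortened copy of the Gotenberg entry quoted above (assumption: same JSON layout).
SAMPLE = (
    '{"level":"error","logger":"api","msg":"convert HTML to PDF: chrome failed to start:'
    '\\nmkdir: cannot create directory \'/root\': Permission denied'
    '\\ntouch: cannot touch \'/root/.local/share/applications/mimeapps.list\': Permission denied\\n",'
    '"uri":"/forms/chromium/convert/html","status":500}'
)


def permission_errors(log_line: str) -> list[str]:
    """Return the 'Permission denied' lines embedded in one JSON log entry."""
    # The container log prefixes each entry with a timestamp, so parse from the first '{'.
    entry = json.loads(log_line[log_line.index("{"):])
    return [part.strip() for part in entry["msg"].split("\n") if "Permission denied" in part]


for line in permission_errors("2025-03-26T03:15:17Z " + SAMPLE):
    print(line)
```

Both reported failures point at paths under /root, which matches the reading that the Chromium process is trying to write to a home directory it cannot create.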

    The second issue comes from the fact that the Paperless-ngx app is generating an enormous amount of logs, even under normal conditions.
    I am pretty sure that this is why my Paperless-ngx instance is causing extreme CPU load (completely maxed) as well.

    I'm not able to interpret the cause of the logs, but it looks like something related to Celery Scheduling or Redis is out of whack.
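
One way to narrow down where the volume comes from is to count entries per level and logger. The sketch below assumes the `[LEVEL] [logger.name]` layout seen in the Paperless lines quoted earlier; lines in any other layout (tracebacks, Gotenberg JSON) are grouped as "unparsed". `count_sources` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "[LEVEL] [logger.name]" pair seen in the Paperless lines quoted above.
PATTERN = re.compile(r"\[(DEBUG|INFO|WARNING|ERROR|CRITICAL)\] \[([\w.]+)\]")


def count_sources(lines):
    """Count log entries per (level, logger); other layouts are grouped together."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        counts[match.groups() if match else ("?", "unparsed")] += 1
    return counts


sample = [
    "2025-03-26T03:15:17Z [2025-03-26 04:15:17,718] [ERROR] [paperless.consumer] Error occurred while consuming document Example.eml",
    "2025-03-26T03:15:17Z [2025-03-26 04:15:17,728] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: Example.eml",
    "2025-03-26T03:15:17Z documents.consumer.ConsumerError: Example.eml",
]
for (level, logger), n in count_sources(sample).most_common():
    print(n, level, logger)
```

Since an open file iterates line by line, a downloaded log can be passed to `count_sources` directly without loading it into memory at once.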

    Here are things I have already tried:

    • Using the Cloudron Panel, move Paperless-ngx data, from a volume I used before, to the default location in container (no effect)
    • Delete /app/code and let container recreate it (no effect)
    • Use cURL to send a simple test HTML file to Gotenberg to see if it can process it (same 500 Internal Server Error with accompanying logs)
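
The last check in the list above can also be sketched in Python with only the standard library. This assumes the URL from the error messages and Gotenberg's documented convention that the Chromium HTML route expects a multipart field `files` containing a file named `index.html`. `build_request` and `convert` are hypothetical helper names.

```python
import urllib.request
import uuid

# URL taken from the error messages above; adjust if Gotenberg listens elsewhere.
GOTENBERG_URL = "http://localhost:3000/forms/chromium/convert/html"


def build_request(html: str, url: str = GOTENBERG_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one file named index.html."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n\r\n"
        f"{html}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def convert(html: str, url: str = GOTENBERG_URL) -> bytes:
    """Send the request and return the PDF bytes; an HTTP 500 raises urllib.error.HTTPError."""
    with urllib.request.urlopen(build_request(html, url), timeout=30) as response:
        return response.read()
```

Calling `convert("<html><body><h1>test</h1></body></html>")` from inside the app container would be expected to raise an HTTP 500 error while the permission problem persists and to return PDF bytes once it is fixed.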

    I should note that I recently performed a "backup and restore" as described in your documentation, which has been uneventful with the exception of the above-stated issues with Paperless.
    However, I don't know if they are directly related.

    I have linked the relevant part of the application log, where you can see both the issues with Gotenberg as well as the issue of very high log volume:
    https://gist.githubusercontent.com/lukasgabriel/0053a7c0e41a4c32980eb99acf248da9/raw/f288542530c321f925555ef3e5397dd21def33b1/aa8f2c69-31fc-4102-a375-7d175580f6b4.log

    • luckow (translator) wrote, #2

      @lukasgabriel Apart from the error (maybe a package error), let's share some numbers: my Paperless has a total of 815 documents. Clicking on "Download Full Logs" generates a 708 MB log file.

      BTW: I get the same error on my side (tested with an .eml file):

      Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
      For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
      

      Pronouns: he/him | Primary language: German

      • nebulon (Staff) wrote, #3

        I can reproduce the .eml file issue. No solution so far, but I will keep this thread updated.

        For the .html file, I get a normal error stating that the text/html MIME type is not supported. Is this supposed to work in the first place?

        For the logging, have you enabled debug logging by any chance, i.e. enabled PAPERLESS_DEBUG in the config file? Otherwise it will log everything from INFO upward. According to this issue, there seems to be no way to change the log level, so you may have to raise this again upstream.

        • nebulon (Staff) wrote (last edited by nebulon), #4

          Further update on the .eml file handling: after giving the process proper access to .local/share/applications/mimeapps.list, it creates that file, although the file remains empty. The .eml processing no longer crashes, but it also does not proceed any further, and it reports no errors.

          Update: After figuring out how to set debug mode for chromium, the following errors show up, at least something to work with:

          17:56:45 - {"level":"debug","ts":1743094605.3818536,"logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"start unix process: /usr/bin/unoconverter --no-launch --format pdf --port 42231 -vvv --export ExportFormFields=true --export AllowDuplicateFieldNames=false --export ExportBookmarks=true --export ExportBookmarks=true --export ExportBookmarksToPDFDestination=false --export ExportPlaceholders=false --export ExportNotes=false --export ExportNotesPages=false --export ExportOnlyNotesPages=false --export ExportNotesInMargin=false --export ConvertOOoTargetToPDFTarget=false --export ExportLinksRelativeFsys=false --export ExportHiddenSlides=false --export IsSkipEmptyPages=false --export IsAddStream=false --export SinglePageSheets=false --export UseLosslessCompression=false --export Quality=90 --export ReduceImageResolution=false --export MaxImageResolution=300 --export SelectPdfVersion=2 --export PDFUACompliance=false --export UseTaggedPDF=false --export EnableTextAccessForAccessibilityTools=false --output /tmp/c14f54a1-26d4-41c1-8481-cc7decf80ae7/ce5a2b61-75ee-4e1e-ad9a-993395b81e83/f09408c7-8088-4402-b126-775865290d5e.pdf /tmp/c14f54a1-26d4-41c1-8481-cc7decf80ae7/ce5a2b61-75ee-4e1e-ad9a-993395b81e83/b9e9e260-67db-497c-b6ee-04adbf8c43d6.pdf","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
          17:56:45 - {"level":"debug","ts":1743094605.410921,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"Traceback (most recent call last):","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
          17:56:45 - {"level":"debug","ts":1743094605.4109411,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"  File \"/usr/bin/unoconverter\", line 19, in <module>","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
          17:56:45 - {"level":"debug","ts":1743094605.4110937,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"    from distutils.version import LooseVersion","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
          17:56:45 - {"level":"debug","ts":1743094605.4110994,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"ModuleNotFoundError: No module named 'distutils'","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
          17:56:45 - {"level":"debug","ts":1743094605.4153426,"logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"unix process already killed","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
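
The traceback above ends in `ModuleNotFoundError: No module named 'distutils'`. The `distutils` package was deprecated by PEP 632 and removed from the standard library in Python 3.12, so the error is consistent with `/usr/bin/unoconverter` running on such an interpreter. The thread does not state which Python version the package ships, so that part is an assumption. A minimal check, run with the same interpreter the script uses (`has_module` is a hypothetical helper name):

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Report whether the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


print("Python", sys.version.split()[0])
print("distutils importable:", has_module("distutils"))
```

Note that an installed setuptools can provide a `distutils` shim, so a positive result does not necessarily mean the standard-library module is present.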
          
          • nebulon (Staff) wrote, #5

            The latest package release has unoconv fixed for the .eml use case.

            • lukasgabriel wrote (last edited by lukasgabriel), #6

              @luckow I'm looking at a ~2GB log file across a span of 5 days. But I turned off my Paperless instance for most of that time, so it's more like 2 days of uptime at most.
              I don't think the number of documents matters much, unless you're reprocessing them, but I also have around the same number as you.

              So I'd say I am solidly out of the normal 😵

              • lukasgabriel wrote, #7

                @nebulon I think HTML conversion is supported by Gotenberg: https://gotenberg.dev/docs/6.x/html - so it should work as a sanity check that Gotenberg can run.
                But Paperless doesn't support it: https://github.com/paperless-ngx/paperless-ngx/discussions/6700
                So you can disregard that point of my post 🤓

                • luckow (translator) wrote, #8

                  I see a difference between the Cloudron log view https://my.example.org/logs.html?appId=123456-1234-1234-1234-123455 and the logs inside Paperless https://paperless.example.org/logs: the Cloudron logs show the same errors repeated constantly, plus a lot of visual noise, while the app's own log is quiet. At the moment I am not able to investigate further.

                  Pronouns: he/him | Primary language: German

                  • lukasgabriel wrote, #9

                    @nebulon The latest patch related to unoconv has resolved the Gotenberg/Tika issues for me! I can now upload EMLs again, and I also tested successfully with XLSX and some others. Great work! 😊

                    The log spam still persists though. @luckow: paperless.log is normal for me as well, the problem is the celery warnings/errors from the container log, which you get by pressing the "Logs" button in the Cloudron panel. PAPERLESS_DEBUG is false.

                    • nebulon (Staff) wrote, #10

                      It seems the logging situation has to be taken to Paperless upstream development; maybe you can create an issue there about this.

                      • nebulon marked this topic as a question
                      • nebulon marked this topic as solved
                      • lukasgabriel wrote, #11

                        I managed to solve the problem by deleting /app/data/data/celerybeat-schedule.db.db
                        It should have been named /app/data/data/celerybeat-schedule.db. I am not sure where the extra suffix came from, but the excessive logging and CPU usage are now fixed and everything runs smoothly again.

                        • lukasgabriel wrote, #12

                          @nebulon The problem with the .db.db happened again not even 24h later.
                          This GitHub Issue suggests it might be a Cloudron problem after all: https://github.com/paperless-ngx/paperless-ngx/discussions/7440#discussioncomment-10324614

                          Any tips? I thought about setting up a cron job to check if the file has the duplicate file extension and rename/remove if it does, but an app package upgrade might remove that job, right? Should I set up the job inside the container, or on the Cloudron host?
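
A minimal sketch of the check such a job could run, assuming the path reported earlier in this thread. `remove_stray_schedule` is a hypothetical helper name. Whether the app needs to be stopped or restarted around the deletion, and whether a scheduled job survives package upgrades, is not established here.

```python
from pathlib import Path

# Path reported in this thread; override it if the data directory differs.
STRAY = Path("/app/data/data/celerybeat-schedule.db.db")


def remove_stray_schedule(stray: Path = STRAY, dry_run: bool = True) -> bool:
    """Return True if the double-suffixed schedule file exists; delete it unless dry_run."""
    if not stray.is_file():
        return False
    if not dry_run:
        stray.unlink()
    return True


if __name__ == "__main__":
    print("stray file present:", remove_stray_schedule())
```

The default `dry_run=True` only reports the file, so the script can be run safely first; passing `dry_run=False` performs the deletion.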

                          • nebulon (Staff) wrote, #13

                            As the upstream author indicates, the .db.db is something Python-related (?). Sadly, since we didn't find a way to validate the .db file on startup ourselves, there is little we can do for now to clear it when needed. My comments there, asking how to even reproduce this, didn't move it forward. I guess in the end we would have to debug this and hopefully come up with an MR to fix it, but to be honest we aren't Python developers and have little insight into the Paperless code itself.

                            • robi wrote, #14

                              @nebulon perhaps pasting the code into one of the AIs can help spot the problem

                              Conscious tech
