Gotenberg Permission Issue and large log volume
-
Hi,
I am having two major issues with the Paperless-ngx app.
The first one deals with Gotenberg. Processing of .eml and .html files (in my example) fails with 500 Internal Server Error.
On closer inspection of the logs, I can see this error relating to permissions:
2025-03-26T03:15:17Z [2025-03-26 04:15:17,718] [ERROR] [paperless.consumer] Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
2025-03-26T03:15:17Z [2025-03-26 04:15:17,728] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
2025-03-26T03:15:17Z [2025-03-26 04:15:17,824] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[1368eaec-b063-4883-8ed2-1a11d2a46474] raised unexpected: ConsumerError("Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500")
2025-03-26T03:15:17Z documents.consumer.ConsumerError: Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
2025-03-26T03:15:17Z documents.consumer.ConsumerError: Example.eml: Error occurred while consuming document Example.eml: Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html'
2025-03-26T03:15:17Z {"level":"error","ts":1742958917.707243,"logger":"api","msg":"convert HTML to PDF: convert to PDF: process first start: start process: run exec allocator: chrome failed to start:\nmkdir: cannot create directory ‘/root’: Permission denied\ntouch: cannot touch '/root/.local/share/applications/mimeapps.list': Permission denied\nchrome_crashpad_handler: --database is required\nTry 'chrome_crashpad_handler --help' for more information.\n[2868:2868:0326/031517.474572:ERROR:socket.cc(120)] recvmsg: Connection reset by peer (104)\n","log_type":"access","trace":"8ba08943-1dec-432b-8fdc-8b375407d48f","remote_ip":"::1","host":"localhost:3000","uri":"/forms/chromium/convert/html","method":"POST","path":"/forms/chromium/convert/html","referer":"","user_agent":"gotenberg-client/0.9.0","status":500,"latency":380718662,"latency_human":"380.718662ms","bytes_in":14690,"bytes_out":21}
It seems the chromium binary wants to write to /root/.local/share/applications, where it has no permissions?
The second issue comes from the fact that the Paperless-ngx app is generating an enormous amount of logs, even under normal conditions.
I am pretty sure that this is why my Paperless-ngx instance is causing extreme CPU load (completely maxed) as well.
I'm not able to interpret the cause of the logs, but it looks like something related to Celery Scheduling or Redis is out of whack.
Here are things I have already tried:
- Using the Cloudron Panel, move Paperless-ngx data, from a volume I used before, to the default location in container (no effect)
- Delete /app/code and let container recreate it (no effect)
- Use cURL to send a simple test HTML file to Gotenberg to see if it can process it (same 500 Internal Server Error with accompanying logs)
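For anyone who wants to repeat the last test without cURL, here is a minimal Python sketch of the same request. The URL is the one from the logs above; the form field name `files` and the upload name `index.html` follow Gotenberg's documented Chromium HTML route. The helper names `build_multipart` and `convert_html` are made up for this example.

```python
import urllib.request
import uuid

# Endpoint as it appears in the logs of this thread.
GOTENBERG_URL = "http://localhost:3000/forms/chromium/convert/html"


def build_multipart(html: bytes, boundary: str) -> bytes:
    """Encode one HTML document as a multipart/form-data body.

    The upload is named index.html, which is what the Chromium HTML route expects.
    """
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + html + tail


def convert_html(html: bytes, url: str = GOTENBERG_URL) -> bytes:
    """POST the HTML to Gotenberg and return the response body (the PDF).

    urllib raises urllib.error.HTTPError for a 500 response like the one above.
    """
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        url,
        data=build_multipart(html, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Run from inside the app container, `convert_html(b"<html><body>Test</body></html>")` should return bytes starting with `%PDF` when Chromium can start, and raise an `HTTPError` with status 500 otherwise.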
I should note that I recently performed a "backup and restore" as described in your documentation, which has been uneventful with the exception of the above-stated issues with Paperless.
However, I don't know if they are directly related.
I have linked the relevant part of the application log, where you can see both the issues with Gotenberg as well as the issue of very high log volume:
https://gist.githubusercontent.com/lukasgabriel/0053a7c0e41a4c32980eb99acf248da9/raw/f288542530c321f925555ef3e5397dd21def33b1/aa8f2c69-31fc-4102-a375-7d175580f6b4.log
-
@lukasgabriel Apart from the error (maybe a package error), let's share some numbers: my paperless has a total of 815 documents. Clicking on Download Full logs generates a 708MB log file.
BTW: the same error on my site (tested with an eml file):
Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
-
I can reproduce the .eml file issue. No solution so far, but I will keep this thread updated.
For the .html file, I get a normal error stating that the text/html mimetype is not supported. Is this supposed to work in the first place?
For the logging, do you happen to have enabled debug logging by any chance? Basically enabling PAPERLESS_DEBUG in the config file? Otherwise it will log everything from INFO on. According to this issue there seems to be no way to change the loglevel though. So maybe you have to ask this again upstream then.
-
Further update on the .eml file handling: when giving proper access to .local/share/applications/mimeapps.list, it creates that file, although the file remains empty. The .eml processing does not crash anymore, but it also does not proceed any further and reports no errors.
Update: After figuring out how to set debug mode for chromium, the following errors show up, at least something to work with:
17:56:45 - {"level":"debug","ts":1743094605.3818536,"logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"start unix process: /usr/bin/unoconverter --no-launch --format pdf --port 42231 -vvv --export ExportFormFields=true --export AllowDuplicateFieldNames=false --export ExportBookmarks=true --export ExportBookmarks=true --export ExportBookmarksToPDFDestination=false --export ExportPlaceholders=false --export ExportNotes=false --export ExportNotesPages=false --export ExportOnlyNotesPages=false --export ExportNotesInMargin=false --export ConvertOOoTargetToPDFTarget=false --export ExportLinksRelativeFsys=false --export ExportHiddenSlides=false --export IsSkipEmptyPages=false --export IsAddStream=false --export SinglePageSheets=false --export UseLosslessCompression=false --export Quality=90 --export ReduceImageResolution=false --export MaxImageResolution=300 --export SelectPdfVersion=2 --export PDFUACompliance=false --export UseTaggedPDF=false --export EnableTextAccessForAccessibilityTools=false --output /tmp/c14f54a1-26d4-41c1-8481-cc7decf80ae7/ce5a2b61-75ee-4e1e-ad9a-993395b81e83/f09408c7-8088-4402-b126-775865290d5e.pdf /tmp/c14f54a1-26d4-41c1-8481-cc7decf80ae7/ce5a2b61-75ee-4e1e-ad9a-993395b81e83/b9e9e260-67db-497c-b6ee-04adbf8c43d6.pdf","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
17:56:45 - {"level":"debug","ts":1743094605.410921,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"Traceback (most recent call last):","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
17:56:45 - {"level":"debug","ts":1743094605.4109411,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":" File \"/usr/bin/unoconverter\", line 19, in <module>","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
17:56:45 - {"level":"debug","ts":1743094605.4110937,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":" from distutils.version import LooseVersion","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
17:56:45 - {"level":"debug","ts":1743094605.4110994,"logger":"api.formschromiumconverthtml.usrbinunoconverter.stderr","msg":"ModuleNotFoundError: No module named 'distutils'","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
17:56:45 - {"level":"debug","ts":1743094605.4153426,"logger":"api.formschromiumconverthtml.usrbinunoconverter","msg":"unix process already killed","log_type":"application","trace":"8f429e44-92ba-4257-97bd-3413b3dc311e"}
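For context on that traceback: `distutils` was removed from the Python standard library in 3.12, so `from distutils.version import LooseVersion` fails there unless something else (for example setuptools) provides the module. A small sketch to check this for a given interpreter follows. It only tells you something about unoconverter if it is run with the same interpreter that /usr/bin/unoconverter uses, and which one that is has not been verified in this thread. The function names are made up for the example.

```python
import importlib.util
import sys


def has_distutils() -> bool:
    """Return True if this interpreter can locate a module named distutils."""
    return importlib.util.find_spec("distutils") is not None


def describe() -> str:
    """Summarize the interpreter version and whether distutils is importable."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    state = "available" if has_distutils() else "missing"
    return f"Python {version}: distutils is {state}"


print(describe())
```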
-
@luckow I'm looking at a ~2GB log file across a span of 5 days. But I turned off my Paperless instance for most of that time, so it's more like 2 days of uptime at most.
I don't think the number of documents matters much, unless you're reprocessing them, but I also have around the same number as you.
So I'd say I am solidly outside the normal range.
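To see which component is actually producing the volume, a rough tally per log level and logger can help. The sketch below assumes lines carry a `[LEVEL] [logger.name]` pair like the Paperless lines quoted earlier in this thread; lines in any other shape (including whatever format the Celery messages use, which is not quoted here) are counted as unparsed. The names `tally` and `tally_file` are made up for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "[LEVEL] [logger.name]" pair seen in the Paperless lines in this thread.
LEVEL_LOGGER = re.compile(r"\[(?P<level>[A-Z]+)\] \[(?P<logger>[\w.]+)\]")


def tally(lines: Iterable[str]) -> Counter:
    """Count lines per (level, logger); lines without that pair go under ("?", "(unparsed)")."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_LOGGER.search(line)
        key = (match["level"], match["logger"]) if match else ("?", "(unparsed)")
        counts[key] += 1
    return counts


def tally_file(path: str) -> Counter:
    """Tally a downloaded log file line by line without loading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return tally(handle)
```

Something like `tally_file("app.log").most_common(10)` (file name is a placeholder) then lists the noisiest sources first. A large unparsed count would suggest the spam comes from lines in a different format.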
-
@nebulon I think HTML conversion is supported by Gotenberg: https://gotenberg.dev/docs/6.x/html - so for sanity-checking whether Gotenberg can run, it should work.
But Paperless doesn't support it: https://github.com/paperless-ngx/paperless-ngx/discussions/6700
So you can disregard that point of my post.
-
I see a difference between the Cloudron log file https://my.example.org/logs.html?appId=123456-1234-1234-1234-123455 and the logs in Paperless. (constantly repeated same errors and a lot of visual distraction in the Cloudron logs compared to a quiet log from the app https://paperless.example.org/logs). But at the moment it is not possible for me to investigate further.
-
@nebulon The latest patch related to unoconv has resolved the Gotenberg/Tika issues for me! I can now upload EMLs again, and I also tested successfully with XLSX and some others. Great work!
The log spam still persists though. @luckow: paperless.log is normal for me as well; the problem is the celery warnings/errors from the container log, which you get by pressing the "Logs" button in the Cloudron panel. PAPERLESS_DEBUG is false.
-
nebulon marked this topic as a question
-
nebulon has marked this topic as solved
-
I managed to solve the problem by deleting /app/data/data/celerybeat-schedule.db.db
It should have been named /app/data/data/celerybeat-schedule.db - not sure where it got the extra suffix from, but now the insane logs and CPU usage are fixed and everything runs smoothly again.
-
@nebulon The problem with the .db.db happened again not even 24h later.
This GitHub Issue suggests it might be a Cloudron problem after all: https://github.com/paperless-ngx/paperless-ngx/discussions/7440#discussioncomment-10324614
Any tips? I thought about setting up a cron job to check if the file has the duplicate file extension and rename/remove if it does, but an app package upgrade might remove that job, right? Should I set up the job inside the container, or on the Cloudron host?
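The check such a job would run could look roughly like this. It is only a sketch of the workaround described above: the path is the one from my earlier post, the function name is made up, and I have not verified whether deleting the file while Celery beat is running is safe or whether a restart is needed afterwards.

```python
from pathlib import Path

# Path reported earlier in this thread; adjust if your data directory differs.
STRAY_SCHEDULE = Path("/app/data/data/celerybeat-schedule.db.db")


def remove_stray_schedule(path: Path = STRAY_SCHEDULE) -> bool:
    """Delete the double-suffixed schedule file if present.

    Returns True if a file was removed and False if there was nothing to do.
    Files without the ".db.db" ending are never touched.
    """
    if path.name.endswith(".db.db") and path.is_file():
        path.unlink()
        return True
    return False
```

The function is safe to call repeatedly, so it would fit a periodic job. Where that job should live (inside the container or on the host) is still the open question.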
-
As the upstream author indicates, the .db.db is something Python related (?), but sadly, since we didn't find a way to validate the .db file on startup ourselves, there is little we can do for now to clear it if needed. At least my comments there, asking how to even reproduce it, didn't move things forward. I guess in the end we would have to debug this and hopefully come up with an MR to fix it, but to be honest, we aren't Python developers and have little insight into the Paperless code itself.