Gotenberg Permission Issue and large log volume
-
@lukasgabriel Apart from the error (maybe a package error), let's share some numbers: my Paperless has a total of 815 documents. Clicking on "Download Full logs" generates a 708 MB log file.
BTW: the same error occurs on my site (tested with an .eml file):
Error while converting email to PDF: Server error '500 Internal Server Error' for url 'http://localhost:3000/forms/chromium/convert/html' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
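To check whether Gotenberg itself answers, independently of Paperless, a minimal sketch using only the Python standard library could POST a tiny index.html to the same endpoint that returned the 500. It assumes Gotenberg is reachable at localhost:3000 from where the script runs (as in the error above) and uses the multipart field name "files" from Gotenberg's documented curl examples; the helper names build_multipart and check_gotenberg are hypothetical.

```python
import urllib.request
import uuid

# Assumption: Gotenberg is reachable here, as in the error message above.
GOTENBERG_URL = "http://localhost:3000/forms/chromium/convert/html"

SAMPLE_HTML = "<!doctype html><html><body><h1>Gotenberg test</h1></body></html>"


def build_multipart(html: str, boundary: str) -> bytes:
    """Encode `html` as a multipart/form-data body with one part named 'files'.

    Gotenberg's Chromium HTML route expects the uploaded file to be called index.html.
    """
    parts = [
        f"--{boundary}",
        'Content-Disposition: form-data; name="files"; filename="index.html"',
        "Content-Type: text/html; charset=utf-8",
        "",
        html,
        f"--{boundary}--",
        "",
    ]
    return "\r\n".join(parts).encode("utf-8")


def check_gotenberg(url: str = GOTENBERG_URL, timeout: float = 30.0) -> bytes:
    """POST a tiny HTML page and return the response body (a PDF on success).

    urllib raises urllib.error.HTTPError for non-2xx answers such as a 500.
    """
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        url,
        data=build_multipart(SAMPLE_HTML, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling check_gotenberg() from wherever localhost:3000 reaches Gotenberg (inside the app container, judging by the URL in the error) should return bytes starting with b"%PDF" if the Chromium route works; an HTTPError with code 500 would suggest the problem lies in Gotenberg itself rather than in Paperless.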
-
@luckow I'm looking at a ~2 GB log file covering a span of 5 days. But I had my Paperless instance turned off for most of that time, so it's more like 2 days of uptime at most.
I don't think the number of documents matters much unless you're reprocessing them, but I also have around the same number as you. So I'd say I am solidly out of the normal
-
I can reproduce the .eml file issue. No solution so far, but I will keep this thread updated.
For the .html file, I get a normal error stating that the text/html mimetype is not supported. Is this supposed to work in the first place?
For the logging, have you by any chance enabled debug logging, i.e. set PAPERLESS_DEBUG in the config file? Otherwise it logs everything from INFO upward. According to this issue there seems to be no way to change the log level, though, so maybe you have to ask about this upstream.
-
@nebulon I think HTML conversion is supported by Gotenberg (https://gotenberg.dev/docs/6.x/html), so for sanity-checking whether Gotenberg runs at all, it should work.
But Paperless doesn't support it: https://github.com/paperless-ngx/paperless-ngx/discussions/6700
So you can disregard that point of my post.
-
I see a difference between the Cloudron log (https://my.example.org/logs.html?appId=123456-1234-1234-1234-123455) and the logs in Paperless (https://paperless.example.org/logs): the Cloudron log constantly repeats the same errors and is visually noisy, while the app's own log is quiet. At the moment, though, I can't investigate further.
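To see which messages are actually repeated so often in a large downloaded log (such as the 708 MB or ~2 GB files mentioned above), a minimal sketch could strip the leading timestamp from each line and count identical messages. The timestamp pattern is an assumption: it covers an ISO-style prefix and a bracketed "[YYYY-MM-DD HH:MM:SS,mmm]" prefix, and should be adjusted to the real log format. The function name top_repeated_lines is hypothetical.

```python
import collections
import re
import sys

# Assumed timestamp prefixes: ISO-style "2024-08-12T10:15:30.123Z" and
# bracketed "[2024-08-12 10:15:30,123]". Adjust to the real log format.
TIMESTAMP = re.compile(r"^\s*\[?\d{4}-\d{2}-\d{2}[T ][\d:.,]+Z?\]?\s*")


def top_repeated_lines(lines, n=10):
    """Return the n most frequent lines after stripping a leading timestamp."""
    counts = collections.Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line).strip()
        if message:
            counts[message] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 top_lines.py /path/to/downloaded.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for message, count in top_repeated_lines(log_file):
            print(f"{count:>10}  {message[:200]}")
```

Reading the file line by line keeps memory bounded by the number of distinct messages rather than the file size, which matters for multi-GB logs.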
-
@nebulon The latest patch related to unoconv has resolved the Gotenberg/Tika issues for me! I can now upload EMLs again, and I also tested XLSX and a few other formats successfully. Great work!
The log spam still persists, though. @luckow: paperless.log is normal for me as well; the problem is the celery warnings/errors in the container log, which you get by pressing the "Logs" button in the Cloudron panel. PAPERLESS_DEBUG is false.
-
nebulon marked this topic as a question
-
nebulon has marked this topic as solved
-
I managed to solve the problem by deleting /app/data/data/celerybeat-schedule.db.db. It should have been named /app/data/data/celerybeat-schedule.db; I'm not sure where it got the extra suffix from, but now the insane log volume and CPU usage are fixed and everything runs smoothly again.
-
@nebulon The problem with the .db.db file happened again, not even 24 hours later.
This GitHub discussion suggests it might be a Cloudron problem after all: https://github.com/paperless-ngx/paperless-ngx/discussions/7440#discussioncomment-10324614
Any tips? I thought about setting up a cron job that checks whether the file has the duplicate extension and renames or removes it if so, but an app package upgrade might remove that job, right? Should I set up the job inside the container or on the Cloudron host?
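The check itself could look like the following minimal sketch. It assumes the path reported in this thread and that deleting the stray file is the desired action, since that is what resolved it by hand; whether deleting it while Celery beat is running is safe is not established here. The function name remove_stray_schedule is hypothetical, and the sketch does not settle where the job should live: the path is the container's view, so on the host the app's data directory would be at a different location.

```python
from pathlib import Path

# Path as reported in this thread (container view); on the host it would differ.
SCHEDULE_DB = Path("/app/data/data/celerybeat-schedule.db")


def remove_stray_schedule(expected: Path = SCHEDULE_DB) -> bool:
    """Delete '<expected>.db' (e.g. celerybeat-schedule.db.db) if it exists.

    Returns True if a stray file was found and removed.
    """
    stray = expected.with_name(expected.name + ".db")
    if stray.is_file():
        stray.unlink()
        return True
    return False


if __name__ == "__main__":
    if remove_stray_schedule():
        print(f"Removed stray {SCHEDULE_DB}.db")
```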
-
As the upstream author indicates, the .db.db is something Python-related (?), but sadly, since we haven't found a way to validate the .db file on startup ourselves, there is little we can do for now to clean it up when needed. My comments there, asking for a way to even reproduce it, didn't move things forward either. I guess in the end we would have to debug this and hopefully come up with an MR to fix it, but to be honest, we aren't Python developers and have little insight into the Paperless code itself.
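For whoever picks this up: as far as I know, Celery's persistent beat scheduler stores its schedule via Python's shelve module, and Celery's documentation for beat_schedule_filename warns that the suffix .db may be appended to the file name depending on the Python version. That would fit a name that already ends in .db turning into .db.db. A minimal diagnostic sketch, assuming you can run python3 inside the container, shows which dbm backend that interpreter picks and which files it creates for such a name; it works in a temporary directory and does not touch Paperless' real schedule file. The helper files_created_by_shelve is hypothetical.

```python
import dbm
import os
import shelve
import tempfile


def files_created_by_shelve(name: str = "celerybeat-schedule.db") -> list[str]:
    """Create a throwaway shelve called `name` in a temp dir and list the files on disk.

    Celery beat's persistent scheduler is shelve-based, so this shows how the
    current interpreter's dbm backend names its files for a name ending in .db.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        with shelve.open(path, flag="c") as db:
            db["probe"] = 1
        print("dbm backend:", dbm.whichdb(path))
        return sorted(os.listdir(tmp))


print(files_created_by_shelve())
```

If the listed files include celerybeat-schedule.db.db, that would suggest the extra suffix comes from the interpreter's dbm backend rather than from Cloudron.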