Error - maybe with the latest update
-
As of today, Paperless no longer eats papers. It complains:
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/german/
Searched in:
- PosixPath('/usr/share/nltk_data')
Maybe a new feature needs some libs?
(https://github.com/paperless-ngx/paperless-ngx/discussions/7538)
Package Version: com.paperlessng.cloudronapp@1.25.4
Last Updated: 08/23/2024
-
We have updated the nltk punkt package to use what upstream now uses at https://git.cloudron.io/cloudron/paperless-ngx-app/-/commit/fc677b19cc3f78ff3a7b9b2db13297851a7545ec
Original issue was https://github.com/paperless-ngx/paperless-ngx/issues/7519
Maybe it only fails with specific PDFs. Can you share one that fails, or is it all of the PDFs?
-
15 different pdfs. That's all the pdfs for today.
-
nebulon marked this topic as a question.
-
Go to https://www.blindtextgenerator.de/, create some "Webstandards" text (200 characters, default), copy and paste it into a LibreOffice document, generate a PDF and upload it.
Logfile says:
Aug 26 19:23:47 File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
Aug 26 19:23:47 LookupError:
Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
Aug 26 19:23:47 R = retval = fun(*args, **kwargs)
Aug 26 19:23:47 Resource punkt_tab not found.
Aug 26 19:23:47 Resource punkt_tab not found.
Aug 26 19:23:47 Resource punkt_tab not found.
Aug 26 19:23:47 Searched in:
Aug 26 19:23:47 Searched in:
Aug 26 19:23:47 Searched in:
Aug 26 19:23:47 The above exception was the direct cause of the following exception:
Aug 26 19:23:47 Traceback (most recent call last):
Aug 26 19:23:47 Traceback (most recent call last):
Aug 26 19:23:47 Traceback (most recent call last):
Aug 26 19:23:47 X = self.data_vectorizer.transform([self.preprocess_content(content)])
Aug 26 19:23:47 [2024-08-26 17:23:47,524] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[e6d299f6-9391-4aa2-9827-de6100556bcf] raised unexpected: ConsumerError("German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing: \n**********************************************************************\n Resource \x1b[93mpunkt_tab\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('punkt_tab')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mtokenizers/punkt_tab/german/\x1b[0m\n\n Searched in:\n - PosixPath('/usr/share/nltk_data')\n**********************************************************************\n")
Aug 26 19:23:47 document_consumption_finished.send(
Aug 26 19:23:47 documents.consumer.ConsumerError: German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing:
Aug 26 19:23:47 documents.consumer.ConsumerError: German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing:
Aug 26 19:23:47 lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
Aug 26 19:23:47 msg = plugin.run()
Aug 26 19:23:47 msg = plugin.run()
-
OK. It looks like a difference between a fresh install and an updated one. Waiting for a new update.
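
For anyone who wants to compare a fresh install with an updated one, here is a rough sketch that checks whether the directory named in the error exists. It is only an approximation of what NLTK does (the real lookup also handles zip archives), `find_punkt_tab` is a made-up helper name, and the default search path is simply the one printed in the error message.

```python
from pathlib import Path

# Assumption: the only search path is the one shown in the error message.
DEFAULT_SEARCH_PATHS = [Path("/usr/share/nltk_data")]


def find_punkt_tab(lang="german", search_paths=DEFAULT_SEARCH_PATHS):
    """Return the first existing tokenizers/punkt_tab/<lang> directory, or None.

    Simplified stand-in for NLTK's own lookup: it only checks unpacked
    directories and ignores zip archives.
    """
    for base in search_paths:
        candidate = Path(base) / "tokenizers" / "punkt_tab" / lang
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    found = find_punkt_tab()
    print(found if found else "punkt_tab/german not found in the search paths")
```

If this prints a path on the fresh install but not on the updated one, the updated app image is probably still shipping the older punkt data.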

-
Long story short: the update to 1.25.5 requires minBoxVersion 8.0.0, and my Cloudron instance was stuck on an older version. After updating to 8.x, the update for Paperless worked perfectly.
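
To illustrate why the app update never arrived: a minimum-version gate boils down to a numeric comparison like the one below. This is only a sketch of the idea, not Cloudron's actual code. `parse_version` and `update_allowed` are hypothetical names, and "7.7.2" is a made-up example of an older platform version.

```python
def parse_version(text):
    """Turn '8.0.0' into (8, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def update_allowed(installed_platform, min_required):
    """True if the installed platform version meets the required minimum."""
    return parse_version(installed_platform) >= parse_version(min_required)


if __name__ == "__main__":
    # "7.7.2" is a made-up example of an older platform version.
    print(update_allowed("7.7.2", "8.0.0"))  # False
    print(update_allowed("8.0.1", "8.0.0"))  # True
```

Comparing tuples of integers rather than raw strings avoids surprises such as "8.10.0" sorting before "8.9.0".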
"Ticket" closed. Thank you. -
luckow has marked this topic as solved.