Error - maybe with the latest update
-
As of today, Paperless no longer eats papers. It complains:
Resource punkt_tab not found. Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/german/
Searched in:
- PosixPath('/usr/share/nltk_data')
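To confirm that the resource really is absent from the search path named in the message, a quick check along these lines may help. It is only a sketch: `resource_present` is a hypothetical helper, not part of NLTK or Paperless, and it only looks for an unpacked directory, whereas NLTK's own lookup can also read zipped data.

```python
from pathlib import Path


def resource_present(search_dirs, resource="tokenizers/punkt_tab/german"):
    """Return the first existing resource directory under search_dirs, or None."""
    for base in search_dirs:
        candidate = Path(base) / resource
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Search path taken from the "Searched in" part of the error message.
    found = resource_present(["/usr/share/nltk_data"])
    print(found if found else "punkt_tab (german) is missing")
```

Run inside the app container, this prints either the directory that was found or a note that the German punkt_tab data is missing.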
Maybe a new feature needs some libs? (https://github.com/paperless-ngx/paperless-ngx/discussions/7538)
Package Version: com.paperlessng.cloudronapp@1.25.4
Last Updated: 08/23/2024
-
Long story short: the update to 1.25.5 requires minBoxVersion 8.0.0, but my Cloudron instance was stuck on an older version. After updating Cloudron to 8.x, the Paperless update worked perfectly.
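Why the newer package was never offered can be illustrated with a small version comparison. This is a sketch under assumptions, not how Cloudron itself implements the check: `parse_version` and `meets_min_box_version` are hypothetical names, and 7.7.2 is just an example of an older platform version.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_min_box_version(installed, required="8.0.0"):
    """True if the installed platform version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # Example values only: an older platform versus an updated one.
    for version in ("7.7.2", "8.0.0"):
        print(version, meets_min_box_version(version))
```

Comparing tuples of integers avoids the trap of comparing version strings character by character, where "10.0.0" would sort before "8.0.0".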
"Ticket" closed. Thank you. -
We have updated the NLTK punkt package to match what upstream now uses; see https://git.cloudron.io/cloudron/paperless-ngx-app/-/commit/fc677b19cc3f78ff3a7b9b2db13297851a7545ec
The original issue was https://github.com/paperless-ngx/paperless-ngx/issues/7519
Maybe it only fails with specific PDFs. Can you share one that fails, or is it all of the PDFs?
-
15 different pdfs. That's all the pdfs for today.
-
Go to https://www.blindtextgenerator.de/, create some "Webstandards" text (200 characters, the default), copy and paste it into a LibreOffice document, generate a PDF and upload it.
Logfile says:
Aug 26 19:23:47 File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
Aug 26 19:23:47 LookupError:
Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
Aug 26 19:23:47 R = retval = fun(*args, **kwargs)
Aug 26 19:23:47 Resource punkt_tab not found.
Aug 26 19:23:47 Resource punkt_tab not found.
Aug 26 19:23:47 Resource punkt_tab not found.
Aug 26 19:23:47 Searched in:
Aug 26 19:23:47 Searched in:
Aug 26 19:23:47 Searched in:
Aug 26 19:23:47 The above exception was the direct cause of the following exception:
Aug 26 19:23:47 Traceback (most recent call last):
Aug 26 19:23:47 Traceback (most recent call last):
Aug 26 19:23:47 Traceback (most recent call last):
Aug 26 19:23:47 X = self.data_vectorizer.transform([self.preprocess_content(content)])
Aug 26 19:23:47 [2024-08-26 17:23:47,524] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[e6d299f6-9391-4aa2-9827-de6100556bcf] raised unexpected: ConsumerError("German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing: \n**********************************************************************\n Resource \x1b[93mpunkt_tab\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('punkt_tab')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mtokenizers/punkt_tab/german/\x1b[0m\n\n Searched in:\n - PosixPath('/usr/share/nltk_data')\n**********************************************************************\n")
Aug 26 19:23:47 document_consumption_finished.send(
Aug 26 19:23:47 documents.consumer.ConsumerError: German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing:
Aug 26 19:23:47 documents.consumer.ConsumerError: German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing:
Aug 26 19:23:47 lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
Aug 26 19:23:47 msg = plugin.run()
Aug 26 19:23:47 msg = plugin.run()
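The interesting parts of that log are buried in escape codes, so a small script can pull them out. The following is a minimal sketch: `summarize_nltk_lookup_error` is a hypothetical helper, and it assumes the message has the same shape as the one above, with the escape codes either raw or written out as `\x1b` text.

```python
import re

# Matches colour codes both as a real ESC character and as the literal text "\x1b".
ANSI_ESCAPE = re.compile(r"(?:\x1b|\\x1b)\[[0-9;]*m")


def summarize_nltk_lookup_error(message):
    """Extract the missing resource, attempted path and searched dirs."""
    # Log lines often contain "\n" as two literal characters; turn them into newlines.
    clean = message.replace("\\n", "\n")
    clean = ANSI_ESCAPE.sub("", clean)
    resource = re.search(r"Resource\s+(\S+)\s+not found", clean)
    attempted = re.search(r"Attempted to load\s+(\S+)", clean)
    searched = re.findall(r"PosixPath\('([^']+)'\)", clean)
    return {
        "resource": resource.group(1) if resource else None,
        "attempted": attempted.group(1) if attempted else None,
        "searched": searched,
    }


if __name__ == "__main__":
    sample = (
        r"Resource \x1b[93mpunkt_tab\x1b[0m not found.\n"
        r" Attempted to load \x1b[93mtokenizers/punkt_tab/german/\x1b[0m\n\n"
        r" Searched in:\n - PosixPath('/usr/share/nltk_data')\n"
    )
    print(summarize_nltk_lookup_error(sample))
```

For the message in this thread, the summary names `punkt_tab` as the missing resource and `/usr/share/nltk_data` as the only directory searched.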
-
OK. It looks like a difference between a fresh install and an updated one. I'll wait for the next update.