Cloudron Forum

Latest update seems to have similar issue as before, resources not found

Paperless-ngx (Solved)
23 Posts, 3 Posters, 2.1k Views, 3 Watching
scooke

    @scooke I've updated to v1.5.2, and neither with nor without that conf setting does it work. It still says it can't find corpora/stopwords, even though it's there.

    Logs:

    Jan 31 22:31:41 [2023-01-31 21:31:41,532] [INFO] [celery.worker.strategy] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] received
    Jan 31 22:31:41 [2023-01-31 21:31:41,652] [INFO] [paperless.consumer] Consuming Doc - May 25, 2014, 11-08 AM.pdf
    Jan 31 22:31:44 [2023-01-31 21:31:44,150] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 4.01 - no change
    Jan 31 22:31:53 [2023-01-31 21:31:53,935] [INFO] [ocrmypdf._sync] Postprocessing...
    Jan 31 22:31:54 [2023-01-31 21:31:54,867] [INFO] [ocrmypdf._pipeline] Optimize ratio: 1.40 savings: 28.3%
    Jan 31 22:31:54 [2023-01-31 21:31:54,872] [INFO] [ocrmypdf._sync] Output file is a PDF/A-2B (as expected)
    Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.078 * 10 changes in 300 seconds. Saving...
    Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.079 * Background saving started by pid 1303
    Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.083 * DB saved on disk
    Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.084 * RDB: 0 MB of memory used by copy-on-write
    Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.180 * Background saving terminated with success
    Jan 31 22:32:00 [2023-01-31 21:32:00,895] [ERROR] [paperless.consumer] The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Resource stopwords not found.
    Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
    Jan 31 22:32:00
    Jan 31 22:32:00 >>> import nltk
    Jan 31 22:32:00 >>> nltk.download('stopwords')
    Jan 31 22:32:00
    Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
    Jan 31 22:32:00
    Jan 31 22:32:00 Attempted to load corpora/stopwords
    Jan 31 22:32:00
    Jan 31 22:32:00 Searched in:
    Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Traceback (most recent call last):
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
    Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
    Jan 31 22:32:00 raise LookupError(resource_not_found)
    Jan 31 22:32:00 LookupError:
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Resource stopwords not found.
    Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
    Jan 31 22:32:00
    Jan 31 22:32:00 >>> import nltk
    Jan 31 22:32:00 >>> nltk.download('stopwords')
    Jan 31 22:32:00
    Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
    Jan 31 22:32:00
    Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
    Jan 31 22:32:00
    Jan 31 22:32:00 Searched in:
    Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00
    Jan 31 22:32:00
    Jan 31 22:32:00 During handling of the above exception, another exception occurred:
    Jan 31 22:32:00
    Jan 31 22:32:00 Traceback (most recent call last):
    Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
    Jan 31 22:32:00 document_consumption_finished.send(
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
    Jan 31 22:32:00 return [
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
    Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
    Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
    Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
    Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
    Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
    Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
    Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
    Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
    Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
    Jan 31 22:32:00 self.__load()
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
    Jan 31 22:32:00 raise e
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
    Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
    Jan 31 22:32:00 raise LookupError(resource_not_found)
    Jan 31 22:32:00 LookupError:
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Resource stopwords not found.
    Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
    Jan 31 22:32:00
    Jan 31 22:32:00 >>> import nltk
    Jan 31 22:32:00 >>> nltk.download('stopwords')
    Jan 31 22:32:00
    Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
    Jan 31 22:32:00
    Jan 31 22:32:00 Attempted to load corpora/stopwords
    Jan 31 22:32:00
    Jan 31 22:32:00 Searched in:
    Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00
    Jan 31 22:32:00 [2023-01-31 21:32:00,915] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] raised unexpected: ConsumerError("Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf: \n**********************************************************************\n Resource \x1b[93mstopwords\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('stopwords')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mcorpora/stopwords\x1b[0m\n\n Searched in:\n - '/usr/local/share/ntlk_data'\n**********************************************************************\n")
    Jan 31 22:32:00 Traceback (most recent call last):
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
    Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
    Jan 31 22:32:00 raise LookupError(resource_not_found)
    Jan 31 22:32:00 LookupError:
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Resource stopwords not found.
    Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
    Jan 31 22:32:00
    Jan 31 22:32:00 >>> import nltk
    Jan 31 22:32:00 >>> nltk.download('stopwords')
    Jan 31 22:32:00
    Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
    Jan 31 22:32:00
    Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
    Jan 31 22:32:00
    Jan 31 22:32:00 Searched in:
    Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00
    Jan 31 22:32:00
    Jan 31 22:32:00 During handling of the above exception, another exception occurred:
    Jan 31 22:32:00
    Jan 31 22:32:00 Traceback (most recent call last):
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 302, in main_wrap
    Jan 31 22:32:00 raise exc_info[1]
    Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
    Jan 31 22:32:00 document_consumption_finished.send(
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
    Jan 31 22:32:00 return [
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
    Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
    Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
    Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
    Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
    Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
    Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
    Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
    Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
    Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
    Jan 31 22:32:00 self.__load()
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
    Jan 31 22:32:00 raise e
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
    Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
    Jan 31 22:32:00 raise LookupError(resource_not_found)
    Jan 31 22:32:00 LookupError:
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Resource stopwords not found.
    Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
    Jan 31 22:32:00
    Jan 31 22:32:00 >>> import nltk
    Jan 31 22:32:00 >>> nltk.download('stopwords')
    Jan 31 22:32:00
    Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
    Jan 31 22:32:00
    Jan 31 22:32:00 Attempted to load corpora/stopwords
    Jan 31 22:32:00
    Jan 31 22:32:00 Searched in:
    Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00
    Jan 31 22:32:00
    Jan 31 22:32:00 The above exception was the direct cause of the following exception:
    Jan 31 22:32:00
    Jan 31 22:32:00 Traceback (most recent call last):
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
    Jan 31 22:32:00 R = retval = fun(*args, **kwargs)
    Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
    Jan 31 22:32:00 return self.run(*args, **kwargs)
    Jan 31 22:32:00 File "/app/code/src/documents/tasks.py", line 192, in consume_file
    Jan 31 22:32:00 document = Consumer().try_consume_file(
    Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 468, in try_consume_file
    Jan 31 22:32:00 self._fail(
    Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 93, in _fail
    Jan 31 22:32:00 raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
    Jan 31 22:32:00 documents.consumer.ConsumerError: Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00 Resource stopwords not found.
    Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
    Jan 31 22:32:00
    Jan 31 22:32:00 >>> import nltk
    Jan 31 22:32:00 >>> nltk.download('stopwords')
    Jan 31 22:32:00
    Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
    Jan 31 22:32:00
    Jan 31 22:32:00 Attempted to load corpora/stopwords
    Jan 31 22:32:00
    Jan 31 22:32:00 Searched in:
    Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
    Jan 31 22:32:00 **********************************************************************
    Jan 31 22:32:00
    
    girish (Staff)
    #11

    @scooke said in Latest update seems to have similar issue as before, resources not found:

    Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/

    So, this is wrong. It should be corpora/stopwords/, I think.

    root@07a7fcf3-f9ff-4051-9d89-dec6ed4a4777:/usr/local/share/nltk_data/corpora/stopwords# ls
    README       basque   chinese  english  german  hinglish    italian  norwegian   russian  swedish
    arabic       bengali  danish   finnish  greek   hungarian   kazakh   portuguese  slovene  tajik
    azerbaijani  catalan  dutch    french   hebrew  indonesian  nepali   romanian    spanish  turkish
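For reference, NLTK assembles its data search path from the NLTK_DATA environment variable plus a handful of standard directories (see https://www.nltk.org/data.html). Below is a simplified sketch of that lookup logic, not the real nltk.data module (which also adds sys.prefix-based entries):

```python
import os

def nltk_search_paths(env=None):
    """Approximate the directory list NLTK searches for corpora.

    Simplified from the behaviour documented at
    https://www.nltk.org/data.html: entries from the NLTK_DATA
    environment variable come first, then the OS defaults.
    """
    env = os.environ if env is None else env
    paths = []
    if env.get("NLTK_DATA"):
        # NLTK_DATA can hold several directories, separated like PATH
        paths.extend(env["NLTK_DATA"].split(os.pathsep))
    paths.extend([
        os.path.expanduser("~/nltk_data"),
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    ])
    return paths

# With NLTK_DATA set, that directory is consulted before the defaults:
print(nltk_search_paths({"NLTK_DATA": "/app/data/nltk_data"})[0])
```

Notably, the traceback above lists only a single searched directory, which suggests the app's search path was overridden rather than left at these defaults.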
    
  • scooke:

    @scooke I've updated to v1.5.2, and neither with nor without that conf setting does it work. It still says it can't find corpora/stopwords, even though it's there.
      
      girish (Staff)
      #12

      @scooke do you even have /usr/local/share/nltk_data/corpora/stopwords?

      • girish:

        So, this is wrong. It should be corpora/stopwords/, I think.
        scooke
        #13

        @girish Although there is definitely a stopwords.zip in the corpora/stopwords directory, there is also a stopwords folder (unzipped, I suppose). That is one thing to correct, then.

        I found this link: https://aur.archlinux.org/packages/paperless-ngx. Six comments from the bottom, ammo wrote that executing ln -s /usr/share/nltk_data /usr/local/share/nltk_data fixed it for him. I tried it, but of course these are read-only directories. So maybe this is a second thing to fix?
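Since the packaged directories are read-only, NLTK's standard NLTK_DATA environment variable may be a workable alternative to the symlink, as it is searched before the built-in paths. A sketch, assuming the app's environment can be edited and using $HOME/nltk_data purely as an example of a writable location:

```shell
# Point NLTK at a writable directory instead of symlinking into
# read-only system paths. NLTK searches $NLTK_DATA first.
export NLTK_DATA="${NLTK_DATA:-$HOME/nltk_data}"
mkdir -p "$NLTK_DATA"
echo "NLTK will search $NLTK_DATA first"
# Then fetch the missing corpus into it (requires the nltk package):
#   python3 -m nltk.downloader -d "$NLTK_DATA" stopwords
```

Whether the Cloudron package actually reads NLTK_DATA from the app environment is an assumption here; it is the mechanism NLTK itself documents.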

        A life lived in fear is a life half-lived

          scooke
          #14

          @girish said in Latest update seems to have similar issue as before, resources not found:

          /usr/local/share/nltk_data/corpora/stopwords

          Yep

          root@mypaperlessimagelongnumbernamethingy:/usr/local/share/nltk_data/corpora/stopwords# ls -al
          total 160
          drwxr-xr-x 2 root root  4096 Jan 26 09:05 .
          drwxr-xr-x 3 root root  4096 Jan 26 09:05 ..
          -rw-r--r-- 1 root root   909 Jan 26 09:05 README
          -rw-r--r-- 1 root root  6348 Jan 26 09:05 arabic
          -rw-r--r-- 1 root root   967 Jan 26 09:05 azerbaijani
          -rw-r--r-- 1 root root  2202 Jan 26 09:05 basque
          -rw-r--r-- 1 root root  5443 Jan 26 09:05 bengali
          -rw-r--r-- 1 root root  1558 Jan 26 09:05 catalan
          -rw-r--r-- 1 root root  5560 Jan 26 09:05 chinese
          -rw-r--r-- 1 root root   424 Jan 26 09:05 danish
          -rw-r--r-- 1 root root   453 Jan 26 09:05 dutch
          -rw-r--r-- 1 root root   936 Jan 26 09:05 english
          -rw-r--r-- 1 root root  1579 Jan 26 09:05 finnish
          -rw-r--r-- 1 root root   813 Jan 26 09:05 french
          -rw-r--r-- 1 root root  1362 Jan 26 09:05 german
          -rw-r--r-- 1 root root  2167 Jan 26 09:05 greek
          -rw-r--r-- 1 root root  1836 Jan 26 09:05 hebrew
          -rw-r--r-- 1 root root  5958 Jan 26 09:05 hinglish
          -rw-r--r-- 1 root root  1227 Jan 26 09:05 hungarian
          -rw-r--r-- 1 root root  6446 Jan 26 09:05 indonesian
          -rw-r--r-- 1 root root  1654 Jan 26 09:05 italian
          -rw-r--r-- 1 root root  3880 Jan 26 09:05 kazakh
          -rw-r--r-- 1 root root  3610 Jan 26 09:05 nepali
          -rw-r--r-- 1 root root   851 Jan 26 09:05 norwegian
          -rw-r--r-- 1 root root  1286 Jan 26 09:05 portuguese
          -rw-r--r-- 1 root root  1910 Jan 26 09:05 romanian
          -rw-r--r-- 1 root root  1235 Jan 26 09:05 russian
          -rw-r--r-- 1 root root 15980 Jan 26 09:05 slovene
          -rw-r--r-- 1 root root  2176 Jan 26 09:05 spanish
          -rw-r--r-- 1 root root   559 Jan 26 09:05 swedish
          -rw-r--r-- 1 root root  1818 Jan 26 09:05 tajik
          -rw-r--r-- 1 root root   260 Jan 26 09:05 turkish
          

            scooke
            #15

            Thank you for troubleshooting this with me. Weird that I'm the only one!

girish (Staff) wrote (#16):

@scooke I think I am not testing this correctly because, as per the code at least, it should fail for me too, but it doesn't.

Should I enable some flag inside paperless for nltk handling? nltk is completely absent from my logs.

girish (Staff) wrote (#17):

                @scooke Can you try with 1.5.4 please?

I think I found the issue. First, the classifier needs to be created with document_create_classifier. One also has to add some tags, categories, etc. to documents. Once all that is set up, the nltk code kicks in. Without the classifier, nltk is just skipped.
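For anyone wanting to trigger this manually, document_create_classifier is a standard paperless-ngx management command. A hedged sketch of invoking it from a shell inside the app follows; the `cloudron exec` step is the standard Cloudron workflow, but the domain is a placeholder and the exact manage.py location inside this package is an assumption:

```shell
# Open a shell inside the Paperless-ngx app (replace the domain with yours)
cloudron exec --app paperless.example.com

# Inside the app: train the classifier. It only learns from documents that
# already have tags, correspondents, or document types with auto-matching set.
python3 manage.py document_create_classifier
```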

scooke wrote (#18):

@girish I'm not seeing v1.5.4... is there a way to force it?

axel0681 wrote (#19):

The latest update, 1.5.4, is working for me.

                    Thanks

girish (Staff) wrote (#20):

                      @scooke said in Latest update seems to have similar issue as before, resources not found:

@girish I'm not seeing v1.5.4... is there a way to force it?

                      Strange, it's published normally. Did you check for updates manually?

scooke wrote (#21):

@girish Do I need to be upgraded to v1.5.2 to see it? Or do I need to have auto-updates turned on? I've turned them off and am checking manually, as I'd rather have an older version that works than a newer one that doesn't.

girish (Staff) wrote (#22):

@scooke Ah, that's correct. You have to update from 1.5.1 to 1.5.2 to 1.5.3.

scooke wrote (#23):

@girish I'm happy to report that, once upgraded to v1.5.4, PDF upload and processing complete without any problems. Whatever you did worked! Thank you!

girish has marked this topic as solved.