Cloudron Forum > Paperless-ngx
Latest update seems to have similar issue as before, resources not found

Solved | 23 Posts | 3 Posters | 1.8k Views
scooke (#1):

      Package v.1.5.2 has a problem. Fortunately I had a working backup for v.1.5.1.

      Please use the NLTK Downloader to obtain the resource:
      
        >>> import nltk
        >>> nltk.download('stopwords')
      

      PDFs would upload but wouldn't get processed.
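A quick way to see what the error is actually complaining about is to replicate the lookup yourself. The sketch below is stdlib-only and hypothetical (`locate_corpus` is not part of Paperless or NLTK); it roughly mimics how NLTK checks each data directory for a corpus, either unpacked or zipped:

```python
import os

def locate_corpus(data_dirs, name="stopwords"):
    """Return the first data directory that contains corpora/<name>,
    either as an unpacked directory or as corpora/<name>.zip
    (roughly what NLTK's data lookup does)."""
    for d in data_dirs:
        base = os.path.join(d, "corpora", name)
        if os.path.isdir(base) or os.path.isfile(base + ".zip"):
            return d
    return None

# The location the error message says it searched:
print(locate_corpus(["/usr/share/nltk_data"]))
```

Run from a terminal inside the app, this prints the directory if the corpus is visible there, or `None` if not, which matches what the LookupError is reporting.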

      A life lived in fear is a life half-lived

girish, Staff (#2):

@scooke thanks for reporting, I think I can reproduce this.

girish, Staff (#3):

          @scooke actually, I can't reproduce this. Where do you see that message "Please use the NLTK Downloader to obtain the resource:" ?

NLTK data is already included in the image since package 1.5.0.
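For context, NLTK assembles its search path from the `NLTK_DATA` environment variable plus a set of OS defaults. A rough, simplified reconstruction (`nltk_style_search_path` is a hypothetical name, and the real `nltk.data.path` also includes home-directory and `sys.prefix`-derived entries):

```python
import os

def nltk_style_search_path(env):
    """Simplified sketch of how NLTK builds its data search path:
    entries from $NLTK_DATA come first, then common Unix defaults."""
    path = []
    if "NLTK_DATA" in env:
        path += env["NLTK_DATA"].split(os.pathsep)
    path += [
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    ]
    return path

print(nltk_style_search_path(dict(os.environ)))
```

The logs above showing only `/usr/share/nltk_data` in "Searched in:" suggest the app is not picking up the other locations at lookup time.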

scooke (#4):

            @girish It had worked when I restored to v1.5.1, but just today the problem returned.

In the web app's Files dashboard, under the "Failed" tab, is this error:

            Drop Everything and Read—but How_.pdf: The following error occurred while consuming Drop Everything and Read—but How_.pdf:

            **********************************************************************
              Resource stopwords not found.
              Please use the NLTK Downloader to obtain the resource:
            
              >>> import nltk
              >>> nltk.download('stopwords')
              
              For more information see: https://www.nltk.org/data.html
            
              Attempted to load corpora/stopwords
            
              Searched in:
                - '/usr/share/nltk_data'
            **********************************************************************
            

            Here are logs from the Cloudron app dashboard:

            Jan 31 19:27:48 Searched in:
            Jan 31 19:27:48 - '/usr/share/nltk_data'
            Jan 31 19:27:48 **********************************************************************
            Jan 31 19:27:48
            Jan 31 19:27:48 [2023-01-31 18:27:48,579] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[fa64d1a1-7726-4390-ad33-1094d0600517] raised unexpected: ConsumerError("Englishlanguagelearners.pdf: The following error occurred while consuming Englishlanguagelearners.pdf: \n**********************************************************************\n Resource \x1b[93mstopwords\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('stopwords')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mcorpora/stopwords\x1b[0m\n\n Searched in:\n - '/usr/share/nltk_data'\n**********************************************************************\n")
            Jan 31 19:27:48 Traceback (most recent call last):
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
            Jan 31 19:27:48 root = nltk.data.find(f"{self.subdir}/{zip_name}")
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
            Jan 31 19:27:48 raise LookupError(resource_not_found)
            Jan 31 19:27:48 LookupError:
            Jan 31 19:27:48 **********************************************************************
            Jan 31 19:27:48 Resource stopwords not found.
            Jan 31 19:27:48 Please use the NLTK Downloader to obtain the resource:
            Jan 31 19:27:48
            Jan 31 19:27:48 >>> import nltk
            Jan 31 19:27:48 >>> nltk.download('stopwords')
            Jan 31 19:27:48
            Jan 31 19:27:48 For more information see: https://www.nltk.org/data.html
            Jan 31 19:27:48
            Jan 31 19:27:48 Attempted to load corpora/stopwords.zip/stopwords/
            Jan 31 19:27:48
            Jan 31 19:27:48 Searched in:
            Jan 31 19:27:48 - '/usr/share/nltk_data'
            Jan 31 19:27:48 **********************************************************************
            Jan 31 19:27:48
            Jan 31 19:27:48
            Jan 31 19:27:48 During handling of the above exception, another exception occurred:
            Jan 31 19:27:48
            Jan 31 19:27:48 Traceback (most recent call last):
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 302, in main_wrap
            Jan 31 19:27:48 raise exc_info[1]
            Jan 31 19:27:48 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
            Jan 31 19:27:48 document_consumption_finished.send(
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
            Jan 31 19:27:48 return [
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
            Jan 31 19:27:48 (receiver, receiver(signal=self, sender=sender, **named))
            Jan 31 19:27:48 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
            Jan 31 19:27:48 matched_tags = matching.match_tags(document, classifier)
            Jan 31 19:27:48 File "/app/code/src/documents/matching.py", line 50, in match_tags
            Jan 31 19:27:48 predicted_tag_ids = classifier.predict_tags(document.content)
            Jan 31 19:27:48 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
            Jan 31 19:27:48 X = self.data_vectorizer.transform([self.preprocess_content(content)])
            Jan 31 19:27:48 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
            Jan 31 19:27:48 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
            Jan 31 19:27:48 self.__load()
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
            Jan 31 19:27:48 raise e
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
            Jan 31 19:27:48 root = nltk.data.find(f"{self.subdir}/{self.__name}")
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
            Jan 31 19:27:48 raise LookupError(resource_not_found)
            Jan 31 19:27:48 LookupError:
            Jan 31 19:27:48 **********************************************************************
            Jan 31 19:27:48 Resource stopwords not found.
            Jan 31 19:27:48 Please use the NLTK Downloader to obtain the resource:
            Jan 31 19:27:48
            Jan 31 19:27:48 >>> import nltk
            Jan 31 19:27:48 >>> nltk.download('stopwords')
            Jan 31 19:27:48
            Jan 31 19:27:48 For more information see: https://www.nltk.org/data.html
            Jan 31 19:27:48
            Jan 31 19:27:48 Attempted to load corpora/stopwords
            Jan 31 19:27:48
            Jan 31 19:27:48 Searched in:
            Jan 31 19:27:48 - '/usr/share/nltk_data'
            Jan 31 19:27:48 **********************************************************************
            Jan 31 19:27:48
            Jan 31 19:27:48
            Jan 31 19:27:48 The above exception was the direct cause of the following exception:
            Jan 31 19:27:48
            Jan 31 19:27:48 Traceback (most recent call last):
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
            Jan 31 19:27:48 R = retval = fun(*args, **kwargs)
            Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
            Jan 31 19:27:48 return self.run(*args, **kwargs)
            Jan 31 19:27:48 File "/app/code/src/documents/tasks.py", line 192, in consume_file
            Jan 31 19:27:48 document = Consumer().try_consume_file(
            Jan 31 19:27:48 File "/app/code/src/documents/consumer.py", line 468, in try_consume_file
            Jan 31 19:27:48 self._fail(
            Jan 31 19:27:48 File "/app/code/src/documents/consumer.py", line 93, in _fail
            Jan 31 19:27:48 raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
            Jan 31 19:27:48 documents.consumer.ConsumerError: Englishlanguagelearners.pdf: The following error occurred while consuming Englishlanguagelearners.pdf:
            Jan 31 19:27:48 **********************************************************************
            Jan 31 19:27:48 Resource stopwords not found.
            Jan 31 19:27:48 Please use the NLTK Downloader to obtain the resource:
            Jan 31 19:27:48
            Jan 31 19:27:48 >>> import nltk
            Jan 31 19:27:48 >>> nltk.download('stopwords')
            Jan 31 19:27:48
            Jan 31 19:27:48 For more information see: https://www.nltk.org/data.html
            Jan 31 19:27:48
            Jan 31 19:27:48 Attempted to load corpora/stopwords
            Jan 31 19:27:48
            Jan 31 19:27:48 Searched in:
            Jan 31 19:27:48 - '/usr/share/nltk_data'
            Jan 31 19:27:48 **********************************************************************
            


scooke (#5):

              @scooke Ah, the app updated to v1.5.2 overnight. I have restored once again to v1.5.1 and it works fine now. I've turned off Auto-update for now.

axel0681 (#6):

                Hello,

Same problem on our sites.
Package version: com.paperlessng.cloudronapp@1.5.2

The last step, "saving the document", failed.

Thx
Axel

                abc.pdf: The following error occurred while consuming 
                abc.pdf: 
                **********************************************************************
                  Resource stopwords not found.
                  Please use the NLTK Downloader to obtain the resource:
                
                  >>> import nltk
                  >>> nltk.download('stopwords')
                  
                  For more information see: https://www.nltk.org/data.html
                
                  Attempted to load corpora/stopwords
                
                  Searched in:
                    - '/usr/share/nltk_data'
                **********************************************************************
                
girish, Staff (#7):

                  @axel0681 can you check the contents of /usr/local/share/nltk_data ?

girish, Staff (#8):

                    @axel0681 Actually, can you set PAPERLESS_NLTK_DIR=/usr/local/share/ntlk_data in paperless.conf and restart the app ?

scooke (#9):

                      @girish Weird. Adding that to the conf, while still on v1.5.1, causes it to fail again.

                      The contents of that directory are: corpora, stemmers, tokenizers

Despite the error, which reads:

**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/usr/local/share/ntlk_data'
**********************************************************************
                      

                      there is /usr/local/share/ntlk_data/corpora/stopwords, which includes a stopwords.zip file.
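Note that having only a stopwords.zip should normally be fine: NLTK can read a corpus straight out of the zip, which is why the logs say "Attempted to load corpora/stopwords.zip/stopwords/". A stdlib sketch of that zip lookup, with `read_from_corpus_zip` as a hypothetical stand-in for NLTK's zip-aware loader:

```python
import zipfile

def read_from_corpus_zip(zip_path, member):
    """Read one file (e.g. 'stopwords/english') out of a corpus zip,
    the way NLTK's zip-aware data loader does."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as f:
            return f.read().decode("utf-8")

# e.g. read_from_corpus_zip(".../corpora/stopwords.zip", "stopwords/english")
```

So the zip itself is not the problem; the lookup fails earlier, when the data directory itself is not on the search path.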

                      I noticed that in the conf file, the env PAPERLESS_NLTK_DIR=/usr/local/share/ntlk_data has the NLTK highlighted in brown, which must mean that the syntax is off, right?

                      Finally, I also noticed at the bottom of the env file that the following are all commented out:

                      # Binaries
                      
                      #PAPERLESS_CONVERT_BINARY=/usr/bin/convert
                      #PAPERLESS_GS_BINARY=/usr/bin/gs
                      #PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
                      

With the line you suggested commented out (still running v1.5.1), the app functions properly.

                      I guess I will now try to update again, then add that line to the conf file and see what happens.

                      A life lived in fear is a life half-lived

scooke (#10):

@scooke I've updated to v1.5.2, and it doesn't work either with or without that conf setting. It still says it can't find corpora/stopwords, even though it's there.

                        Logs:

                        Jan 31 22:31:41 [2023-01-31 21:31:41,532] [INFO] [celery.worker.strategy] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] received
                        Jan 31 22:31:41 [2023-01-31 21:31:41,652] [INFO] [paperless.consumer] Consuming Doc - May 25, 2014, 11-08 AM.pdf
                        Jan 31 22:31:44 [2023-01-31 21:31:44,150] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 4.01 - no change
                        Jan 31 22:31:53 [2023-01-31 21:31:53,935] [INFO] [ocrmypdf._sync] Postprocessing...
                        Jan 31 22:31:54 [2023-01-31 21:31:54,867] [INFO] [ocrmypdf._pipeline] Optimize ratio: 1.40 savings: 28.3%
                        Jan 31 22:31:54 [2023-01-31 21:31:54,872] [INFO] [ocrmypdf._sync] Output file is a PDF/A-2B (as expected)
                        Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.078 * 10 changes in 300 seconds. Saving...
                        Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.079 * Background saving started by pid 1303
                        Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.083 * DB saved on disk
                        Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.084 * RDB: 0 MB of memory used by copy-on-write
                        Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.180 * Background saving terminated with success
                        Jan 31 22:32:00 [2023-01-31 21:32:00,895] [ERROR] [paperless.consumer] The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Resource stopwords not found.
                        Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 >>> import nltk
                        Jan 31 22:32:00 >>> nltk.download('stopwords')
                        Jan 31 22:32:00
                        Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Attempted to load corpora/stopwords
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Searched in:
                        Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Traceback (most recent call last):
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
                        Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                        Jan 31 22:32:00 raise LookupError(resource_not_found)
                        Jan 31 22:32:00 LookupError:
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Resource stopwords not found.
                        Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 >>> import nltk
                        Jan 31 22:32:00 >>> nltk.download('stopwords')
                        Jan 31 22:32:00
                        Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Searched in:
                        Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00
                        Jan 31 22:32:00
                        Jan 31 22:32:00 During handling of the above exception, another exception occurred:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Traceback (most recent call last):
                        Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
                        Jan 31 22:32:00 document_consumption_finished.send(
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
                        Jan 31 22:32:00 return [
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
                        Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
                        Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
                        Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
                        Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
                        Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
                        Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
                        Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
                        Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
                        Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
                        Jan 31 22:32:00 self.__load()
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
                        Jan 31 22:32:00 raise e
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
                        Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                        Jan 31 22:32:00 raise LookupError(resource_not_found)
                        Jan 31 22:32:00 LookupError:
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Resource stopwords not found.
                        Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 >>> import nltk
                        Jan 31 22:32:00 >>> nltk.download('stopwords')
                        Jan 31 22:32:00
                        Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Attempted to load corpora/stopwords
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Searched in:
                        Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00
                        Jan 31 22:32:00 [2023-01-31 21:32:00,915] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] raised unexpected: ConsumerError("Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf: \n**********************************************************************\n Resource \x1b[93mstopwords\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('stopwords')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mcorpora/stopwords\x1b[0m\n\n Searched in:\n - '/usr/local/share/ntlk_data'\n**********************************************************************\n")
                        Jan 31 22:32:00 Traceback (most recent call last):
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
                        Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                        Jan 31 22:32:00 raise LookupError(resource_not_found)
                        Jan 31 22:32:00 LookupError:
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Resource stopwords not found.
                        Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 >>> import nltk
                        Jan 31 22:32:00 >>> nltk.download('stopwords')
                        Jan 31 22:32:00
                        Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Searched in:
                        Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00
                        Jan 31 22:32:00
                        Jan 31 22:32:00 During handling of the above exception, another exception occurred:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Traceback (most recent call last):
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 302, in main_wrap
                        Jan 31 22:32:00 raise exc_info[1]
                        Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
                        Jan 31 22:32:00 document_consumption_finished.send(
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
                        Jan 31 22:32:00 return [
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
                        Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
                        Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
                        Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
                        Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
                        Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
                        Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
                        Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
                        Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
                        Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
                        Jan 31 22:32:00 self.__load()
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
                        Jan 31 22:32:00 raise e
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
                        Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                        Jan 31 22:32:00 raise LookupError(resource_not_found)
                        Jan 31 22:32:00 LookupError:
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Resource stopwords not found.
                        Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 >>> import nltk
                        Jan 31 22:32:00 >>> nltk.download('stopwords')
                        Jan 31 22:32:00
                        Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Attempted to load corpora/stopwords
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Searched in:
                        Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00
                        Jan 31 22:32:00
                        Jan 31 22:32:00 The above exception was the direct cause of the following exception:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Traceback (most recent call last):
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
                        Jan 31 22:32:00 R = retval = fun(*args, **kwargs)
                        Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
                        Jan 31 22:32:00 return self.run(*args, **kwargs)
                        Jan 31 22:32:00 File "/app/code/src/documents/tasks.py", line 192, in consume_file
                        Jan 31 22:32:00 document = Consumer().try_consume_file(
                        Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 468, in try_consume_file
                        Jan 31 22:32:00 self._fail(
                        Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 93, in _fail
                        Jan 31 22:32:00 raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
                        Jan 31 22:32:00 documents.consumer.ConsumerError: Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00 Resource stopwords not found.
                        Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                        Jan 31 22:32:00
                        Jan 31 22:32:00 >>> import nltk
                        Jan 31 22:32:00 >>> nltk.download('stopwords')
                        Jan 31 22:32:00
                        Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Attempted to load corpora/stopwords
                        Jan 31 22:32:00
                        Jan 31 22:32:00 Searched in:
                        Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                        Jan 31 22:32:00 **********************************************************************
                        Jan 31 22:32:00
                        

                        • scookeS scooke

                          @scooke I've updated to v1.5.2, and it doesn't work either with or without that conf setting. It's still saying it can't find corpora/stopwords, even though it's there.

                          girishG Offline
                          girish
                          Staff
                          wrote on last edited by
                          #11

                          @scooke said in Latest update seems to have similar issue as before, resources not found:

                          Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/

                          So, this is wrong. It should be corpora/stopwords/, I think.

                          root@07a7fcf3-f9ff-4051-9d89-dec6ed4a4777:/usr/local/share/nltk_data/corpora/stopwords# ls
                          README       basque   chinese  english  german  hinglish    italian  norwegian   russian  swedish
                          arabic       bengali  danish   finnish  greek   hungarian   kazakh   portuguese  slovene  tajik
                          azerbaijani  catalan  dutch    french   hebrew  indonesian  nepali   romanian    spanish  turkish
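
                          Worth noting: the traceback says it searched '/usr/local/share/ntlk_data' (letters transposed), while the listing above shows the corpus under '/usr/local/share/nltk_data'. A minimal sketch of how an NLTK-style lookup walks its search path shows why a misspelled directory fails even when the data exists on disk (this is a simplified stand-in, not NLTK's actual implementation):

                          ```python
                          import os
                          import tempfile

                          def find_resource(resource: str, search_path: list[str]) -> str:
                              """Return the first root on search_path containing resource.

                              Simplified stand-in for nltk.data.find(): checks each root in order
                              and raises LookupError when no root holds the resource.
                              """
                              for root in search_path:
                                  candidate = os.path.join(root, resource)
                                  if os.path.exists(candidate):
                                      return candidate
                              raise LookupError(f"Resource {resource} not found. Searched in: {search_path}")

                          # Demo: the corpus exists under 'nltk_data', but searching the
                          # misspelled 'ntlk_data' directory fails, matching the traceback above.
                          with tempfile.TemporaryDirectory() as share:
                              good = os.path.join(share, "nltk_data", "corpora", "stopwords")
                              os.makedirs(good)
                              try:
                                  find_resource("corpora/stopwords", [os.path.join(share, "ntlk_data")])
                              except LookupError:
                                  print("lookup failed with misspelled path")
                              print(find_resource("corpora/stopwords", [os.path.join(share, "nltk_data")]) == good)
                          ```

                          If the package points NLTK at its data via the NLTK_DATA environment variable or by appending to nltk.data.path, correcting the spelling there to /usr/local/share/nltk_data should let the existing stopwords corpus be found.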
                          
                          • scookeS scooke

@scooke I've updated to v1.5.2, and it doesn't work either with or without that conf setting. It still says it can't find corpora/stopwords, even though it's there.

                            Logs:

                            Jan 31 22:31:41 [2023-01-31 21:31:41,532] [INFO] [celery.worker.strategy] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] received
                            Jan 31 22:31:41 [2023-01-31 21:31:41,652] [INFO] [paperless.consumer] Consuming Doc - May 25, 2014, 11-08 AM.pdf
                            Jan 31 22:31:44 [2023-01-31 21:31:44,150] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 4.01 - no change
                            Jan 31 22:31:53 [2023-01-31 21:31:53,935] [INFO] [ocrmypdf._sync] Postprocessing...
                            Jan 31 22:31:54 [2023-01-31 21:31:54,867] [INFO] [ocrmypdf._pipeline] Optimize ratio: 1.40 savings: 28.3%
                            Jan 31 22:31:54 [2023-01-31 21:31:54,872] [INFO] [ocrmypdf._sync] Output file is a PDF/A-2B (as expected)
                            Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.078 * 10 changes in 300 seconds. Saving...
                            Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.079 * Background saving started by pid 1303
                            Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.083 * DB saved on disk
                            Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.084 * RDB: 0 MB of memory used by copy-on-write
                            Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.180 * Background saving terminated with success
                            Jan 31 22:32:00 [2023-01-31 21:32:00,895] [ERROR] [paperless.consumer] The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Resource stopwords not found.
                            Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 >>> import nltk
                            Jan 31 22:32:00 >>> nltk.download('stopwords')
                            Jan 31 22:32:00
                            Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Attempted to load corpora/stopwords
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Searched in:
                            Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Traceback (most recent call last):
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
                            Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                            Jan 31 22:32:00 raise LookupError(resource_not_found)
                            Jan 31 22:32:00 LookupError:
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Resource stopwords not found.
                            Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 >>> import nltk
                            Jan 31 22:32:00 >>> nltk.download('stopwords')
                            Jan 31 22:32:00
                            Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Searched in:
                            Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00
                            Jan 31 22:32:00
                            Jan 31 22:32:00 During handling of the above exception, another exception occurred:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Traceback (most recent call last):
                            Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
                            Jan 31 22:32:00 document_consumption_finished.send(
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
                            Jan 31 22:32:00 return [
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
                            Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
                            Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
                            Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
                            Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
                            Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
                            Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
                            Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
                            Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
                            Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
                            Jan 31 22:32:00 self.__load()
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
                            Jan 31 22:32:00 raise e
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
                            Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                            Jan 31 22:32:00 raise LookupError(resource_not_found)
                            Jan 31 22:32:00 LookupError:
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Resource stopwords not found.
                            Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 >>> import nltk
                            Jan 31 22:32:00 >>> nltk.download('stopwords')
                            Jan 31 22:32:00
                            Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Attempted to load corpora/stopwords
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Searched in:
                            Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00
                            Jan 31 22:32:00 [2023-01-31 21:32:00,915] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] raised unexpected: ConsumerError("Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf: \n**********************************************************************\n Resource \x1b[93mstopwords\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('stopwords')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mcorpora/stopwords\x1b[0m\n\n Searched in:\n - '/usr/local/share/ntlk_data'\n**********************************************************************\n")
                            Jan 31 22:32:00 Traceback (most recent call last):
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
                            Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                            Jan 31 22:32:00 raise LookupError(resource_not_found)
                            Jan 31 22:32:00 LookupError:
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Resource stopwords not found.
                            Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 >>> import nltk
                            Jan 31 22:32:00 >>> nltk.download('stopwords')
                            Jan 31 22:32:00
                            Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Searched in:
                            Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00
                            Jan 31 22:32:00
                            Jan 31 22:32:00 During handling of the above exception, another exception occurred:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Traceback (most recent call last):
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 302, in main_wrap
                            Jan 31 22:32:00 raise exc_info[1]
                            Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
                            Jan 31 22:32:00 document_consumption_finished.send(
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
                            Jan 31 22:32:00 return [
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
                            Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
                            Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
                            Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
                            Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
                            Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
                            Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
                            Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
                            Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
                            Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
                            Jan 31 22:32:00 self.__load()
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
                            Jan 31 22:32:00 raise e
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
                            Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
                            Jan 31 22:32:00 raise LookupError(resource_not_found)
                            Jan 31 22:32:00 LookupError:
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Resource stopwords not found.
                            Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 >>> import nltk
                            Jan 31 22:32:00 >>> nltk.download('stopwords')
                            Jan 31 22:32:00
                            Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Attempted to load corpora/stopwords
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Searched in:
                            Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00
                            Jan 31 22:32:00
                            Jan 31 22:32:00 The above exception was the direct cause of the following exception:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Traceback (most recent call last):
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
                            Jan 31 22:32:00 R = retval = fun(*args, **kwargs)
                            Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
                            Jan 31 22:32:00 return self.run(*args, **kwargs)
                            Jan 31 22:32:00 File "/app/code/src/documents/tasks.py", line 192, in consume_file
                            Jan 31 22:32:00 document = Consumer().try_consume_file(
                            Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 468, in try_consume_file
                            Jan 31 22:32:00 self._fail(
                            Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 93, in _fail
                            Jan 31 22:32:00 raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
                            Jan 31 22:32:00 documents.consumer.ConsumerError: Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00 Resource stopwords not found.
                            Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
                            Jan 31 22:32:00
                            Jan 31 22:32:00 >>> import nltk
                            Jan 31 22:32:00 >>> nltk.download('stopwords')
                            Jan 31 22:32:00
                            Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Attempted to load corpora/stopwords
                            Jan 31 22:32:00
                            Jan 31 22:32:00 Searched in:
                            Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
                            Jan 31 22:32:00 **********************************************************************
                            Jan 31 22:32:00
                            
girish (Staff)
#12

@scooke do you even have /usr/local/share/nltk_data/corpora/stopwords?

scooke
#13

                              @girish Although there is definitely a stopwords.zip in the corpora/stopwords directory, there is also a stopwords folder (unzipped, I suppose). That is one thing to correct then.

                              I found this link: https://aur.archlinux.org/packages/paperless-ngx. Six comments from the bottom ammo wrote that executing ln -s /usr/share/nltk_data /usr/local/share/nltk_data fixed it for him. I tried it but, of course, these are read-only directories. So maybe this is a second thing to fix?
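Since the directories in the image are read-only, a symlink isn't an option there. An alternative (a sketch, not a confirmed fix for this package) is the standard `NLTK_DATA` environment variable, which NLTK prepends to its data search path; the path below is the one the `ls` output in this thread shows:

```python
import os

# NLTK reads the NLTK_DATA environment variable (colon-separated list)
# when the nltk.data module is first imported, so on a read-only image
# this can substitute for the symlink workaround from the AUR comment.
os.environ['NLTK_DATA'] = '/usr/local/share/nltk_data'

# Note: this must be set *before* `import nltk` (or exported in the
# app's environment), otherwise the default search path is already built.
```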

                              A life lived in fear is a life half-lived

scooke
#14

                                @girish said in Latest update seems to have similar issue as before, resources not found:

                                /usr/local/share/nltk_data/corpora/stopwords

                                Yep

                                root@mypaperlessimagelongnumbernamethingy:/usr/local/share/nltk_data/corpora/stopwords# ls -al
                                total 160
                                drwxr-xr-x 2 root root  4096 Jan 26 09:05 .
                                drwxr-xr-x 3 root root  4096 Jan 26 09:05 ..
                                -rw-r--r-- 1 root root   909 Jan 26 09:05 README
                                -rw-r--r-- 1 root root  6348 Jan 26 09:05 arabic
                                -rw-r--r-- 1 root root   967 Jan 26 09:05 azerbaijani
                                -rw-r--r-- 1 root root  2202 Jan 26 09:05 basque
                                -rw-r--r-- 1 root root  5443 Jan 26 09:05 bengali
                                -rw-r--r-- 1 root root  1558 Jan 26 09:05 catalan
                                -rw-r--r-- 1 root root  5560 Jan 26 09:05 chinese
                                -rw-r--r-- 1 root root   424 Jan 26 09:05 danish
                                -rw-r--r-- 1 root root   453 Jan 26 09:05 dutch
                                -rw-r--r-- 1 root root   936 Jan 26 09:05 english
                                -rw-r--r-- 1 root root  1579 Jan 26 09:05 finnish
                                -rw-r--r-- 1 root root   813 Jan 26 09:05 french
                                -rw-r--r-- 1 root root  1362 Jan 26 09:05 german
                                -rw-r--r-- 1 root root  2167 Jan 26 09:05 greek
                                -rw-r--r-- 1 root root  1836 Jan 26 09:05 hebrew
                                -rw-r--r-- 1 root root  5958 Jan 26 09:05 hinglish
                                -rw-r--r-- 1 root root  1227 Jan 26 09:05 hungarian
                                -rw-r--r-- 1 root root  6446 Jan 26 09:05 indonesian
                                -rw-r--r-- 1 root root  1654 Jan 26 09:05 italian
                                -rw-r--r-- 1 root root  3880 Jan 26 09:05 kazakh
                                -rw-r--r-- 1 root root  3610 Jan 26 09:05 nepali
                                -rw-r--r-- 1 root root   851 Jan 26 09:05 norwegian
                                -rw-r--r-- 1 root root  1286 Jan 26 09:05 portuguese
                                -rw-r--r-- 1 root root  1910 Jan 26 09:05 romanian
                                -rw-r--r-- 1 root root  1235 Jan 26 09:05 russian
                                -rw-r--r-- 1 root root 15980 Jan 26 09:05 slovene
                                -rw-r--r-- 1 root root  2176 Jan 26 09:05 spanish
                                -rw-r--r-- 1 root root   559 Jan 26 09:05 swedish
                                -rw-r--r-- 1 root root  1818 Jan 26 09:05 tajik
                                -rw-r--r-- 1 root root   260 Jan 26 09:05 turkish
                                


scooke
#15

                                  Thank you for troubleshooting this with me. Weird that I'm the only one!


girish (Staff)
#16

@scooke I think I am not testing this correctly because, as per the code at least, it should fail for me, but it doesn't.

Should I enable some flag inside paperless for nltk handling? nltk is completely absent from my logs.

girish (Staff)
#17

                                      @scooke Can you try with 1.5.4 please?

I think I found the issue. First, the classifier needs to be created with document_create_classifier. One also has to add some tags, categories, etc. to documents. Once all that is set up, the nltk code kicks in. Without the classifier, nltk is just skipped.
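For anyone wanting to reproduce this, document_create_classifier is a Django management command, so it would be run with manage.py. The path below is inferred from the /app/code/src/... tracebacks earlier in the thread, so treat the exact invocation inside the container as an assumption:

```shell
# Run inside the app's terminal; /app/code/src is where the
# tracebacks above place the paperless source in this package.
cd /app/code/src
python3 manage.py document_create_classifier
```

With the classifier trained and at least one auto-matching tag configured, consuming a document should exercise the NLTK stopwords code path.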

scooke
#18

@girish I'm not seeing v.1.5.4... is there a way to force it?


axel0681
#19

Latest update 1.5.4 is working for me.

                                          Thanks

girish (Staff)
#20

                                            @scooke said in Latest update seems to have similar issue as before, resources not found:

@girish I'm not seeing v.1.5.4... is there a way to force it?

                                            Strange, it's published normally. Did you check for updates manually?
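Besides the "Check for updates" button in the dashboard, an update can also be pushed from the Cloudron CLI. This is a sketch: it assumes the `cloudron` CLI is installed and logged in, and 'paperless.example.com' is a placeholder for the app's actual location:

```shell
# Trigger an app update from the CLI instead of waiting for the
# dashboard's periodic update check (app location is a placeholder).
cloudron update --app paperless.example.com
```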
