Cloudron Forum

Error - maybe with the latest update

  • luckow (translator) wrote, post #1:

    As of today, Paperless no longer eats papers. It complains:

    Resource punkt_tab not found.
    Please use the NLTK Downloader to obtain the resource:

    >>> import nltk
    >>> nltk.download('punkt_tab')

    For more information see: https://www.nltk.org/data.html

    Attempted to load tokenizers/punkt_tab/german/

    Searched in:
    - PosixPath('/usr/share/nltk_data')

    Maybe a new feature needs some libs?
    (https://github.com/paperless-ngx/paperless-ngx/discussions/7538)
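
For anyone triaging the same message, here is a minimal sketch of pulling the useful parts out of an NLTK lookup error: the missing resource, the path it tried to load, and the directories it searched. The function name `parse_nltk_lookup_error` and the returned keys are made up for illustration and are not part of NLTK or Paperless-ngx. It assumes the message is plain text that may still carry terminal colour codes, with or without the leading ESC byte.

```python
import re

# Terminal colour codes such as "\x1b[93m"; the ESC byte is optional because
# it is often lost when a log is pasted into a web form (leaving "[93m").
ANSI_RE = re.compile(r"\x1b?\[\d+(?:;\d+)*m")


def parse_nltk_lookup_error(message: str) -> dict:
    """Hypothetical helper: summarise an NLTK 'Resource ... not found' message."""
    clean = ANSI_RE.sub("", message)
    resource = re.search(r"Resource\s+(\S+)\s+not found", clean)
    attempted = re.search(r"Attempted to load\s+(\S+)", clean)
    searched = re.findall(r"PosixPath\('([^']+)'\)", clean)
    return {
        "resource": resource.group(1) if resource else None,
        "attempted": attempted.group(1) if attempted else None,
        "searched": searched,
    }
```

Fed the error text from this post, it should report `punkt_tab` as the resource and `/usr/share/nltk_data` as the only searched directory, which is a quick way to see which data directory the app image actually looks in.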

    Package Version
    com.paperlessng.cloudronapp@1.25.4
    Last Updated
    08/23/2024

    Pronouns: he/him | Primary language: German

      • nebulon (Staff) wrote, post #2:

        We have updated the nltk punkt package to use what upstream now uses at https://git.cloudron.io/cloudron/paperless-ngx-app/-/commit/fc677b19cc3f78ff3a7b9b2db13297851a7545ec

        Original issue was https://github.com/paperless-ngx/paperless-ngx/issues/7519

        Maybe it only fails with specific PDFs. Can you share one that fails, or is it all of the PDFs?

        • luckow (translator) wrote, post #3:

          15 different PDFs. That's all of today's PDFs.

            luckow (translator) wrote, post #4:

            @nebulon I tried to attach one here, but no luck: PDF is not on the whitelist of allowed types.

            [German_Test_PDF.pdf](Invalid file type. Allowed types are: .png, .jpg, .bmp, .gif, .webm, .mp4, .jpeg)

            • luckow (translator) wrote, post #5:

              Go to https://www.blindtextgenerator.de/, create some "Webstandards" text (200 characters, default), copy and paste it into a LibreOffice document, generate a PDF and upload it.
              The log file says:

              Aug 26 19:23:47 File "/usr/local/lib/python3.10/dist-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
              Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
              Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
              Aug 26 19:23:47 For more information see: https://www.nltk.org/data.html
              Aug 26 19:23:47 LookupError:
              Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
              Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
              Aug 26 19:23:47 Please use the NLTK Downloader to obtain the resource:
              Aug 26 19:23:47 R = retval = fun(*args, **kwargs)
              Aug 26 19:23:47 Resource punkt_tab not found.
              Aug 26 19:23:47 Resource punkt_tab not found.
              Aug 26 19:23:47 Resource punkt_tab not found.
              Aug 26 19:23:47 Searched in:
              Aug 26 19:23:47 Searched in:
              Aug 26 19:23:47 Searched in:
              Aug 26 19:23:47 The above exception was the direct cause of the following exception:
              Aug 26 19:23:47 Traceback (most recent call last):
              Aug 26 19:23:47 Traceback (most recent call last):
              Aug 26 19:23:47 Traceback (most recent call last):
              Aug 26 19:23:47 X = self.data_vectorizer.transform([self.preprocess_content(content)])
              Aug 26 19:23:47 [2024-08-26 17:23:47,524] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[e6d299f6-9391-4aa2-9827-de6100556bcf] raised unexpected: ConsumerError("German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing: \n**********************************************************************\n Resource \x1b[93mpunkt_tab\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('punkt_tab')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mtokenizers/punkt_tab/german/\x1b[0m\n\n Searched in:\n - PosixPath('/usr/share/nltk_data')\n**********************************************************************\n")
              Aug 26 19:23:47 document_consumption_finished.send(
              Aug 26 19:23:47 documents.consumer.ConsumerError: German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing:
              Aug 26 19:23:47 documents.consumer.ConsumerError: German_Test_PDF.pdf: The following error occurred while storing document German_Test_PDF.pdf after parsing:
              Aug 26 19:23:47 lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
              Aug 26 19:23:47 msg = plugin.run()
              Aug 26 19:23:47 msg = plugin.run()
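
The log shows NLTK looking for `tokenizers/punkt_tab/{lang}/` under `/usr/share/nltk_data`. A rough way to confirm what is missing is to check for those directories directly. This is only a sketch: `missing_punkt_tab_languages` is a hypothetical helper, and it assumes the data is unpacked as plain directories. NLTK itself can also read resources from zip archives, so a language reported here as missing is a hint to look closer, not proof.

```python
from pathlib import Path


def missing_punkt_tab_languages(data_dirs, languages):
    """Hypothetical helper: list languages without a punkt_tab directory.

    Mirrors the lookup path seen in the log, tokenizers/punkt_tab/<lang>/,
    but only checks unpacked directories (zip archives are not inspected).
    """
    missing = []
    for lang in languages:
        found = any(
            (Path(data_dir) / "tokenizers" / "punkt_tab" / lang).is_dir()
            for data_dir in data_dirs
        )
        if not found:
            missing.append(lang)
    return missing
```

Inside the app container one could call `missing_punkt_tab_languages(["/usr/share/nltk_data"], ["german", "english"])`. Going by the error above, `german` would be expected in the result on an affected install.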

              • luckow (translator) wrote, post #6:

                OK, it looks like a difference between a fresh install and an updated one. Waiting for the next update. 🙂

                • luckow (translator) wrote, post #7:

                  Long story short: Update to 1.25.5 requires minBox version 8.0.0. My Cloudron instance was stuck on an older version. After updating to 8.x, the update for Paperless worked perfectly.
                  "Ticket" closed. Thank you.

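The root cause here was a minimum platform version gate: the fixed package was not offered to an older Cloudron. The check behind such a gate can be sketched roughly as below. The function names are illustrative, not Cloudron's actual code, and the sketch assumes plain dotted integer versions such as `8.0.0` with no pre-release suffixes.

```python
def parse_version(version: str) -> tuple:
    """Turn '8.0.0' into (8, 0, 0). Assumes dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def can_install_update(platform_version: str, min_required: str) -> bool:
    """Hypothetical gate: True if the platform meets the minimum version."""
    # Compare as integer tuples; comparing the raw strings would wrongly
    # sort '10.0.0' before '8.0.0'.
    return parse_version(platform_version) >= parse_version(min_required)
```

With a minimum of `8.0.0`, any 7.x platform version fails this check, which matches the behaviour described in this post: the app update only went through after the platform itself was on 8.x.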