<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Latest update seems to have similar issue as before, resources not found]]></title><description><![CDATA[<p dir="auto">Package v.1.5.2 has a problem. Fortunately I had a working backup for v.1.5.1.</p>
<pre><code>Please use the NLTK Downloader to obtain the resource:

  &gt;&gt;&gt; import nltk
  &gt;&gt;&gt; nltk.download('stopwords')
</code></pre>
<p dir="auto">PDFs would upload but wouldn't get processed.</p>
]]></description><link>https://forum.cloudron.io/topic/8556/latest-update-seems-to-have-similar-issue-as-before-resources-not-found</link><generator>RSS for Node</generator><lastBuildDate>Mon, 08 Jun 2026 04:51:55 GMT</lastBuildDate><atom:link href="https://forum.cloudron.io/topic/8556.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 28 Jan 2023 16:37:43 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Wed, 01 Feb 2023 20:20:19 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> I'm happy to report that, once upgraded to v1.5.4, the PDF upload and processing complete without any problems. Whatever you did worked! Thank you!</p>
]]></description><link>https://forum.cloudron.io/post/61277</link><guid isPermaLink="true">https://forum.cloudron.io/post/61277</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Wed, 01 Feb 2023 20:20:19 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Wed, 01 Feb 2023 12:13:11 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> Ah, that's correct. You have to update from 1.5.1 to 1.5.2 to 1.5.3.</p>
]]></description><link>https://forum.cloudron.io/post/61238</link><guid isPermaLink="true">https://forum.cloudron.io/post/61238</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Wed, 01 Feb 2023 12:13:11 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Wed, 01 Feb 2023 12:09:00 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> Do I need to be upgraded to v1.5.2 to see it? Or do I need to have auto-updates turned on? I've turned them off and am checking manually, as I'd rather have an older version that works than a newer one that doesn't.</p>
]]></description><link>https://forum.cloudron.io/post/61237</link><guid isPermaLink="true">https://forum.cloudron.io/post/61237</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Wed, 01 Feb 2023 12:09:00 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Wed, 01 Feb 2023 06:27:43 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> said in <a href="/post/61203">Latest update seems to have similar issue as before, resources not found</a>:</p>
<blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> I'm not seeing v.1.5.4... is there a way to force it?</p>
</blockquote>
<p dir="auto">Strange, it's published normally. Did you check for updates manually?</p>
]]></description><link>https://forum.cloudron.io/post/61213</link><guid isPermaLink="true">https://forum.cloudron.io/post/61213</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Wed, 01 Feb 2023 06:27:43 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 23:47:31 GMT]]></title><description><![CDATA[<p dir="auto">The latest update, 1.5.4, is working for me.</p>
<p dir="auto">Thanks</p>
]]></description><link>https://forum.cloudron.io/post/61207</link><guid isPermaLink="true">https://forum.cloudron.io/post/61207</guid><dc:creator><![CDATA[axel0681]]></dc:creator><pubDate>Tue, 31 Jan 2023 23:47:31 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 22:51:05 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> I'm not seeing v.1.5.4... is there a way to force it?</p>
]]></description><link>https://forum.cloudron.io/post/61203</link><guid isPermaLink="true">https://forum.cloudron.io/post/61203</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 22:51:05 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 22:38:40 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> Can you try with 1.5.4 please?</p>
<p dir="auto">I think I found the issue. First, the classifier needs to be created with <code>document_create_classifier</code>. One also has to add some tags, categories, etc. to documents. Once all that is set up, the nltk stuff kicks in. Without the classifier, nltk is just skipped.</p>
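<p dir="auto">A toy Python sketch of that gating, to show why a setup without a classifier never touches the stop words. All names here (<code>match_tags_sketch</code>, <code>load_stop_words</code>, <code>missing_stop_words</code>) are made up for illustration; this only mirrors the behaviour described above and is not the paperless code:</p>

```python
def match_tags_sketch(content, classifier, load_stop_words):
    """Predict tags for content, or return [] when no classifier exists."""
    # No trained classifier: nothing to predict, so the loader is never called.
    if classifier is None:
        return []
    # Only now are the stop words loaded; this is where a missing
    # resource would surface as a LookupError.
    stop_words = load_stop_words()
    words = [w for w in content.lower().split() if w not in stop_words]
    return classifier(words)


def missing_stop_words():
    """Stand-in loader that fails the way the logs in this thread do."""
    raise LookupError("Resource stopwords not found.")
```

<p dir="auto">With <code>classifier=None</code> the call returns an empty list even when the loader is broken, which would match an install that shows no nltk lines in its logs.</p>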
]]></description><link>https://forum.cloudron.io/post/61202</link><guid isPermaLink="true">https://forum.cloudron.io/post/61202</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 22:38:40 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:59:26 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> I think I am not testing this correctly because, going by the code at least, it should fail for me, but it doesn't.</p>
<p dir="auto">Should I enable some flag inside paperless for nltk handling? There is a complete absence of nltk in my logs.</p>
]]></description><link>https://forum.cloudron.io/post/61199</link><guid isPermaLink="true">https://forum.cloudron.io/post/61199</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:59:26 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:43:23 GMT]]></title><description><![CDATA[<p dir="auto">Thank you for troubleshooting this with me. Weird that I'm the only one!</p>
]]></description><link>https://forum.cloudron.io/post/61198</link><guid isPermaLink="true">https://forum.cloudron.io/post/61198</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:43:23 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:42:42 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> said in <a href="/post/61195">Latest update seems to have similar issue as before, resources not found</a>:</p>
<blockquote>
<p dir="auto">/usr/local/share/nltk_data/corpora/stopwords</p>
</blockquote>
<p dir="auto">Yep</p>
<pre><code>root@mypaperlessimagelongnumbernamethingy:/usr/local/share/nltk_data/corpora/stopwords# ls -al
total 160
drwxr-xr-x 2 root root  4096 Jan 26 09:05 .
drwxr-xr-x 3 root root  4096 Jan 26 09:05 ..
-rw-r--r-- 1 root root   909 Jan 26 09:05 README
-rw-r--r-- 1 root root  6348 Jan 26 09:05 arabic
-rw-r--r-- 1 root root   967 Jan 26 09:05 azerbaijani
-rw-r--r-- 1 root root  2202 Jan 26 09:05 basque
-rw-r--r-- 1 root root  5443 Jan 26 09:05 bengali
-rw-r--r-- 1 root root  1558 Jan 26 09:05 catalan
-rw-r--r-- 1 root root  5560 Jan 26 09:05 chinese
-rw-r--r-- 1 root root   424 Jan 26 09:05 danish
-rw-r--r-- 1 root root   453 Jan 26 09:05 dutch
-rw-r--r-- 1 root root   936 Jan 26 09:05 english
-rw-r--r-- 1 root root  1579 Jan 26 09:05 finnish
-rw-r--r-- 1 root root   813 Jan 26 09:05 french
-rw-r--r-- 1 root root  1362 Jan 26 09:05 german
-rw-r--r-- 1 root root  2167 Jan 26 09:05 greek
-rw-r--r-- 1 root root  1836 Jan 26 09:05 hebrew
-rw-r--r-- 1 root root  5958 Jan 26 09:05 hinglish
-rw-r--r-- 1 root root  1227 Jan 26 09:05 hungarian
-rw-r--r-- 1 root root  6446 Jan 26 09:05 indonesian
-rw-r--r-- 1 root root  1654 Jan 26 09:05 italian
-rw-r--r-- 1 root root  3880 Jan 26 09:05 kazakh
-rw-r--r-- 1 root root  3610 Jan 26 09:05 nepali
-rw-r--r-- 1 root root   851 Jan 26 09:05 norwegian
-rw-r--r-- 1 root root  1286 Jan 26 09:05 portuguese
-rw-r--r-- 1 root root  1910 Jan 26 09:05 romanian
-rw-r--r-- 1 root root  1235 Jan 26 09:05 russian
-rw-r--r-- 1 root root 15980 Jan 26 09:05 slovene
-rw-r--r-- 1 root root  2176 Jan 26 09:05 spanish
-rw-r--r-- 1 root root   559 Jan 26 09:05 swedish
-rw-r--r-- 1 root root  1818 Jan 26 09:05 tajik
-rw-r--r-- 1 root root   260 Jan 26 09:05 turkish
</code></pre>
]]></description><link>https://forum.cloudron.io/post/61197</link><guid isPermaLink="true">https://forum.cloudron.io/post/61197</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:42:42 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:41:08 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> Although there is definitely a stopwords.zip in the corpora/stopwords directory, there is also a stopwords folder (unzipped, I suppose). That is one thing to correct then.</p>
<p dir="auto">I found this link: <a href="https://aur.archlinux.org/packages/paperless-ngx" target="_blank" rel="noopener noreferrer nofollow ugc">https://aur.archlinux.org/packages/paperless-ngx</a>. Six comments from the bottom, ammo wrote that executing <code>ln -s /usr/share/nltk_data /usr/local/share/nltk_data</code> fixed it for him. I tried it but, of course, these are read-only directories. So maybe this is a second thing to fix?</p>
]]></description><link>https://forum.cloudron.io/post/61196</link><guid isPermaLink="true">https://forum.cloudron.io/post/61196</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:41:08 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:40:14 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> do you even have <code>/usr/local/share/nltk_data/corpora/stopwords</code> ?</p>
]]></description><link>https://forum.cloudron.io/post/61195</link><guid isPermaLink="true">https://forum.cloudron.io/post/61195</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:40:14 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:37:17 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> said in <a href="/post/61193">Latest update seems to have similar issue as before, resources not found</a>:</p>
<blockquote>
<p dir="auto">Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/</p>
</blockquote>
<p dir="auto">So, this is wrong. It should be corpora/stopwords/, I think.</p>
<pre><code>root@07a7fcf3-f9ff-4051-9d89-dec6ed4a4777:/usr/local/share/nltk_data/corpora/stopwords# ls
README       basque   chinese  english  german  hinglish    italian  norwegian   russian  swedish
arabic       bengali  danish   finnish  greek   hungarian   kazakh   portuguese  slovene  tajik
azerbaijani  catalan  dutch    french   hebrew  indonesian  nepali   romanian    spanish  turkish
</code></pre>
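<p dir="auto">For reference, the traceback in the logs shows two lookups in <code>nltk/corpus/util.py</code>: first <code>corpora/stopwords</code>, then <code>corpora/stopwords.zip/stopwords/</code>, so the zip path may simply be a fallback attempt rather than the root problem. A rough, stdlib-only Python sketch of such a two-step search; <code>locate_resource</code> is a hypothetical helper written for this post and is not NLTK's real implementation:</p>

```python
import os
import zipfile


def locate_resource(search_dirs, subdir="corpora", name="stopwords"):
    """Return where the resource was found, or None if every directory misses."""
    for base in search_dirs:
        # First attempt: an unpacked directory such as corpora/stopwords.
        plain = os.path.join(base, subdir, name)
        if os.path.isdir(plain):
            return plain
        # Fallback: a zip archive such as corpora/stopwords.zip
        # that contains a stopwords/ folder.
        archive = os.path.join(base, subdir, name + ".zip")
        if os.path.isfile(archive) and zipfile.is_zipfile(archive):
            with zipfile.ZipFile(archive) as zf:
                if any(m.startswith(name + "/") for m in zf.namelist()):
                    return archive + "/" + name + "/"
    return None
```

<p dir="auto">If every entry in <code>search_dirs</code> is a directory that does not exist, both attempts fail no matter how complete the data is elsewhere on disk.</p>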
]]></description><link>https://forum.cloudron.io/post/61194</link><guid isPermaLink="true">https://forum.cloudron.io/post/61194</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:37:17 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:34:12 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> I've updated to v1.5.2, and it works neither with nor without that conf setting. It still says it can't find corpora/stopwords, even though it's there.</p>
<p dir="auto">Logs:</p>
<pre><code>Jan 31 22:31:41 [2023-01-31 21:31:41,532] [INFO] [celery.worker.strategy] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] received
Jan 31 22:31:41 [2023-01-31 21:31:41,652] [INFO] [paperless.consumer] Consuming Doc - May 25, 2014, 11-08 AM.pdf
Jan 31 22:31:44 [2023-01-31 21:31:44,150] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 4.01 - no change
Jan 31 22:31:53 [2023-01-31 21:31:53,935] [INFO] [ocrmypdf._sync] Postprocessing...
Jan 31 22:31:54 [2023-01-31 21:31:54,867] [INFO] [ocrmypdf._pipeline] Optimize ratio: 1.40 savings: 28.3%
Jan 31 22:31:54 [2023-01-31 21:31:54,872] [INFO] [ocrmypdf._sync] Output file is a PDF/A-2B (as expected)
Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.078 * 10 changes in 300 seconds. Saving...
Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.079 * Background saving started by pid 1303
Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.083 * DB saved on disk
Jan 31 22:31:57 1303:C 31 Jan 2023 21:31:57.084 * RDB: 0 MB of memory used by copy-on-write
Jan 31 22:31:57 1263:M 31 Jan 2023 21:31:57.180 * Background saving terminated with success
Jan 31 22:32:00 [2023-01-31 21:32:00,895] [ERROR] [paperless.consumer] The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Resource stopwords not found.
Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
Jan 31 22:32:00
Jan 31 22:32:00 &gt;&gt;&gt; import nltk
Jan 31 22:32:00 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 22:32:00
Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
Jan 31 22:32:00
Jan 31 22:32:00 Attempted to load corpora/stopwords
Jan 31 22:32:00
Jan 31 22:32:00 Searched in:
Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Traceback (most recent call last):
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
Jan 31 22:32:00 raise LookupError(resource_not_found)
Jan 31 22:32:00 LookupError:
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Resource stopwords not found.
Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
Jan 31 22:32:00
Jan 31 22:32:00 &gt;&gt;&gt; import nltk
Jan 31 22:32:00 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 22:32:00
Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
Jan 31 22:32:00
Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
Jan 31 22:32:00
Jan 31 22:32:00 Searched in:
Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00
Jan 31 22:32:00
Jan 31 22:32:00 During handling of the above exception, another exception occurred:
Jan 31 22:32:00
Jan 31 22:32:00 Traceback (most recent call last):
Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
Jan 31 22:32:00 document_consumption_finished.send(
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
Jan 31 22:32:00 return [
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in &lt;listcomp&gt;
Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
Jan 31 22:32:00 self.__load()
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
Jan 31 22:32:00 raise e
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
Jan 31 22:32:00 raise LookupError(resource_not_found)
Jan 31 22:32:00 LookupError:
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Resource stopwords not found.
Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
Jan 31 22:32:00
Jan 31 22:32:00 &gt;&gt;&gt; import nltk
Jan 31 22:32:00 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 22:32:00
Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
Jan 31 22:32:00
Jan 31 22:32:00 Attempted to load corpora/stopwords
Jan 31 22:32:00
Jan 31 22:32:00 Searched in:
Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00
Jan 31 22:32:00 [2023-01-31 21:32:00,915] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[3e572bd2-9ccc-4ee4-9201-fb350b47cfd9] raised unexpected: ConsumerError("Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf: \n**********************************************************************\n Resource \x1b[93mstopwords\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m&gt;&gt;&gt; import nltk\n &gt;&gt;&gt; nltk.download('stopwords')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mcorpora/stopwords\x1b[0m\n\n Searched in:\n - '/usr/local/share/ntlk_data'\n**********************************************************************\n")
Jan 31 22:32:00 Traceback (most recent call last):
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{zip_name}")
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
Jan 31 22:32:00 raise LookupError(resource_not_found)
Jan 31 22:32:00 LookupError:
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Resource stopwords not found.
Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
Jan 31 22:32:00
Jan 31 22:32:00 &gt;&gt;&gt; import nltk
Jan 31 22:32:00 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 22:32:00
Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
Jan 31 22:32:00
Jan 31 22:32:00 Attempted to load corpora/stopwords.zip/stopwords/
Jan 31 22:32:00
Jan 31 22:32:00 Searched in:
Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00
Jan 31 22:32:00
Jan 31 22:32:00 During handling of the above exception, another exception occurred:
Jan 31 22:32:00
Jan 31 22:32:00 Traceback (most recent call last):
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 302, in main_wrap
Jan 31 22:32:00 raise exc_info[1]
Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
Jan 31 22:32:00 document_consumption_finished.send(
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
Jan 31 22:32:00 return [
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in &lt;listcomp&gt;
Jan 31 22:32:00 (receiver, receiver(signal=self, sender=sender, **named))
Jan 31 22:32:00 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
Jan 31 22:32:00 matched_tags = matching.match_tags(document, classifier)
Jan 31 22:32:00 File "/app/code/src/documents/matching.py", line 50, in match_tags
Jan 31 22:32:00 predicted_tag_ids = classifier.predict_tags(document.content)
Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
Jan 31 22:32:00 X = self.data_vectorizer.transform([self.preprocess_content(content)])
Jan 31 22:32:00 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
Jan 31 22:32:00 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
Jan 31 22:32:00 self.__load()
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
Jan 31 22:32:00 raise e
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
Jan 31 22:32:00 root = nltk.data.find(f"{self.subdir}/{self.__name}")
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
Jan 31 22:32:00 raise LookupError(resource_not_found)
Jan 31 22:32:00 LookupError:
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Resource stopwords not found.
Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
Jan 31 22:32:00
Jan 31 22:32:00 &gt;&gt;&gt; import nltk
Jan 31 22:32:00 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 22:32:00
Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
Jan 31 22:32:00
Jan 31 22:32:00 Attempted to load corpora/stopwords
Jan 31 22:32:00
Jan 31 22:32:00 Searched in:
Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00
Jan 31 22:32:00
Jan 31 22:32:00 The above exception was the direct cause of the following exception:
Jan 31 22:32:00
Jan 31 22:32:00 Traceback (most recent call last):
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
Jan 31 22:32:00 R = retval = fun(*args, **kwargs)
Jan 31 22:32:00 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
Jan 31 22:32:00 return self.run(*args, **kwargs)
Jan 31 22:32:00 File "/app/code/src/documents/tasks.py", line 192, in consume_file
Jan 31 22:32:00 document = Consumer().try_consume_file(
Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 468, in try_consume_file
Jan 31 22:32:00 self._fail(
Jan 31 22:32:00 File "/app/code/src/documents/consumer.py", line 93, in _fail
Jan 31 22:32:00 raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
Jan 31 22:32:00 documents.consumer.ConsumerError: Doc - May 25, 2014, 11-08 AM.pdf: The following error occurred while consuming Doc - May 25, 2014, 11-08 AM.pdf:
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00 Resource stopwords not found.
Jan 31 22:32:00 Please use the NLTK Downloader to obtain the resource:
Jan 31 22:32:00
Jan 31 22:32:00 &gt;&gt;&gt; import nltk
Jan 31 22:32:00 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 22:32:00
Jan 31 22:32:00 For more information see: https://www.nltk.org/data.html
Jan 31 22:32:00
Jan 31 22:32:00 Attempted to load corpora/stopwords
Jan 31 22:32:00
Jan 31 22:32:00 Searched in:
Jan 31 22:32:00 - '/usr/local/share/ntlk_data'
Jan 31 22:32:00 **********************************************************************
Jan 31 22:32:00
</code></pre>
]]></description><link>https://forum.cloudron.io/post/61193</link><guid isPermaLink="true">https://forum.cloudron.io/post/61193</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:34:12 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 21:21:30 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> Weird. Adding that to the conf, while still on v1.5.1, causes it to fail again.</p>
<p dir="auto">The contents of that directory are: corpora, stemmers, tokenizers</p>
<p dir="auto">Despite the error, which reads thusly:</p>
<pre><code>**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  &gt;&gt;&gt; import nltk
  &gt;&gt;&gt; nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/usr/local/share/ntlk_data'
**********************************************************************
</code></pre>
<p dir="auto">there <em>is</em> /usr/local/share/ntlk_data/corpora/stopwords, which includes a stopwords.zip file.</p>
<p dir="auto">I noticed that in the conf file, the env <code>PAPERLESS_NLTK_DIR=/usr/local/share/ntlk_data</code> has the NLTK highlighted in brown, which must mean that the syntax is off, right?</p>
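<p dir="auto">One detail worth checking here: the value in that line is spelled <code>ntlk_data</code>, while the directory girish asked about is <code>/usr/local/share/nltk_data</code>. A minimal Python sketch of a sanity check, assuming the conf file uses plain <code>KEY=value</code> lines; the helper name <code>check_nltk_dir</code> is made up for illustration and is not part of paperless:</p>

```python
import os


def check_nltk_dir(conf_text):
    """Return a warning if PAPERLESS_NLTK_DIR is set to a non-directory, else None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks and commented-out settings such as "#PAPERLESS_GS_BINARY=...".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "PAPERLESS_NLTK_DIR":
            path = value.strip()
            if not os.path.isdir(path):
                return "PAPERLESS_NLTK_DIR is set to " + path + ", which is not a directory"
    return None
```

<p dir="auto">Run against the contents of the conf file, it returns <code>None</code> when the setting is absent, commented out, or points at an existing directory.</p>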
<p dir="auto">Finally, I also noticed at the bottom of the env file that the following are all commented out:</p>
<pre><code># Binaries

#PAPERLESS_CONVERT_BINARY=/usr/bin/convert
#PAPERLESS_GS_BINARY=/usr/bin/gs
#PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
</code></pre>
<p dir="auto">Commenting out the line you suggested I add (running v1.5.1) sees the app functioning properly.</p>
<p dir="auto">I guess I will now try to update again, then add that line to the conf file and see what happens.</p>
]]></description><link>https://forum.cloudron.io/post/61192</link><guid isPermaLink="true">https://forum.cloudron.io/post/61192</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 21:21:30 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 20:44:58 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/axel0681" aria-label="Profile: axel0681">@<bdi>axel0681</bdi></a> Actually, can you set <code>PAPERLESS_NLTK_DIR=/usr/local/share/ntlk_data</code> in paperless.conf and restart the app ?</p>
]]></description><link>https://forum.cloudron.io/post/61188</link><guid isPermaLink="true">https://forum.cloudron.io/post/61188</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 20:44:58 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 20:42:32 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/axel0681" aria-label="Profile: axel0681">@<bdi>axel0681</bdi></a> can you check the contents of <code>/usr/local/share/nltk_data</code> ?</p>
]]></description><link>https://forum.cloudron.io/post/61187</link><guid isPermaLink="true">https://forum.cloudron.io/post/61187</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 20:42:32 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 18:55:46 GMT]]></title><description><![CDATA[<p dir="auto">Hello,</p>
<p dir="auto">Same problem on our sites.<br />
Package version: com.paperlessng.cloudronapp@1.5.2</p>
<p dir="auto">The last step, "saving the document", failed.</p>
<p dir="auto">Thx<br />
Axel</p>
<pre><code>abc.pdf: The following error occurred while consuming 
abc.pdf: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  &gt;&gt;&gt; import nltk
  &gt;&gt;&gt; nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/usr/share/nltk_data'
**********************************************************************
</code></pre>
]]></description><link>https://forum.cloudron.io/post/61186</link><guid isPermaLink="true">https://forum.cloudron.io/post/61186</guid><dc:creator><![CDATA[axel0681]]></dc:creator><pubDate>Tue, 31 Jan 2023 18:55:46 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 18:40:31 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> Ah, the app updated to v1.5.2 overnight. I have restored once again to v1.5.1 and it works fine now. I've turned off Auto-update for now.</p>
]]></description><link>https://forum.cloudron.io/post/61179</link><guid isPermaLink="true">https://forum.cloudron.io/post/61179</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 18:40:31 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 18:30:43 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/girish" aria-label="Profile: girish">@<bdi>girish</bdi></a> It had worked when I restored to v1.5.1, but just today the problem returned.</p>
<p dir="auto">In the actual web app's Files dashboard, under the "Failed" tab, is this error:</p>
<p dir="auto">Drop Everything and Read—but How_.pdf: The following error occurred while consuming Drop Everything and Read—but How_.pdf:</p>
<pre><code>**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  &gt;&gt;&gt; import nltk
  &gt;&gt;&gt; nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/usr/share/nltk_data'
**********************************************************************
</code></pre>
<p dir="auto">Here are logs from the Cloudron app dashboard:</p>
<pre><code>Jan 31 19:27:48 Searched in:
Jan 31 19:27:48 - '/usr/share/nltk_data'
Jan 31 19:27:48 **********************************************************************
Jan 31 19:27:48
Jan 31 19:27:48 [2023-01-31 18:27:48,579] [ERROR] [celery.app.trace] Task documents.tasks.consume_file[fa64d1a1-7726-4390-ad33-1094d0600517] raised unexpected: ConsumerError("Englishlanguagelearners.pdf: The following error occurred while consuming Englishlanguagelearners.pdf: \n**********************************************************************\n Resource \x1b[93mstopwords\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m&gt;&gt;&gt; import nltk\n &gt;&gt;&gt; nltk.download('stopwords')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mcorpora/stopwords\x1b[0m\n\n Searched in:\n - '/usr/share/nltk_data'\n**********************************************************************\n")
Jan 31 19:27:48 Traceback (most recent call last):
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 84, in __load
Jan 31 19:27:48 root = nltk.data.find(f"{self.subdir}/{zip_name}")
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
Jan 31 19:27:48 raise LookupError(resource_not_found)
Jan 31 19:27:48 LookupError:
Jan 31 19:27:48 **********************************************************************
Jan 31 19:27:48 Resource stopwords not found.
Jan 31 19:27:48 Please use the NLTK Downloader to obtain the resource:
Jan 31 19:27:48
Jan 31 19:27:48 &gt;&gt;&gt; import nltk
Jan 31 19:27:48 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 19:27:48
Jan 31 19:27:48 For more information see: https://www.nltk.org/data.html
Jan 31 19:27:48
Jan 31 19:27:48 Attempted to load corpora/stopwords.zip/stopwords/
Jan 31 19:27:48
Jan 31 19:27:48 Searched in:
Jan 31 19:27:48 - '/usr/share/nltk_data'
Jan 31 19:27:48 **********************************************************************
Jan 31 19:27:48
Jan 31 19:27:48
Jan 31 19:27:48 During handling of the above exception, another exception occurred:
Jan 31 19:27:48
Jan 31 19:27:48 Traceback (most recent call last):
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 302, in main_wrap
Jan 31 19:27:48 raise exc_info[1]
Jan 31 19:27:48 File "/app/code/src/documents/consumer.py", line 410, in try_consume_file
Jan 31 19:27:48 document_consumption_finished.send(
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 176, in send
Jan 31 19:27:48 return [
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/django/dispatch/dispatcher.py", line 177, in &lt;listcomp&gt;
Jan 31 19:27:48 (receiver, receiver(signal=self, sender=sender, **named))
Jan 31 19:27:48 File "/app/code/src/documents/signals/handlers.py", line 194, in set_tags
Jan 31 19:27:48 matched_tags = matching.match_tags(document, classifier)
Jan 31 19:27:48 File "/app/code/src/documents/matching.py", line 50, in match_tags
Jan 31 19:27:48 predicted_tag_ids = classifier.predict_tags(document.content)
Jan 31 19:27:48 File "/app/code/src/documents/classifier.py", line 370, in predict_tags
Jan 31 19:27:48 X = self.data_vectorizer.transform([self.preprocess_content(content)])
Jan 31 19:27:48 File "/app/code/src/documents/classifier.py", line 331, in preprocess_content
Jan 31 19:27:48 self._stop_words = set(stopwords.words(settings.NLTK_LANGUAGE))
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 121, in __getattr__
Jan 31 19:27:48 self.__load()
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 86, in __load
Jan 31 19:27:48 raise e
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/corpus/util.py", line 81, in __load
Jan 31 19:27:48 root = nltk.data.find(f"{self.subdir}/{self.__name}")
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/nltk/data.py", line 583, in find
Jan 31 19:27:48 raise LookupError(resource_not_found)
Jan 31 19:27:48 LookupError:
Jan 31 19:27:48 **********************************************************************
Jan 31 19:27:48 Resource stopwords not found.
Jan 31 19:27:48 Please use the NLTK Downloader to obtain the resource:
Jan 31 19:27:48
Jan 31 19:27:48 &gt;&gt;&gt; import nltk
Jan 31 19:27:48 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 19:27:48
Jan 31 19:27:48 For more information see: https://www.nltk.org/data.html
Jan 31 19:27:48
Jan 31 19:27:48 Attempted to load corpora/stopwords
Jan 31 19:27:48
Jan 31 19:27:48 Searched in:
Jan 31 19:27:48 - '/usr/share/nltk_data'
Jan 31 19:27:48 **********************************************************************
Jan 31 19:27:48
Jan 31 19:27:48
Jan 31 19:27:48 The above exception was the direct cause of the following exception:
Jan 31 19:27:48
Jan 31 19:27:48 Traceback (most recent call last):
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 451, in trace_task
Jan 31 19:27:48 R = retval = fun(*args, **kwargs)
Jan 31 19:27:48 File "/usr/local/lib/python3.10/dist-packages/celery/app/trace.py", line 734, in __protected_call__
Jan 31 19:27:48 return self.run(*args, **kwargs)
Jan 31 19:27:48 File "/app/code/src/documents/tasks.py", line 192, in consume_file
Jan 31 19:27:48 document = Consumer().try_consume_file(
Jan 31 19:27:48 File "/app/code/src/documents/consumer.py", line 468, in try_consume_file
Jan 31 19:27:48 self._fail(
Jan 31 19:27:48 File "/app/code/src/documents/consumer.py", line 93, in _fail
Jan 31 19:27:48 raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
Jan 31 19:27:48 documents.consumer.ConsumerError: Englishlanguagelearners.pdf: The following error occurred while consuming Englishlanguagelearners.pdf:
Jan 31 19:27:48 **********************************************************************
Jan 31 19:27:48 Resource stopwords not found.
Jan 31 19:27:48 Please use the NLTK Downloader to obtain the resource:
Jan 31 19:27:48
Jan 31 19:27:48 &gt;&gt;&gt; import nltk
Jan 31 19:27:48 &gt;&gt;&gt; nltk.download('stopwords')
Jan 31 19:27:48
Jan 31 19:27:48 For more information see: https://www.nltk.org/data.html
Jan 31 19:27:48
Jan 31 19:27:48 Attempted to load corpora/stopwords
Jan 31 19:27:48
Jan 31 19:27:48 Searched in:
Jan 31 19:27:48 - '/usr/share/nltk_data'
Jan 31 19:27:48 **********************************************************************
</code></pre>
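<p dir="auto">For anyone who wants to check the same thing from the app's terminal, here is a rough stdlib-only sketch. It only tests whether <code>corpora/stopwords</code> (unpacked, or as a <code>.zip</code>) exists under the directory listed after "Searched in:". The helper name <code>find_nltk_resource</code> is made up for illustration, and this is an approximation of the lookup, not NLTK's actual code:</p>
<pre><code>import os


def find_nltk_resource(resource, search_paths):
    """Return the first existing path for a resource, or None.

    Hypothetical helper: checks the unpacked directory and the zipped
    form, the two variants the traceback reports trying.
    """
    for base in search_paths:
        candidates = (
            os.path.join(base, resource),           # e.g. corpora/stopwords
            os.path.join(base, resource + ".zip"),  # e.g. corpora/stopwords.zip
        )
        for candidate in candidates:
            if os.path.exists(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # '/usr/share/nltk_data' is the only directory the error lists.
    hit = find_nltk_resource("corpora/stopwords", ["/usr/share/nltk_data"])
    print(hit or "corpora/stopwords is missing from /usr/share/nltk_data")
</code></pre>
<p dir="auto">If it prints the "missing" message, the corpus really is absent from the image, which would match the error above.</p>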
]]></description><link>https://forum.cloudron.io/post/61176</link><guid isPermaLink="true">https://forum.cloudron.io/post/61176</guid><dc:creator><![CDATA[scooke]]></dc:creator><pubDate>Tue, 31 Jan 2023 18:30:43 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Tue, 31 Jan 2023 17:50:29 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> actually, I can't reproduce this. Where do you see that message "Please use the NLTK Downloader to obtain the resource:" ?</p>
<p dir="auto">NLTK data has been included in the image since package 1.5.0.</p>
]]></description><link>https://forum.cloudron.io/post/61167</link><guid isPermaLink="true">https://forum.cloudron.io/post/61167</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Tue, 31 Jan 2023 17:50:29 GMT</pubDate></item><item><title><![CDATA[Reply to Latest update seems to have similar issue as before, resources not found on Mon, 30 Jan 2023 12:27:13 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/scooke" aria-label="Profile: scooke">@<bdi>scooke</bdi></a> thanks for reporting, think I can reproduce this.</p>
]]></description><link>https://forum.cloudron.io/post/61065</link><guid isPermaLink="true">https://forum.cloudron.io/post/61065</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Mon, 30 Jan 2023 12:27:13 GMT</pubDate></item></channel></rss>