In the package's `start.sh`, the variable `NLTK_DATA` is set to `/app/data/nltk`. However, the following error, which occurs when uploading a Markdown document, shows that the variable is either not set in the running process or not honored:
Sep 06 15:37:14 PermissionError: [Errno 13] Permission denied: '/root/nltk_data'
Sep 06 15:37:14 Traceback (most recent call last):
Sep 06 15:37:14 _download_nltk_packages_if_not_present()
Sep 06 15:37:14 data = loader.load()
Sep 06 15:37:14 download_nltk_packages()
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = func(*args, **kwargs)
Sep 06 15:37:14 elements = list(
Sep 06 15:37:14 elements = list(elements)
Sep 06 15:37:14 elements = self._get_elements()
Sep 06 15:37:14 for e in self._main.iter_elements():
Sep 06 15:37:14 if exceeds_cap_ratio(text, threshold=cap_threshold):
Sep 06 15:37:14 if is_possible_narrative_text(text):
Sep 06 15:37:14 if sentence_count(text, 3) > 1:
Sep 06 15:37:14 os.mkdir(targetpath, 0o700)
Sep 06 15:37:14 return list(self.lazy_load())
Sep 06 15:37:14 return partition_html(
Sep 06 15:37:14 return partition_md(filename=self.file_path, **self.unstructured_kwargs)
Sep 06 15:37:14 self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
Sep 06 15:37:14 self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
Sep 06 15:37:14 self._handle_fatal_error(e)
Sep 06 15:37:14 self.makedir(tarinfo, targetpath)
Sep 06 15:37:14 sentences = sent_tokenize(text)
Sep 06 15:37:14 tar.extractall(path=nltk_data_dir)
Sep 06 15:37:14 yield from block_item.iter_elements()
Sep 06 15:37:14 yield from block_item.iter_elements()
Sep 06 15:37:14 yield from cls(opts)._iter_elements()
Sep 06 15:37:14 yield from element_accum.flush(ElementCls)
Sep 06 15:37:14 yield from self._element_from_text_or_tail(self.text or "", q, self._ElementCls)
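
To narrow this down, a small diagnostic like the one below could be run with the same user and environment as the app. It is only a sketch: the helper name `describe_nltk_data_dirs` is made up for illustration, and it only checks whether `NLTK_DATA` is visible to the Python process and whether the candidate directories are writable. It does not reproduce how the library in the traceback actually picks its download directory.

```python
import os


def describe_nltk_data_dirs(env=None):
    """Return the NLTK_DATA value and a writability report for candidate dirs.

    Hypothetical helper for debugging; not part of NLTK or any other library.
    """
    env = os.environ if env is None else env
    configured = env.get("NLTK_DATA")

    candidates = []
    if configured:
        # NLTK_DATA may list several directories separated by os.pathsep.
        candidates.extend(p for p in configured.split(os.pathsep) if p)
    # Home-directory fallback; for root this is the /root/nltk_data path
    # that appears in the PermissionError.
    candidates.append(os.path.join(os.path.expanduser("~"), "nltk_data"))

    report = []
    for path in candidates:
        report.append(
            {
                "path": path,
                "exists": os.path.isdir(path),
                "writable": os.access(path, os.W_OK),
            }
        )
    return configured, report


if __name__ == "__main__":
    configured, report = describe_nltk_data_dirs()
    print("NLTK_DATA =", configured)
    for entry in report:
        print(entry)
```

If `NLTK_DATA` prints as `None` here, the variable is not reaching the process at all. If it prints `/app/data/nltk` and that directory is writable, the download code is probably ignoring the variable and falling back to the home directory.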