Error importing documents
-
Hi,
I am trying to import ~2.5k documents from my local paperless-ngx. Unfortunately, I am getting an error when using document_importer as described in the docs.
The error is:
CommandError: The manifest file refers to "<Some-Filename>" which does not appear to be in the source directory.
The filename includes German umlauts. Judging by this issue, there may be some connection to the locale.
Has anybody successfully imported a larger number of documents? Thanks in advance.
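One possibility worth ruling out alongside the locale (an assumption, not something confirmed in this thread) is Unicode normalization: "ü" can be stored as a single code point (NFC) or as "u" followed by a combining diaeresis (NFD). The two spellings render identically but compare unequal, so a name written to the manifest in one form will not match a file stored in the other. The sketch below uses only Python's standard `unicodedata` module to report manifest names missing from a directory listing and whether an NFC-normalized match exists. The helper `diagnose_missing` and its inputs (plain lists of file names) are hypothetical.

```python
import unicodedata


def diagnose_missing(manifest_names, dir_names):
    """Classify manifest file names that are not found verbatim in a directory.

    Returns a dict mapping each missing manifest name to the directory entry
    it matches after NFC normalization, or to None if no such match exists.
    """
    # Index directory entries by their NFC-normalized form.
    by_nfc = {unicodedata.normalize("NFC", name): name for name in dir_names}
    present = set(dir_names)
    result = {}
    for name in manifest_names:
        if name in present:
            continue  # exact match, nothing to report
        result[name] = by_nfc.get(unicodedata.normalize("NFC", name))
    return result


if __name__ == "__main__":
    nfc = unicodedata.normalize("NFC", "Kostenübernahme.pdf")
    nfd = unicodedata.normalize("NFD", "Kostenübernahme.pdf")
    print(nfc == nfd)  # False: same glyphs, different code points
    print(diagnose_missing([nfc], [nfd]))
```

Feeding it the names from the manifest and the output of `os.listdir()` on the source directory would show whether the "missing" files are truly absent or only normalized differently.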
-
@stantropics Dropping a file named Kostenübernahme.pdf in /app/data/consume works for me. Let me try the CLI now.
-
OK, what I did was:
# mkdir -p /app/data/out
# python3 manage.py document_exporter /app/data/out
That exported the documents. Then, I deleted everything in paperless. Then, I reimported:
# python3 manage.py document_importer /app/data/out
Installed 4 object(s) from 1 fixture(s)
Copy files into paperless...
100%|██████████| 2/2 [00:00<00:00, 49.80it/s]
Updating search index...
100%|██████████| 2/2 [00:00<00:00, 125.67it/s]
So, maybe there is something else going on. Are you using Web Terminal or Cloudron CLI?
-
@girish Hey, thanks for getting back to this.
I found a workaround: exporting with --use-filename-format names the exported documents by their document ID.
-
I am giving paperless-ngx on Cloudron another try, but I am again facing problems importing my documents.
Exporting the documents from another instance works as expected, and the import into paperless-ngx also completes without errors:
root@d9967e75-b4cc-4808-ba75-a5f12498470c:/app/code# python3 src/manage.py document_importer /app/data/import/
Installed 3258 object(s) from 1 fixture(s)
Copy files into paperless...
100%|██████████| 3097/3097 [00:17<00:00, 177.43it/s]
Updating search index...
100%|██████████| 3097/3097 [00:13<00:00, 229.10it/s]
However, after the import I do not see any data in paperless. Any idea what is going on here? Any help is appreciated.
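A quick sanity check is to tally the manifest entries per model and compare, for example, the number of `documents.document` entries with what the web UI shows after the import. This sketch assumes a single `manifest.json` that is a JSON list of Django fixture-style objects, each with a `model` key; that fits the "Installed 3258 object(s) from 1 fixture(s)" message but is an assumption about the file layout. `count_models` and the script name in the usage comment are hypothetical.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def count_models(manifest_path):
    """Return a Counter mapping each model label to its number of manifest entries."""
    with Path(manifest_path).open(encoding="utf-8") as fh:
        objects = json.load(fh)
    # Assumption: each entry looks like {"model": "...", "pk": ..., "fields": {...}}.
    return Counter(obj["model"] for obj in objects)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 count_models.py /app/data/import/manifest.json
    for model, n in count_models(sys.argv[1]).most_common():
        print(f"{n:6d}  {model}")
```

If the manifest lists thousands of documents but the UI shows none, the records were probably written somewhere other than the database the app is reading, which narrows the search.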
-
@stantropics For a start, does importing a single document work? Maybe it has something to do with the filenames you reported earlier?
-
@girish Thanks for getting back to this. Last time, the attempt failed even though I was able to import ~2k documents and their data (tags etc.) successfully: I could not develop a good workflow for getting data into paperless. This time I have a workflow idea, but I cannot get my documents (~3k) into paperless on Cloudron.
The first thing I notice is that no operation using the
python3 src/manage.py
script works unless I first run:
python3 src/manage.py migrate
Unfortunately, none of my operations threw an error, but they did not work either.
I took the following steps to reproduce the problem:
- Install a new paperless app
- Import one PDF document
- Export the data to /app/data/export (export data was generated, but it looks like no documents were backed up, only user information; e.g., the PDF file was not copied to the backup folder)
- Delete the paperless app and create a new one
- Import the backup
Unfortunately (and as expected), I do not see the document or its data.
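Since the export apparently contained no PDF, one way to confirm that is to cross-check the manifest against the export folder. The sketch below assumes a single `manifest.json` in the export directory, document entries with the model label `documents.document`, and the exported file name stored under the key `__exported_file_name__`; these are assumptions about the paperless-ngx export format that may differ between versions. `missing_exported_files` is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumption: document entries in manifest.json carry the exported file's
# name under this key; adjust if your paperless-ngx version differs.
FILE_KEY = "__exported_file_name__"


def missing_exported_files(export_dir):
    """Return exported file names listed in the manifest but absent on disk.

    A None in the result means a document entry recorded no file name at all.
    """
    export_dir = Path(export_dir)
    manifest = json.loads((export_dir / "manifest.json").read_text(encoding="utf-8"))
    missing = []
    for obj in manifest:
        if obj.get("model") != "documents.document":
            continue  # only document records point at files
        name = obj.get(FILE_KEY)
        if name is None or not (export_dir / name).is_file():
            missing.append(name)
    return missing
```

An empty result means every document in the manifest has its file next to it; a non-empty one points at the export step rather than the import.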