"run rules" on already imported docs
-
This may be staring me in the face, but I can't see it.
Paperless-NG does an excellent job of indexing a document on import, e.g. correspondent, tag.
But that depends on setting those up in advance.
I have a large archive of docs I want to bring in to the app.
I can set up the "obvious" ones, but will probably miss a number (e.g. suppliers I have forgotten I dealt with but have docs from).
Is it possible to set up e.g. a correspondent after importing and then "run rules" on already imported docs which were not categorised on initial import?
-
@timconsidine I guess you have looked into https://paperless-ng.readthedocs.io/en/latest/advanced_usage.html# already. I also couldn't quickly find an obvious command to run.
But generally, do you put the documents into the storage folder and then just want Paperless to add them to the index? Or what did you mean with:
@timconsidine said in "run rules" on already imported docs:
But that depends on setting those up in advance.
-
@nebulon thank you
I am thinking to:
- upload a folder of several docs, too many to look at manually
- allow Paperless to do its excellent matching and tagging
- probably then find an unforeseeable number of docs which should be allocated to correspondents or tags but weren't because I didn't set this up in advance
- somehow - this is the question - get Paperless to go through the matching process on already imported docs.
Maybe I am over-worrying.
Just conscious it will be a big job importing this archive and trying to reduce my manual work tagging/classifying post-import.
-
@timconsidine I think you are looking for document retagger. I used this already and it works great!
-
@neurokrish thank you !
EDIT : jeez, isn't Paperless-NG awesome !
-
PS, anyone needing to do this, open the cloudron terminal for the app and then run the following:
cd src
python3 manage.py document_retagger -T
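For anyone who also wants correspondents and document types re-matched (the original question mentioned correspondents), here is a rough sketch rather than a tested recipe. It assumes the `-c` and `-t` flags described in the Paperless-NG administration docs alongside `-T`, and that the terminal opens in the directory containing `src`, as in the commands above. `RETAG_FLAGS` is just a name made up for this example.

```shell
# Assumed flags (per the Paperless-NG admin docs):
#   -c  correspondents, -T  tags, -t  document types
RETAG_FLAGS="-c -T -t"

if [ -f src/manage.py ]; then
    # Run inside a subshell so the terminal stays in its original directory.
    # $RETAG_FLAGS is deliberately unquoted so each flag is passed separately.
    (cd src && python3 manage.py document_retagger $RETAG_FLAGS)
else
    echo "src/manage.py not found; start from the directory that contains src" >&2
fi
```

Trim `RETAG_FLAGS` down to just `-T` if you only want tags re-applied.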
-
@ChristopherMag said in "run rules" on already imported docs:
PS, anyone needing to do this, open the cloudron terminal for the app and then run the following:
python3 src/manage.py document_retagger -T
Sounds like something that'd be useful to add to the docs @Staff
-
@ChristopherMag thanks, documented in https://docs.cloudron.io/apps/paperless-ngx/#document-retagger
-
@girish I thought that you could run the command without first cd'ing into the src directory but when running it again today I found out that was incorrect.
I have edited my post above to first include
cd src
and then
python3 manage.py document_retagger -T
Could you please update the documentation? What I originally posted doesn't work: for some reason, Python throws an exception if you don't cd first.