Solved Multi Language OCR Support
Is it possible to enable multi-language OCR support for paperless-ng?
As far as I understand, docker has variables for this. And I also found this modification.
@wisemetalhead according to this, we can enable additional languages under the OCR settings options. These can be found in the paperless.conf file when you click the file manager icon of the paperless-ng app's settings in Cloudron.
Did the provided suggestion from @neurokrish solve the issue?
@nebulon @WiseMetalhead there is some more information here. Looks like there are two options, PAPERLESS_OCR_LANGUAGES and PAPERLESS_OCR_LANGUAGE. The first one determines which packages to install and the second one, which to use. The first option is missing in the paperless.conf file. Maybe we can add this manually and restart the app?
@neurokrish @nebulon I tried this method earlier, but it didn't work.
@wisemetalhead according to the docs, the PAPERLESS_OCR_LANGUAGES option should be configured in docker-compose.env and not in paperless.conf. Perhaps @nebulon can help here.
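For reference, on the upstream Docker setup the two variables would sit together in docker-compose.env roughly as below. This is only a sketch: the tur values are placeholders chosen for illustration, and the file belongs to the upstream Docker install, which may not apply to the Cloudron package.

```shell
# docker-compose.env (upstream Docker setup; values are illustrative assumptions)

# Additional Tesseract language packs to install in the container.
PAPERLESS_OCR_LANGUAGES=tur

# Language(s) Tesseract should use for OCR; join several codes with "+".
PAPERLESS_OCR_LANGUAGE=tur+eng
```

The first line only makes a language available; the second is what actually selects it for OCR.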
@neurokrish I have to try this myself. Since the Cloudron app package has nothing to do with their upstream Docker image, the default self-hosting configuration docs apply instead: https://paperless-ng.readthedocs.io/en/latest/configuration.html?highlight=languages#ocr-settings
@nebulon then maybe a standard apt-get install for additional language packs is the way to go?
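As a rough sketch of what that could look like, the helper below turns a language setting such as deu+eng into Debian/Ubuntu package names. The function name ocr_packages is made up for this example; it assumes the usual tesseract-ocr-<code> naming, where an underscore in the language code becomes a hyphen in the package name.

```shell
# Hypothetical helper: map an OCR spec such as "deu+chi_sim" to the
# matching Debian/Ubuntu package names, one per line.
ocr_packages() {
    # Split the spec on "+" and handle one language code at a time.
    printf '%s\n' "$1" | tr '+' '\n' | while IFS= read -r lang; do
        [ -n "$lang" ] || continue
        # Package names use "-" where the language code uses "_".
        printf 'tesseract-ocr-%s\n' "$(printf '%s' "$lang" | tr '_' '-')"
    done
}

# Print the install command instead of running it.
echo apt-get install -y $(ocr_packages "deu+eng")
```

Printing the command first makes it easy to check the package list before installing anything.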
Maybe we can install all OCR languages by default using sudo apt-get install tesseract-ocr-all as mentioned here. This way, the app has all languages installed by default and users can choose a specific language by modifying the
@neurokrish thanks for the suggestion, at least it solved the issue for me with a deu+eng setting. The just-updated package v0.7.0 has those changes.
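If anyone wants to double-check a setting like deu+eng against what is actually installed, something along these lines might help. The function missing_langs is a hypothetical helper, not part of paperless-ng or the Cloudron package; it takes the language setting and a list of installed language codes and prints the ones that are absent.

```shell
# Hypothetical helper: print every language in an OCR spec (e.g. "deu+eng")
# that does not appear in a whitespace-separated list of installed codes.
missing_langs() {
    # Put the installed list on one line, padded with spaces for matching.
    installed=" $(printf '%s' "$2" | tr '\n' ' ') "
    # Split the spec on "+" and test each code in turn.
    printf '%s\n' "$1" | tr '+' '\n' | while IFS= read -r lang; do
        [ -n "$lang" ] || continue
        case "$installed" in
            *" $lang "*) ;;                # language pack is present
            *) printf '%s\n' "$lang" ;;    # language pack is missing
        esac
    done
}
```

On a machine with Tesseract installed, the second argument could come from the tesseract CLI, for example missing_langs "deu+eng" "$(tesseract --list-langs)". An empty result would mean every requested language is available.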
@nebulon great! @WiseMetalhead, can you confirm that the latest update solves your issue with OCR?
@neurokrish @nebulon Now it works perfectly! Thanks for the help.
@neurokrish Tesseract has a Dockerfile, and it would be nice to support Tesseract on Cloudron.