Tesseract-OCR (Optical Character Recognition) on Cloudron
-
Tesseract is a text recognition (OCR) engine: it recognizes text embedded in images and extracts it as plain text.
OCR can be useful, for example, when editing memes, or in computer gaming when you want to take data from the game and process it in another application.
There is a Docker image.
https://github.com/tesseract-ocr/tesseract
Tesseract might be of use with paperless-ng, which Cloudron already supports. There is a thread mentioning this here:
https://forum.cloudron.io/topic/6346/multi-language-ocr-support/12?_=1655907503717
Ubuntu PPA:
https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel
Docker (Tesseract 5.0 is out now; I think these are only 4.0):
https://tesseract-ocr.github.io/tessdoc/Docker-Containers.html
Documentation:
https://tesseract-ocr.github.io/tessdoc/Home.html
-
A quick reading suggests that this is a CLI tool (and not an app). This is also installed in paperless already btw. @LoudLemur Are you having trouble with tesseract and paperless?
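Since it is a CLI tool, calling it from a script could look roughly like the sketch below. It assumes the `tesseract` binary is on the PATH and that the `eng` language data is installed. The helper names `build_tesseract_command` and `ocr_image` are made up for illustration and are not part of Tesseract or paperless.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a one-off Tesseract CLI call.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on an image and return the recognized text."""
    # Assumption: the tesseract binary is installed and on the PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout
```

For example, `ocr_image("screenshot.png")` would return whatever text Tesseract finds in that (placeholder) file, and `lang="deu"` would switch to German if that language pack is installed.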