Tesseract-OCR (Optical Character Recognition) on Cloudron
-
Tesseract is a text recognition engine: it helps your computer recognize text embedded in images and extract it as plain text.
OCR can be useful, for example, when editing memes, or in computer gaming when you want to take data from the game and process it in another application.
There is a Docker image.
https://github.com/tesseract-ocr/tesseract
Tesseract might be of use with paperless-ng, which Cloudron already supports. There is a thread mentioning this here:
https://forum.cloudron.io/topic/6346/multi-language-ocr-support/12?_=1655907503717
Ubuntu PPA:
https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel
Docker (Tesseract 5.0 is out now, I think these are only 4.0):
https://tesseract-ocr.github.io/tessdoc/Docker-Containers.html
Documentation:
https://tesseract-ocr.github.io/tessdoc/Home.html
-
A quick reading suggests that this is a CLI tool (and not an app). This is also installed in paperless already btw. @LoudLemur Are you having trouble with tesseract and paperless?
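To illustrate what "CLI tool" means in practice, here is a rough sketch of calling it from a script. It assumes the `tesseract` binary is on the PATH and that the language data for the requested codes is installed. The function names `build_tesseract_command` and `ocr_image` are made up for this example and are not part of Tesseract or paperless.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tesseract_command(
    image_path: str, languages: Sequence[str] = ("eng",)
) -> List[str]:
    """Build the argument list for `tesseract <image> stdout -l <langs>`.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file. Several language
    codes are joined with "+", e.g. "eng+deu".
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr_image(image_path: str, languages: Sequence[str] = ("eng",)) -> str:
    """Run the tesseract CLI on one image and return the recognized text.

    Assumption: the tesseract binary is installed and on the PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Something like `ocr_image("scan.png", ["eng", "deu"])` would then return the text of a (hypothetical) `scan.png` using English and German language data.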
-
@rmdes Maybe the easiest way would be https://forum.cloudron.io/topic/8383/nextcloud-all-in-one-aio/ ?
-
@Dolgoipa said in Tesseract-OCR (Optical Character Recognition) on Cloudron:
I have heard of Tesseract before and have used it in a couple of projects. It's a great open-source OCR engine that is easy to use and can be integrated with other applications.
You said exactly that before.
I'm inclined to think you are only here to post the link you just shared.
-
@jdaviescoates good catch, think it is a bot.