
Multi Language OCR Support

      WiseMetalhead
      translator
      wrote on last edited by
      #1

      Is it possible to enable multi-language OCR support for paperless-ng?

      As far as I understand, the Docker setup has variables for this. I also found this modification.

        neurokrish
        wrote on last edited by
        #2

        @wisemetalhead according to this, additional languages can be enabled with PAPERLESS_OCR_LANGUAGE in the OCR settings. The option is in the paperless.conf file, which you can open by clicking the file manager icon in the paperless-ng app's settings in Cloudron.
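To illustrate the setting discussed in this post, here is a minimal Python sketch that reads PAPERLESS_OCR_LANGUAGE out of config text. It assumes paperless.conf uses plain KEY=VALUE lines with # comments and that the fallback is eng; the function name and the sample text are hypothetical and not taken from the Cloudron package.

```python
def read_ocr_language(conf_text, default="eng"):
    """Return the PAPERLESS_OCR_LANGUAGE value from KEY=VALUE config text.

    Assumption: one setting per line, '#' starts a comment, and 'eng' is
    the fallback when the option is absent.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "PAPERLESS_OCR_LANGUAGE":
            # Drop optional surrounding quotes around the value.
            return value.strip().strip("\"'")
    return default


# Hypothetical config content, for illustration only.
sample_conf = """
# OCR settings
#PAPERLESS_OCR_LANGUAGE=eng
PAPERLESS_OCR_LANGUAGE=deu+eng
"""
print(read_ocr_language(sample_conf))
```

A commented-out line is ignored, so only the active setting is returned.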


          nebulon
          Staff
          wrote on last edited by
          #3

          Did the provided suggestion from @neurokrish solve the issue?

            neurokrish
            wrote on last edited by
            #4

            @nebulon @WiseMetalhead there is some more information here. It looks like there are two options, PAPERLESS_OCR_LANGUAGES and PAPERLESS_OCR_LANGUAGE. The first determines which language packages to install, and the second determines which languages to use. The first option is missing from the paperless.conf file. Maybe we can add it manually and restart the app?

              WiseMetalhead
              translator
              wrote on last edited by
              #5

              @neurokrish @nebulon I tried this method earlier, but it didn't work.

              My paperless.conf file:
              Screenshot 2022-01-17 203913.png

              Log:
              Screenshot 2022-01-17 203957.png

                neurokrish
                wrote on last edited by
                #6

                @wisemetalhead according to the docs, the PAPERLESS_OCR_LANGUAGES option should be configured in docker-compose.env and not in paperless.conf. Perhaps @nebulon can help here.

                  nebulon
                  Staff
                  wrote on last edited by
                  #7

                  @neurokrish I have to try this here myself. Since the Cloudron app package has nothing to do with the upstream Docker image, the default self-hosting config docs apply instead: https://paperless-ng.readthedocs.io/en/latest/configuration.html?highlight=languages#ocr-settings

                    neurokrish
                    wrote on last edited by neurokrish
                    #8

                    @nebulon then maybe a standard apt-get install for additional language packs is the way to go?

                    Maybe we can install all OCR languages by default using sudo apt-get install tesseract-ocr-all, as mentioned here. That way the app ships with every language installed, and users can choose a specific language by changing the PAPERLESS_OCR_LANGUAGE setting in paperless.conf.
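To see which language packs are actually present after such an install, one option is to parse the output of `tesseract --list-langs`. The Python sketch below is only an illustration: the function names and the sample output are hypothetical, and it assumes the command prints a header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` style output."""
    codes = set()
    for raw_line in output.splitlines():
        line = raw_line.strip()
        # Skip blanks and the header, e.g. "List of available languages (3):".
        if not line or line.endswith(":"):
            continue
        codes.add(line)
    return codes


def installed_languages():
    """Ask the local tesseract binary for its languages (needs tesseract on PATH)."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some tesseract versions write the list to stderr, so fall back to it.
    return parse_list_langs(result.stdout or result.stderr)


# Hypothetical output, for illustration only.
sample_output = """List of available languages (3):
deu
eng
osd
"""
print(sorted(parse_list_langs(sample_output)))
```

Calling `installed_languages()` inside the app's environment would show whether a code such as deu is really available before it is referenced in PAPERLESS_OCR_LANGUAGE.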

                      nebulon
                      Staff
                      wrote on last edited by
                      #9

                      @neurokrish thanks for the suggestion. At least it solved the issue for me with a deu+eng setting. The just-updated package v0.7.0 has those changes.
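A setting like deu+eng joins several language codes with a plus sign. As a rough sketch, assuming that format, the following Python snippet splits such a value and reports any code that is not in a given set of installed languages. The helper name and the installed set are hypothetical.

```python
def missing_languages(setting, installed):
    """Return codes from a 'deu+eng' style setting that are not installed."""
    requested = [code.strip() for code in setting.split("+") if code.strip()]
    return [code for code in requested if code not in installed]


# Hypothetical set of installed language packs, for illustration only.
installed = {"deu", "eng", "osd"}

print(missing_languages("deu+eng", installed))  # []
print(missing_languages("deu+fra", installed))  # ['fra']
```

An empty result means every requested language is available; anything else names the packs that still need installing.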

                        neurokrish
                        wrote on last edited by
                        #10

                        @nebulon great! @WiseMetalhead, can you confirm that the latest update solves your issue with OCR?

                          WiseMetalhead
                          translator
                          wrote on last edited by
                          #11

                          @neurokrish @nebulon Now it works perfectly! Thanks for the help.

                            LoudLemur
                            wrote on last edited by
                            #12

                            @neurokrish Tesseract has a Dockerfile, and it would be nice to support Tesseract on Cloudron.

                            https://github.com/tesseract-ocr/tesseract

                              vyshnavR
                              wrote on last edited by
                              #13

                              I have the same issue. I need to integrate Tamil OCR (tam). I have already installed Tamil and followed all the steps described above, but it is still not supported. It throws this error on docker up:
                              "?: The selected ocr language tam is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE."
                              @nebulon @neurokrish @WiseMetalhead @LoudLemur

                                girish
                                Staff
                                wrote on last edited by
                                #14

                                @LoudLemur tesseract is already installed in the package though.

                                  girish
                                  Staff
                                  wrote on last edited by
                                  #15

                                  @vyshnavR said in Multi Language OCR Support:

                                  I have the same issue. I need to integrate Tamil OCR (tam). I have already installed Tamil and followed all the steps described above, but it is still not supported. It throws this error on docker up

                                  This thread (and this forum) is about the Cloudron package of paperless-ngx. It looks like you are using a Docker installation, so you have to take this up with the upstream project. Cloudron also uses Docker, but it does not use the upstream Dockerfiles.
