

    Cloudron Forum


    Solved Multi Language OCR Support

    Paperless-ngx
    • WiseMetalhead
      WiseMetalhead translator last edited by

      Is it possible to enable multi-language OCR support for paperless-ng?

      As far as I understand, the upstream Docker image has environment variables for this. I also found this modification.

      • neurokrish
        neurokrish @WiseMetalhead last edited by

        @wisemetalhead According to this, additional languages can be enabled through the PAPERLESS_OCR_LANGUAGE option in the OCR settings. The option is in the paperless.conf file, which you can reach via the file manager icon in the paperless-ng app's settings in Cloudron.

        Capture.PNG

        c258e287-d7cd-4d09-8572-edddc08fc76f-image.png
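For anyone landing here later, a minimal sketch of what that line in paperless.conf could look like. The deu+eng value (German plus English) is only an example, taken from a setting reported working further down this thread. Tesseract joins language codes with a plus sign, and the matching language data has to be installed for the setting to have any effect.

```shell
# paperless.conf (sketch): languages Paperless asks Tesseract to use.
# deu+eng is an example value; replace it with the codes you need.
PAPERLESS_OCR_LANGUAGE=deu+eng
```

Restarting the app afterwards, as suggested later in the thread, should make the new value take effect.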

        • nebulon
          nebulon Staff last edited by

          Did the provided suggestion from @neurokrish solve the issue?

          • neurokrish
            neurokrish @nebulon last edited by

            @nebulon @WiseMetalhead There is some more information here. It looks like there are two options, PAPERLESS_OCR_LANGUAGES and PAPERLESS_OCR_LANGUAGE. The first determines which language packages to install, and the second determines which languages to use. The first option is missing from the paperless.conf file. Maybe we can add it manually and restart the app?

            • WiseMetalhead
              WiseMetalhead translator @neurokrish last edited by

              @neurokrish @nebulon I tried this method earlier, but it didn't work.

              My paperless.conf file:
              Screenshot 2022-01-17 203913.png

              Log:
              Screenshot 2022-01-17 203957.png

              • neurokrish
                neurokrish @WiseMetalhead last edited by

                @wisemetalhead According to the docs, the PAPERLESS_OCR_LANGUAGES option should be configured in docker-compose.env, not in paperless.conf. Perhaps @nebulon can help here.

                • nebulon
                  nebulon Staff @neurokrish last edited by

                  @neurokrish I have to try this myself. Since the Cloudron app package has nothing to do with their upstream Docker image, the default self-hosting configuration docs apply instead: https://paperless-ng.readthedocs.io/en/latest/configuration.html?highlight=languages#ocr-settings

                  • neurokrish
                    neurokrish @nebulon last edited by neurokrish

                    @nebulon Then maybe a standard apt-get install of additional language packs is the way to go?

                    Maybe we can install all OCR languages by default using sudo apt-get install tesseract-ocr-all, as mentioned here. That way the app has every language installed, and users can choose specific languages by setting PAPERLESS_OCR_LANGUAGE in paperless.conf.
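A rough way to check that every language named in PAPERLESS_OCR_LANGUAGE is actually installed is sketched below. It assumes a POSIX shell and that the tesseract binary is on the PATH inside the app. missing_langs is a hypothetical helper written for this post, not part of Paperless or Tesseract.

```shell
# missing_langs SPEC INSTALLED
# Print each code from a Tesseract-style spec such as "deu+eng" that is
# absent from INSTALLED, a whitespace- or newline-separated list of codes.
# The body runs in a subshell, so the IFS and globbing changes do not leak.
missing_langs() (
    # Flatten newlines so multi-line command output can be passed directly.
    installed=" $(printf '%s' "$2" | tr '\n' ' ') "
    IFS='+'   # split the spec on plus signs
    set -f    # no filename expansion while splitting
    for lang in $1; do
        case "$installed" in
            *" $lang "*) ;;                   # language data is present
            *) printf '%s\n' "$lang" ;;       # report the missing code
        esac
    done
)

# Usage (not executed here; 2>&1 because older Tesseract releases
# print the language list on stderr):
#   missing_langs "$PAPERLESS_OCR_LANGUAGE" "$(tesseract --list-langs 2>&1)"
```

If the tesseract-ocr-all package is installed as suggested above, the usage line should print nothing for a deu+eng setting; any code it does print names language data that still needs installing.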

                    • nebulon
                      nebulon Staff @neurokrish last edited by

                      @neurokrish Thanks for the suggestion. It solved the issue for me, at least with a deu+eng setting. The just-updated package v0.7.0 includes those changes.

                      • neurokrish
                        neurokrish @nebulon last edited by

                        @nebulon great! @WiseMetalhead, can you confirm that the latest update solves your issue with OCR?

                        • WiseMetalhead
                          WiseMetalhead translator @neurokrish last edited by

                          @neurokrish @nebulon Now it works perfectly! Thanks for the help.

                          • LoudLemur
                            LoudLemur @neurokrish last edited by

                            @neurokrish Tesseract has a Dockerfile, and it would be nice to support Tesseract on Cloudron.

                            https://github.com/tesseract-ocr/tesseract
