Cloudron Forum

indexing of office documents?

  • ChristopherMag wrote:

    For anyone wanting to get this up and running quickly: if you have Docker running on another system, you can run the following:

    docker run -d --restart unless-stopped -p 3000:3000 gotenberg/gotenberg
    docker run -d --restart unless-stopped -p 9998:9998 apache/tika
    

    And then add the following to your paperless.conf:

    # Tika
    PAPERLESS_TIKA_ENABLED=true
    PAPERLESS_TIKA_ENDPOINT=http://<DockerHostnameOrIPGoesHere>:9998
    PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://<DockerHostnameOrIPGoesHere>:3000
    

    After this you can upload xlsx, docx, etc. to paperless-ngx.

    In my testing, if the Docker host running the Tika and Gotenberg containers goes down, paperless-ngx keeps working fine, but you won't be able to upload additional xlsx/docx/etc. documents until you restart the containers. That works out fine for us, as the reliability of paperless-ngx being accessible is way more important than this one feature working.
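Before uploading an Office file, it can help to sanity-check the three settings from the post above. The following is a minimal Python sketch, not part of paperless-ngx: `parse_conf`, `endpoint_host_port` and `tika_problems` are hypothetical helper names, and the sketch assumes paperless.conf uses plain KEY=VALUE lines as shown in the post.

```python
from urllib.parse import urlsplit


def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def endpoint_host_port(url):
    """Return (host, port) for an http(s) endpoint URL."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def tika_problems(settings):
    """List obvious problems with the Tika-related settings (illustrative only)."""
    problems = []
    if "PAPERLESS_TIKA_ENABLED" not in settings:
        problems.append("PAPERLESS_TIKA_ENABLED is not set")
    for key in ("PAPERLESS_TIKA_ENDPOINT", "PAPERLESS_TIKA_GOTENBERG_ENDPOINT"):
        value = settings.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif "<" in value or ">" in value:
            # The post uses <DockerHostnameOrIPGoesHere> as a placeholder.
            problems.append(f"{key} still contains a placeholder")
    return problems
```

Run against the snippet exactly as posted, `tika_problems(parse_conf(...))` would flag both endpoints, since the `<DockerHostnameOrIPGoesHere>` placeholder still has to be replaced with the Docker host's name or IP.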

    timconsidine (App Dev) wrote (#11):

    @ChristopherMag do you or anyone else have experience of using this kind of setup for macOS document formats like Pages and Numbers?

      ChristopherMag wrote (#12):

      @timconsidine It looks like Apache Tika supports the document formats from the iWork suite, such as Pages.

      I tried to upload a .pages file to paperless-ngx with Tika and Gotenberg configured, and paperless popped up a failure message with the error File type application/zip not supported.

      I believe this signals.py file in the paperless-ngx project would need to add support for the various iWork suite formats to resolve this error and get this working, assuming you already have Tika and Gotenberg set up and working with paperless-ngx.

      You could probably open an issue in the paperless-ngx repository on GitHub and see if they can assist with adding support for this.
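One plausible reading of the `application/zip` error, assuming the upload check inspects file contents rather than the `.pages` extension: the file was recognised as a ZIP container, so the check never got as far as treating it as an iWork document. The sketch below illustrates content sniffing on the leading bytes with a hypothetical `sniff_container` helper and a hypothetical `make_fake_package` stand-in file; it is not the detection code paperless-ngx uses.

```python
import io
import zipfile

# Signature at the start of a ZIP archive that contains at least one entry.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data):
    """Guess a MIME type from the leading bytes only, ignoring any file name."""
    if data.startswith(ZIP_MAGIC):
        return "application/zip"
    return "application/octet-stream"


def make_fake_package(entry_name="placeholder.bin"):
    """Build a tiny in-memory ZIP to stand in for an uploaded file (hypothetical)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(entry_name, b"example payload")
    return buffer.getvalue()
```

Whatever extension the stand-in file is given, `sniff_container(make_fake_package())` only sees the ZIP signature, which is why supporting such formats would need an explicit mapping on the paperless-ngx side.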

      • necrevistonnezr wrote (#13):

        Any updates on this?

          girish (Staff) wrote (#14):

          @necrevistonnezr nothing yet...

          • neurokrish wrote (#15):

            Hi, is there an update on this? I tried @ChristopherMag's suggestion. However, I get connection refused for Tika. Is this something to do with iptables? How can I allow the paperless app to connect to the container?

            EDIT: I should add that I installed Docker, with the Tika and Gotenberg containers, on the same system as Cloudron.
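For narrowing down a "connection refused", a small TCP probe can help. As a general rule, a refusal means the target address answered but nothing accepted the connection on that port, whereas a firewall that silently drops packets tends to show up as a timeout. Note also that inside an app container, `localhost` refers to that container itself, not to the host. The following is a minimal sketch with a hypothetical `probe` helper; the host and port you pass in are whatever you configured for Tika or Gotenberg.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as open, refused, timeout or another error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The address was reached, but no listener accepted on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. packets dropped along the way.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks and similar.
        return f"error: {exc}"
```

Running `probe` from the machine or container that actually makes the request matters, because the result can differ between the host and the container network.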

            • nebulon (Staff) wrote (#16):

              I don't think we have an update on this yet. Possibly your containers are not within the same Docker network on the system? Either way, adding Docker containers alongside Cloudron will break on Cloudron updates, so this is not very useful to investigate as such. Have you tried running the required services on a separate, isolated server instead?

              • neurokrish wrote (#17):

                @nebulon, thanks for your reply. I tried both ways, with the containers outside and inside the Cloudron network. Good to know that doing the latter will break updates. I have removed those containers now. Is it difficult to pre-install these containers via the app itself? Alternatively, maybe provide them as separate Cloudron apps which can be linked to paperless?

                • nebulon (Staff) wrote (#18):

                  Unless Tika and Gotenberg are useful for other apps, it may make more sense to actually package them as part of paperless and pre-configure everything.

                  Does anyone have experience with the memory requirements of those?

                  • ChristopherMag wrote (#19):

                    FYI, Gotenberg publishes Cloudron-specific images now. I'm not sure of the history of how or why that was started, but I would assume those are meant to be used as a Cloudron app, though I don't see any app in the app store for it.

                    PS: don't use these images for your own Gotenberg instance that you're integrating with paperless; they exist, hopefully, to make it easier one day to run Gotenberg on Cloudron directly.

                      necrevistonnezr wrote (#20):

                      @ChristopherMag It says ‘cloudrun’. Are you sure that's just a typo, or does it mean something like ‘cloud-run’?

                      • ChristopherMag wrote (#21):

                        @necrevistonnezr Wow, you're right, those images probably have nothing to do with Cloudron!

                        Thanks for pointing that out. Everyone else, please disregard my previous comment.
