Cloudron Forum


Add Apache Tika to Paperless-ngx package.

Category: Paperless-ngx (solved)
    Jarod wrote (#1):

    To process Office and .eml files, Paperless-ngx needs an Apache Tika server. It can be hosted with Docker and then needs to be added to the Paperless configuration. That's it.

    Could you please do that?

    https://docs.paperless-ngx.com/configuration/#tika

      luckow (translator) wrote (#2):

      And Gotenberg too. 🙂
      https://docs.paperless-ngx.com/configuration/#optional-services

      Pronouns: he/him | Primary language: German

        nebulon (Staff) wrote (#3):

        We have created an internal task to look into this.

          nebulon (Staff) wrote (#4):

          The new app package now contains Tika and Gotenberg.

            Jarod wrote (#5):

            @nebulon
            I get the following error when I try to add an .eml file:

            [2024-11-03 17:53:03,119] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: WG_ aktuelle Zahlen aus dem Fachdienst 35.eml: Error occurred while consuming document WG_ aktuelle Zahlen aus dem Fachdienst 35.eml: Error while converting email to PDF: [Errno 111] Connection refused
            Traceback (most recent call last):
              File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
                yield
              File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 236, in handle_request
                resp = self._pool.handle_request(req)
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 216, in handle_request
                raise exc from None
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 196, in handle_request
                response = connection.handle_request(
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 99, in handle_request
                raise exc
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 76, in handle_request
                stream = self._connect(request)
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 122, in _connect
                stream = self._network_backend.connect_tcp(**kwargs)
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 205, in connect_tcp
                with map_exceptions(exc_map):
              File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
                self.gen.throw(typ, value, traceback)
              File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
                raise to_exc(exc) from exc
            httpcore.ConnectError: [Errno 111] Connection refused
            The above exception was the direct cause of the following exception:
            Traceback (most recent call last):
              File "/app/code/src/paperless_mail/parsers.py", line 354, in generate_pdf_from_mail
                .run()
              File "/usr/local/lib/python3.10/dist-packages/gotenberg_client/_base.py", line 113, in run
                resp = self._client.post(url=self._route, headers=self._headers, data=self._form_data, files=self._get_files())
              File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1157, in post
                return self.request(
              File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 837, in request
                return self.send(request, auth=auth, follow_redirects=follow_redirects)
              File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 926, in send
                response = self._send_handling_auth(
              File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 954, in _send_handling_auth
                response = self._send_handling_redirects(
              File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 991, in _send_handling_redirects
                response = self._send_single_request(request)
              File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1027, in _send_single_request
                response = transport.handle_request(request)
              File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 235, in handle_request
                with map_httpcore_exceptions():
              File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
                self.gen.throw(typ, value, traceback)
              File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
                raise mapped_exc(message) from exc
            httpx.ConnectError: [Errno 111] Connection refused
            The above exception was the direct cause of the following exception:
            Traceback (most recent call last):
              File "/usr/local/lib/python3.10/dist-packages/asgiref/sync.py", line 327, in main_wrap
                raise exc_info[1]
              File "/app/code/src/documents/consumer.py", line 476, in run
                document_parser.parse(self.working_copy, mime_type, self.filename)
              File "/app/code/src/paperless_mail/parsers.py", line 183, in parse
                self.archive_path = self.generate_pdf(mail)
              File "/app/code/src/paperless_mail/parsers.py", line 223, in generate_pdf
                mail_pdf_file = self.generate_pdf_from_mail(mail_message)
              File "/app/code/src/paperless_mail/parsers.py", line 357, in generate_pdf_from_mail
                raise ParseError(
            documents.parsers.ParseError: Error while converting email to PDF: [Errno 111] Connection refused
            The above exception was the direct cause of the following exception:
            Traceback (most recent call last):
              File "/app/code/src/documents/tasks.py", line 148, in consume_file
                msg = plugin.run()
              File "/app/code/src/documents/consumer.py", line 508, in run
                self._fail(
              File "/app/code/src/documents/consumer.py", line 151, in _fail
                raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
            documents.consumer.ConsumerError: WG_ aktuelle Zahlen aus dem Fachdienst 35.eml: Error occurred while consuming document WG_ aktuelle Zahlen aus dem Fachdienst 35.eml: Error while converting email to PDF: [Errno 111] Connection refused
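`[Errno 111] Connection refused` raised through `gotenberg_client` means paperless could not open a TCP connection to its Gotenberg endpoint at all: nothing was accepting connections at that host and port when the .eml file was consumed. One way to narrow this down is to test the endpoints directly from the same host or container that paperless runs in. The sketch below is a minimal standard-library check; `endpoint_reachable` and `report` are hypothetical helpers, and the fallback URLs are assumed to be the values posted at the end of this thread (`http://localhost:9998` for Tika, `http://localhost:3000` for Gotenberg).

```python
import os
import socket
from urllib.parse import urlsplit


def endpoint_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port can be opened."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    # Use the explicit port if present, otherwise the scheme's default port.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError ([Errno 111] on Linux), timeouts and DNS
        # failures are all OSError subclasses.
        return False


def report() -> None:
    # Assumed fallbacks: the endpoint values quoted at the end of this thread.
    for name, default in (
        ("PAPERLESS_TIKA_ENDPOINT", "http://localhost:9998"),
        ("PAPERLESS_TIKA_GOTENBERG_ENDPOINT", "http://localhost:3000"),
    ):
        url = os.environ.get(name, default)
        state = "reachable" if endpoint_reachable(url) else "connection failed"
        print(f"{name}={url}: {state}")


if __name__ == "__main__":
    report()
```

This only checks that something is listening; it does not confirm that the listener is actually Gotenberg or Tika. If the Gotenberg line reports `connection failed`, that would be consistent with the traceback above.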
            
              luckow (translator) wrote (#6):

              @Jarod my test worked
              [screenshot of the result]

              Test case: save the .eml file to the desktop, open Paperless-ngx, upload the .eml file.

              To be fair: the result does not meet my expectations. 🙂
              [screenshot of the result]

              Without having read the documentation, my expectation was a PDF.
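The manual test case above could also be scripted, which may help when comparing many .eml samples. This is a minimal standard-library sketch, assuming the paperless-ngx REST endpoint `POST /api/documents/post_document/` with a multipart field named `document` and token authentication via an `Authorization: Token <token>` header. `build_upload_request`, the base URL and the token are placeholders, and the function only builds the request without sending it.

```python
import mimetypes
import uuid
from urllib.request import Request


def build_upload_request(base_url: str, token: str, filename: str, content: bytes) -> Request:
    """Build a multipart POST uploading one file to paperless-ngx (not sent here).

    Assumes `filename` contains no double quotes, since it is embedded verbatim
    in the Content-Disposition header.
    """
    boundary = uuid.uuid4().hex
    # Guess the part's content type from the extension; fall back to a generic type.
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/api/documents/post_document/",
        data=head + content + tail,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Usage (hypothetical host and token; this actually sends the upload):
# from urllib.request import urlopen
# with open("sample.eml", "rb") as f:
#     req = build_upload_request("https://paperless.example.com", "<your-token>", "sample.eml", f.read())
# with urlopen(req) as resp:
#     print(resp.status)
```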

                Jarod wrote (#7):

                @luckow said in Add Apache Tika to Paperless-ngx package.:

                Test case: save the .eml file to the desktop, open Paperless-ngx, upload the .eml file.

                That's how I did it.

                  luckow (translator) wrote (#8):

                  @Jarod My best guess: we need a lot more samples to get to the bottom of this. My “problem”: I use Thunderbird and don't normally deal with .eml files. But I will try it out next week. Let's use this thread to record our findings.

                    scooke wrote (#9):

                    @nebulon This is super, Nebulon. There are so many apps Cloudron has made available, but adding "small" requests like this to existing apps makes Cloudron all the more appealing. You've just saved me the cost of a separate VPS on which I had been running Tika and Gotenberg! Has any app ever had a better improvement than this?! Thank you!

                    PAPERLESS_TIKA_ENABLED=true
                    PAPERLESS_TIKA_ENDPOINT=http://localhost:9998
                    PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://localhost:3000
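To check Tika independently of paperless, the Tika server accepts a raw document body via `PUT` on its `/tika` resource and returns the extracted text when the client sends `Accept: text/plain` (the same request as `curl -T file http://localhost:9998/tika -H "Accept: text/plain"`). This minimal sketch only builds such a request from `PAPERLESS_TIKA_ENDPOINT`; `build_tika_request` is a hypothetical helper, and the fallback URL is assumed to be the value from the configuration above.

```python
import os
from typing import Optional
from urllib.request import Request


def build_tika_request(document: bytes, endpoint: Optional[str] = None) -> Request:
    """Build a PUT to Tika's /tika resource that asks for plain-text output (not sent)."""
    # Assumed fallback: the endpoint value from the configuration above.
    base = endpoint or os.environ.get("PAPERLESS_TIKA_ENDPOINT", "http://localhost:9998")
    return Request(
        url=base.rstrip("/") + "/tika",
        data=document,
        headers={
            "Accept": "text/plain",
            # A generic type, so urllib does not default to form encoding and
            # format detection is left to Tika.
            "Content-Type": "application/octet-stream",
        },
        method="PUT",
    )


# Usage (sends the request; requires a running Tika server):
# from urllib.request import urlopen
# with open("sample.docx", "rb") as f:
#     with urlopen(build_tika_request(f.read())) as resp:
#         print(resp.read().decode("utf-8"))
```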

                    A life lived in fear is a life half-lived
