Cloudron Forum › App Wishlist

Tabula - extracts table data from PDFs (when copy-paste often doesn't)

#1 marcusquinn wrote:

"If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux."

https://tabula.technology
https://github.com/tabulapdf/tabula

I tend to use this a lot for transcribing long PDF invoices.

Web Design & Development: https://www.evergreen.je
Technology & Apps: https://www.marcusquinn.com

#2 jdaviescoates wrote:

@marcusquinn Sounds like a useful tool, but it appears to be just a desktop app rather than a web app? So I'm not sure how relevant it is to Cloudron...

I use Cloudron with Gandi & Hetzner

#3 marcusquinn wrote:

@jdaviescoates Ah, I thought there was a web app/service version. It probably needs moving to Discuss then, if the mods can?

Also, for interest: it's pretty easy to send a scanned/image PDF to Google Vision via Integromat to OCR it and extract the text.

#4 marcusquinn wrote:

Revisiting this: the app runs on a localhost web server, so it could be a useful extra utility for teams to access at tabula.example.com.

#5 marcusquinn wrote:

It might seem unmaintained, but it still works well and remains the only open-source option for this that I know of.

It's also becoming more important as a library for LLM data-analysis work.

It's Dockerised, too, so it should be relatively simple to package:

• https://twitter.com/turicas/status/1569015173117280258
• https://hub.docker.com/r/turicas/tabula

#6 marcusquinn wrote:

There's a Python wrapper, too: https://github.com/chezou/tabula-py

#7 LoudLemur wrote:

@marcusquinn

If people are encouraged to use Free Software such as LibreOffice to create their documents, they benefit from being able to export the final draft as a PDF with an embedded ODF file for easy data extraction. LibreOffice can also export to ISO archiving standards (PDF/A) where needed.