Cloudron Forum

Tabula - extracts table data from PDFs (when copy-paste often doesn't)

    marcusquinn wrote (#1):

    "If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux."

    https://tabula.technology
    https://github.com/tabulapdf/tabula

    I tend to use this a lot for transcribing long PDF invoices.

    Web Design https://www.evergreen.je
    Development https://brandlight.org
    Life https://marcusquinn.com

      jdaviescoates wrote (#2):

      @marcusquinn sounds like a useful tool, but appears to just be a desktop app and not a web app? So not sure how relevant it is to Cloudron...

      I use Cloudron with Gandi & Hetzner

        marcusquinn wrote (#3):

        @jdaviescoates Ahh, I thought there was a web app/service version. This probably needs moving to Discuss then, if the mods can do that?

        Also for interest, it's pretty easy to send a scanned/image PDF to Google Vision using Integromat to OCR and extract text.

          marcusquinn wrote (#4):

          Revisiting this: the app runs as a web server on localhost and is used through the browser, so it could be a useful additional utility for teams to have access to at, say, tabula.example.com.

            marcusquinn wrote (#5):

            It might seem unmaintained, but it still works well, and it remains the only open-source option for this that I know of.

            It is also becoming more important as a library to use for LLM data analysis needs.

            It's Dockerised too, so packaging it should be relatively simple:

            • https://twitter.com/turicas/status/1569015173117280258
            • https://hub.docker.com/r/turicas/tabula

              marcusquinn wrote (#6):

              There's a Python wrapper, too: https://github.com/chezou/tabula-py
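
For anyone curious what using that wrapper might look like, here is a rough sketch. It assumes tabula-py and a Java runtime are installed, and the helper names (`csv_output_path`, `extract_tables`, `pdf_to_csv`) are made up for illustration and are not part of tabula-py itself.

```python
from pathlib import Path


def csv_output_path(pdf_path):
    """Hypothetical helper: put the CSV next to the PDF, same name, .csv suffix."""
    return Path(pdf_path).with_suffix(".csv")


def extract_tables(pdf_path, pages="all"):
    """Return the tables tabula-py detects, as a list of pandas DataFrames."""
    # Imported lazily so this module still loads where tabula-py is missing.
    import tabula  # assumption: installed via `pip install tabula-py`, Java available

    return tabula.read_pdf(str(pdf_path), pages=pages)


def pdf_to_csv(pdf_path, pages="all"):
    """Write every detected table to one CSV file and return that file's path."""
    import tabula

    out_path = csv_output_path(pdf_path)
    tabula.convert_into(str(pdf_path), str(out_path), output_format="csv", pages=pages)
    return out_path
```

With a long invoice saved as `invoice.pdf` (a placeholder name), `pdf_to_csv("invoice.pdf")` should write `invoice.csv` alongside it, while `extract_tables("invoice.pdf")` gives DataFrames to work with directly. Scanned/image PDFs would presumably still need OCR first.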

                LoudLemur wrote (#7):

                @marcusquinn

                If people are encouraged to use Free Software, like LibreOffice, for their document creation, they will benefit from being able to export their final draft as a PDF with an embedded OpenDocument (ODF) file for easy data extraction. LibreOffice can also export to ISO archiving standards (PDF/A), where needed.
