Cloudron Forum

Tabula - extracts table data from PDFs (when copy-paste often doesn't)

App Wishlist
7 Posts 3 Posters 358 Views
  • marcusquinn wrote:
    #1

    "If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux."

    https://tabula.technology
    https://github.com/tabulapdf/tabula

    Tend to use this a lot for transcribing long PDF invoices.

    We're not here for a long time - but we are here for a good time :)
    Jersey/UK
    Work & Ecommerce Advice: https://brandlight.org
    Personal & Software Tips: https://marcusquinn.com

  • jdaviescoates replied to marcusquinn:
    #2

    @marcusquinn sounds like a useful tool, but appears to just be a desktop app and not a web app? So not sure how relevant it is to Cloudron...

    I use Cloudron with Gandi & Hetzner

  • marcusquinn replied to jdaviescoates:
    #3

    @jdaviescoates Ahh, I thought there was a web app/service version. Probably needs moving to Discuss then, if the mods can?

    Also for interest, it's pretty easy to send a scanned/image PDF to Google Vision using Integromat to OCR and extract text.

  • marcusquinn replied to marcusquinn:
    #4

    Revisiting this: the app runs as a local web server, so it could be hosted as a shared utility for teams at, say, tabula.example.com.

  • marcusquinn wrote:
    #5

    It might seem unmaintained, but it still works well and remains the only open-source option for this that I know of.

    It's becoming more important as a library for LLM data-analysis work.

    It's Dockerised too, so packaging should be relatively simple:

    • https://twitter.com/turicas/status/1569015173117280258
    • https://hub.docker.com/r/turicas/tabula

  • marcusquinn wrote:
    #6

    Python wrapper, too: https://github.com/chezou/tabula-py
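
For anyone who wants to script the extraction rather than use the web UI, here is a rough sketch of how the tabula-py wrapper could be used. It assumes `pip install tabula-py` and a Java runtime on the machine. The helper name `pdf_tables_to_csv`, the `read_pdf` parameter, and the output file naming are made up for illustration and are not part of tabula-py. `tabula.read_pdf(path, pages="all")` is the library call, and it returns a list of pandas DataFrames.

```python
from pathlib import Path


def pdf_tables_to_csv(pdf_path, out_dir, read_pdf=None):
    """Write each table found in pdf_path to its own CSV file in out_dir.

    read_pdf is injectable (a hypothetical convenience, not a tabula-py
    feature) so the helper can be tried without Java installed. By default
    it is tabula.read_pdf from the tabula-py package.
    """
    if read_pdf is None:
        import tabula  # third-party: pip install tabula-py (needs Java)
        read_pdf = tabula.read_pdf

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # pages="all" asks for every page; the result is a list of tables.
    tables = read_pdf(str(pdf_path), pages="all")

    written = []
    for index, table in enumerate(tables, start=1):
        # One CSV per detected table, e.g. invoices_table1.csv
        target = out_dir / f"{Path(pdf_path).stem}_table{index}.csv"
        table.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling `pdf_tables_to_csv("invoices.pdf", "extracted")` would then leave one CSV per detected table in the `extracted` folder, where `invoices.pdf` is just a placeholder file name.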

  • LoudLemur replied to marcusquinn:
    #7

    @marcusquinn

    If people are encouraged to use Free Software such as LibreOffice for document creation, they can export their final draft as a PDF with an embedded .odf for easy data extraction. It can also archive according to ISO / archiving standards, where needed.
