Tabula - extracts table data from PDFs (when copy-paste often doesn't)
-
"If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux."
https://tabula.technology
https://github.com/tabulapdf/tabula
Tend to use this a lot for transcribing long PDF invoices.
-
@marcusquinn sounds like a useful tool, but appears to just be a desktop app and not a web app? So not sure how relevant it is to Cloudron...
-
@jdaviescoates Ahh, I thought there was a web app/service version. This probably needs moving to Discuss then, if the mods can do that?
Also, for interest, it's pretty easy to send a scanned/image PDF to Google Vision using Integromat to OCR it and extract the text.
-
Revisiting this: the app runs as a web server on localhost, so it could be a useful extra utility for teams to have access to at tabula.example.com.
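For anyone wanting to confirm the "it's just a local web server" point, here is a rough Python sketch that checks whether something answers over HTTP at a given address. The address http://127.0.0.1:8080 is an assumption on my part (change it to wherever your instance listens), and `is_reachable` is a hypothetical helper name, not anything Tabula provides.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with any status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 404).
        return True
    except (urllib.error.URLError, OSError):
        # Nothing listening, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    # Assumed local address; change it to match your own setup.
    url = "http://127.0.0.1:8080"
    print(url, "reachable" if is_reachable(url) else "not reachable")
```

If that reports reachable on a desktop install, the same check pointed at something like https://tabula.example.com would be a quick way to verify a hosted instance, assuming it were packaged and put behind a reverse proxy.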