<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t)]]></title><description><![CDATA[<p dir="auto">"If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux."</p>
<p dir="auto"><a href="https://tabula.technology" target="_blank" rel="noopener noreferrer nofollow ugc">https://tabula.technology</a><br />
<a href="https://github.com/tabulapdf/tabula" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/tabulapdf/tabula</a></p>
<p dir="auto">Tend to use this a lot for transcribing long PDF invoices.</p>
]]></description><link>https://forum.cloudron.io/topic/2814/tabula-extracts-table-data-from-pdfs-when-copy-paste-often-doesn-t</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 09:56:53 GMT</lastBuildDate><atom:link href="https://forum.cloudron.io/topic/2814.rss" rel="self" type="application/rss+xml"/><pubDate>Sun, 12 Jul 2020 23:53:39 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t) on Tue, 04 Apr 2023 00:18:05 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/marcusquinn" aria-label="Profile: marcusquinn">@<bdi>marcusquinn</bdi></a></p>
<p dir="auto">If people are encouraged to use Free Software such as LibreOffice for document creation, they can export their final draft as a hybrid PDF with the OpenDocument source file embedded, which makes data extraction easy. It can also export to archival standards such as PDF/A (ISO 19005) where needed.</p>
]]></description><link>https://forum.cloudron.io/post/64293</link><guid isPermaLink="true">https://forum.cloudron.io/post/64293</guid><dc:creator><![CDATA[LoudLemur]]></dc:creator><pubDate>Tue, 04 Apr 2023 00:18:05 GMT</pubDate></item><item><title><![CDATA[Reply to Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t) on Mon, 03 Apr 2023 21:33:11 GMT]]></title><description><![CDATA[<p dir="auto">Python wrapper, too: <a href="https://github.com/chezou/tabula-py" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/chezou/tabula-py</a></p>
]]></description><link>https://forum.cloudron.io/post/64287</link><guid isPermaLink="true">https://forum.cloudron.io/post/64287</guid><dc:creator><![CDATA[marcusquinn]]></dc:creator><pubDate>Mon, 03 Apr 2023 21:33:11 GMT</pubDate></item><item><title><![CDATA[Reply to Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t) on Mon, 03 Apr 2023 21:30:17 GMT]]></title><description><![CDATA[<p dir="auto">Might seem unmaintained, but it still works well, and remains the only open-source option for this that I know of.</p>
<p dir="auto">It's also becoming more important as a library for LLM data analysis workflows.</p>
<p dir="auto">It's Dockerised, too, so packaging it should be relatively simple:</p>
<ul>
<li><a href="https://twitter.com/turicas/status/1569015173117280258" target="_blank" rel="noopener noreferrer nofollow ugc">https://twitter.com/turicas/status/1569015173117280258</a></li>
<li><a href="https://hub.docker.com/r/turicas/tabula" target="_blank" rel="noopener noreferrer nofollow ugc">https://hub.docker.com/r/turicas/tabula</a></li>
</ul>
]]></description><link>https://forum.cloudron.io/post/64286</link><guid isPermaLink="true">https://forum.cloudron.io/post/64286</guid><dc:creator><![CDATA[marcusquinn]]></dc:creator><pubDate>Mon, 03 Apr 2023 21:30:17 GMT</pubDate></item><item><title><![CDATA[Reply to Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t) on Mon, 28 Jun 2021 11:25:40 GMT]]></title><description><![CDATA[<p dir="auto">Revisiting this, the app runs on a localhost web server, hence could be a useful additional utility for teams to have access to at <a href="http://tabula.example.com" target="_blank" rel="noopener noreferrer nofollow ugc">tabula.example.com</a>.</p>
]]></description><link>https://forum.cloudron.io/post/33171</link><guid isPermaLink="true">https://forum.cloudron.io/post/33171</guid><dc:creator><![CDATA[marcusquinn]]></dc:creator><pubDate>Mon, 28 Jun 2021 11:25:40 GMT</pubDate></item><item><title><![CDATA[Reply to Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t) on Wed, 15 Jul 2020 13:23:40 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/jdaviescoates" aria-label="Profile: jdaviescoates">@<bdi>jdaviescoates</bdi></a> Ahh, I thought there was a web app/service version. Prob needs moving to Discuss then if mods can?</p>
<p dir="auto">Also for interest, it's pretty easy to send a scanned/image PDF to <a href="https://cloud.google.com/vision" target="_blank" rel="noopener noreferrer nofollow ugc">Google Vision</a> using <a href="https://integromat.com" target="_blank" rel="noopener noreferrer nofollow ugc">Integromat</a> to OCR and extract text.</p>
]]></description><link>https://forum.cloudron.io/post/10720</link><guid isPermaLink="true">https://forum.cloudron.io/post/10720</guid><dc:creator><![CDATA[marcusquinn]]></dc:creator><pubDate>Wed, 15 Jul 2020 13:23:40 GMT</pubDate></item><item><title><![CDATA[Reply to Tabula - extracts table data from PDFs (when copy-paste often doesn&#x27;t) on Wed, 15 Jul 2020 13:15:15 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/marcusquinn" aria-label="Profile: marcusquinn">@<bdi>marcusquinn</bdi></a> sounds like a useful tool, but appears to just be a desktop app and not a web app? So not sure how relevant it is to Cloudron...</p>
]]></description><link>https://forum.cloudron.io/post/10719</link><guid isPermaLink="true">https://forum.cloudron.io/post/10719</guid><dc:creator><![CDATA[jdaviescoates]]></dc:creator><pubDate>Wed, 15 Jul 2020 13:15:15 GMT</pubDate></item></channel></rss>