pd3f - Open-source PDF text extraction using machine learning
-
Their Docker setup is so nice that I was able to get it up and running on a bare-metal server in five minutes. No joke.
Check out their demo site.
"pd3f is an Open-source PDF text extraction pipeline that is self-hosted, local-first and Docker-based.
pd3f reconstructs the original continuous text with the help of machine learning."
-
@turian what do you think? Is it possible to integrate it into workflows, for example: upload a PDF to paperless, send it to pd3f, then send the result back to paperless? Or something similar with Nextcloud or cubby? Uploading a PDF and then copying and pasting from pd3f into other apps feels wrong.