pd3f - Open-source PDF text extraction using machine learning
-
Their Docker setup is so nice that I was able to get it up and running on a bare-metal server in five minutes. No joke.
Check out their demo site.
"pd3f is an Open-source PDF text extraction pipeline that is self-hosted, local-first and Docker-based.
pd3f reconstructs the original continuous text with the help of machine learning."
@turian what do you think? Is it possible to integrate it into workflows like ... upload a PDF into paperless, then into pd3f, then back to paperless? Or something similar with Nextcloud or Cubby? Uploading to pd3f and then copying and pasting into other apps feels wrong.
