Mercury Parser: extract content from URLs (e.g. for RSS aggregators)
Overview: https://medium.com/@adampash/the-secret-engines-of-the-internet-e517592266ea
Code: https://github.com/postlight/mercury-parser
Mercury Parser pulls full-text content from URLs. It's the engine used in Reeder, NewsBlur, Feedbin, News Explorer, Feedly, Apollo (for Reddit), Medium, Bear, Zapier, etc.
It would make it possible to self-host the service for the Cloudron apps FreshRSS and Tiny Tiny RSS (and maybe others that I can't think of right now).
Postlight's Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.
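To give a feel for what an aggregator would do with those fields, here is a minimal sketch that maps a parse result onto a feed entry. The `ParseResult` interface is a subset of the field names shown in the project's README; `FeedEntry` and `toFeedEntry` are hypothetical names made up for this illustration and are not part of the library.

```typescript
// Subset of the fields Mercury Parser returns, as listed in its README.
interface ParseResult {
  title: string | null;
  content: string | null;
  author: string | null;
  date_published: string | null;
  lead_image_url: string | null;
  excerpt: string | null;
  url: string;
}

// Hypothetical shape of an entry in an aggregator's feed (an assumption).
interface FeedEntry {
  heading: string;
  byline: string;
  published: Date | null;
  image: string | null;
  summary: string;
  html: string;
  link: string;
}

// Map a parse result onto a feed entry, filling gaps with neutral defaults.
function toFeedEntry(r: ParseResult): FeedEntry {
  return {
    heading: r.title ?? "(untitled)",
    byline: r.author ?? "",
    published: r.date_published ? new Date(r.date_published) : null,
    image: r.lead_image_url,
    summary: r.excerpt ?? "",
    html: r.content ?? "",
    link: r.url,
  };
}
```

With the `@postlight/mercury-parser` package installed, the result of `Mercury.parse(url)` (which returns a promise) could be passed to a function like this one.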
Mercury Parser powers the Mercury AMP Converter and Mercury Reader, a Chrome extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.
Mercury Parser allows you to easily create custom parsers using simple JavaScript and CSS selectors. This allows you to proactively manage parsing and migration edge cases. There are many examples available along with documentation.
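As a rough idea of what such a custom parser looks like, here is a sketch of an extractor definition for a made-up site. The interfaces are a simplified assumption of the format described in the project's documentation (the real format supports more fields and options), and the domain, selectors, and the `coveredFields` helper are purely illustrative.

```typescript
// Selectors to try in order for one field; `clean` lists selectors to strip
// from the matched content.
interface FieldRule {
  selectors: string[];
  clean?: string[];
}

// Simplified, assumed shape of a custom extractor.
interface CustomExtractor {
  domain: string;
  title?: FieldRule;
  author?: FieldRule;
  content?: FieldRule;
}

// Hypothetical extractor for a made-up site; the selectors are examples only.
const exampleExtractor: CustomExtractor = {
  domain: "www.example.com",
  title: { selectors: ["h1.article-title", "h1"] },
  author: { selectors: [".byline .author-name"] },
  content: {
    selectors: ["div.article-body", "article"],
    clean: [".ad", ".related-stories"],
  },
};

// Illustrative helper: list the fields for which the extractor defines
// at least one selector.
function coveredFields(e: CustomExtractor): string[] {
  const fields: Array<"title" | "author" | "content"> = ["title", "author", "content"];
  return fields.filter((f) => (e[f]?.selectors.length ?? 0) > 0);
}
```

Recent releases document a `Mercury.addExtractor(...)` call for registering an object like this at runtime; check the README of the version you install, since the exact API may differ between versions.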
- Chrome extension: https://mercury.postlight.com/reader/
- FreshRSS plugin: https://github.com/simon-wessel/freshrss-mercury-parser
- Tiny Tiny RSS Plugin: https://github.com/HenryQW/mercury_fulltext
Now called "Postlight Parser" and under active development: https://github.com/postlight/parser