Mercury Parser: extract content from URLs (e.g. for RSS aggregators)

necrevistonnezr

Overview: https://medium.com/@adampash/the-secret-engines-of-the-internet-e517592266ea
Code: https://github.com/postlight/mercury-parser

Mercury Parser pulls full-text content from URLs. It's the engine used in Reeder, NewsBlur, Feedbin, News Explorer, Feedly, Apollo (for Reddit), Medium, Bear, Zapier, etc.

Packaging it would make it possible to self-host the service for the Cloudron apps FreshRSS and Tiny Tiny RSS (and maybe others that I can't think of right now).

Postlight's Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.
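To give an idea of what an RSS reader would do with that output, here is a minimal sketch. The field names are my recollection of the result object described in the project's README, so treat them as an assumption and check the docs; `toFeedEntry` and the sample data are purely hypothetical and not part of the library.

```typescript
// Assumed shape of a parse result; field names should be checked against the README.
interface ParsedArticle {
  title: string | null;
  author: string | null;
  date_published: string | null;
  excerpt: string | null;
  lead_image_url: string | null;
  content: string | null;
  url: string;
}

// Hypothetical helper: turn a parse result into what a feed reader would display.
function toFeedEntry(article: ParsedArticle): { heading: string; byline: string; body: string } {
  // Fall back to the URL when no title was extracted.
  const heading = article.title ?? article.url;
  // Join whichever of author/date are present.
  const byline = [article.author, article.date_published]
    .filter((part): part is string => Boolean(part))
    .join(", ");
  // Prefer the full content, then the excerpt, then nothing.
  const body = article.content ?? article.excerpt ?? "";
  return { heading, byline, body };
}

// Made-up sample data standing in for a real parse result.
const sample: ParsedArticle = {
  title: "Example title",
  author: "Jane Doe",
  date_published: "2019-02-01T00:00:00.000Z",
  excerpt: "Short teaser",
  lead_image_url: null,
  content: "<p>Full article body</p>",
  url: "https://example.com/post",
};

const entry = toFeedEntry(sample);
console.log(entry.heading, "|", entry.byline);
```

In real use the object would come from the parser itself (the README shows a promise-based `parse(url)` call) rather than from hand-written sample data.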

Mercury Parser powers the Mercury AMP Converter and Mercury Reader, a Chrome extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.

Mercury Parser allows you to easily create custom parsers using simple JavaScript and CSS selectors. This allows you to proactively manage parsing and migration edge cases. There are many examples available along with documentation.
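For illustration, a custom parser is roughly a plain object that maps fields to lists of CSS selectors, tried in order. The sketch below follows the shape I remember from the project's custom-extractor docs; the TypeScript interfaces, the `blog.example.com` domain, the selectors, and the `fieldsCovered` helper are all hypothetical and only there to make the example self-contained.

```typescript
// Hypothetical types describing the extractor shape; the library itself is plain JavaScript.
interface FieldRule {
  // Either a CSS selector, or a [selector, attribute] pair to read an attribute value.
  selectors: Array<string | [string, string]>;
  // Selectors for elements to strip from the matched content.
  clean?: string[];
}

interface CustomExtractor {
  domain: string;
  title?: FieldRule;
  author?: FieldRule;
  date_published?: FieldRule;
  lead_image_url?: FieldRule;
  content?: FieldRule;
}

// Made-up extractor for a made-up site.
const exampleBlogExtractor: CustomExtractor = {
  domain: "blog.example.com",
  title: { selectors: ["h1.post-title", "h1"] },
  author: { selectors: [".byline .author"] },
  date_published: {
    selectors: [['meta[name="article:published_time"]', "value"]],
  },
  content: {
    selectors: ["div.post-body", "article"],
    clean: [".share-buttons", ".related-posts"],
  },
};

// Hypothetical helper: list which fields this extractor overrides.
function fieldsCovered(extractor: CustomExtractor): string[] {
  return Object.keys(extractor).filter((key) => key !== "domain");
}

console.log(fieldsCovered(exampleBlogExtractor));
```

Fields that a custom extractor leaves out would, as far as I understand, fall back to the generic extraction, which is why only the problematic fields need selectors.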

Chrome extension: https://mercury.postlight.com/reader/
FreshRSS plugin: https://github.com/simon-wessel/freshrss-mercury-parser
Tiny Tiny RSS Plugin: https://github.com/HenryQW/mercury_fulltext

necrevistonnezr

Now called "Postlight Parser" and under active development: https://github.com/postlight/parser
