https://archivebox.io
"ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."
Can import links from:
- Pocket, Pinboard, Instapaper
- RSS, XML, JSON, or plain text lists
- Browser history or bookmarks (Chrome, Firefox, Safari, IE, Opera, and more)
- Shaarli, Delicious, Reddit Saved Posts, Wallabag, Unmark.it, and any other text with links in it!
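To illustrate the "any other text with links in it" case, here is a minimal sketch of pulling URLs out of arbitrary text. It is not ArchiveBox's own parser: the `extract_urls` helper and the regular expression are assumptions made for illustration, and real-world link extraction handles many more edge cases.

```python
import re

# Hypothetical pattern (an assumption, not taken from ArchiveBox): match http(s)
# URLs up to the next whitespace, quote, angle bracket, or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that is rarely part of the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = "Read https://example.com/a, then (http://example.org/b)."
    for url in extract_urls(sample):
        print(url)
```

The resulting list could then be written to a plain text file, one URL per line, which is one of the import formats listed above.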
Can save these things for each site:
- favicon.ico: favicon of the site
- example.com/page-name.html: wget clone of the site, with .html appended if not present
- output.pdf: printed PDF of the site using headless Chrome
- screenshot.png: 1440x900 screenshot of the site using headless Chrome
- output.html: DOM dump of the HTML after rendering using headless Chrome
- archive.org.txt: a link to the saved site on archive.org
- warc/: the HTML plus a gzipped WARC file (.gz)
- media/: any mp4, mp3, subtitles, and metadata found using youtube-dl
- git/: clone of any repository for GitHub, Bitbucket, or GitLab links
- index.html & index.json: HTML and JSON index files containing metadata and details
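As a rough sketch of how the per-site outputs listed above could be checked, the snippet below reports which of them exist in a given folder. The file and directory names come from the list; the `list_saved_outputs` helper and the idea that they all sit directly inside one per-site folder are assumptions for illustration, not a description of ArchiveBox's actual layout or API.

```python
from pathlib import Path

# Output names taken from the list above. The wget clone is omitted because
# its file name depends on the archived URL.
EXPECTED_OUTPUTS = [
    "favicon.ico",
    "output.pdf",
    "screenshot.png",
    "output.html",
    "archive.org.txt",
    "warc",
    "media",
    "git",
    "index.html",
    "index.json",
]


def list_saved_outputs(snapshot_dir):
    """Map each expected output name to True if it exists in snapshot_dir."""
    root = Path(snapshot_dir)
    return {name: (root / name).exists() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    # Hypothetical path: point this at one archived site's folder.
    for name, present in list_saved_outputs("path/to/one/archived/site").items():
        print(f"{name}: {'saved' if present else 'missing'}")
```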
There's a Docker image, as well: https://github.com/pirate/ArchiveBox