ArchiveBox -- Personal Internet Archive
-
https://archivebox.io
"ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."
Can import links from:
- Pocket, Pinboard, Instapaper
- RSS, XML, JSON, or plain text lists
- Browser history or bookmarks (Chrome, Firefox, Safari, IE, Opera, and more)
- Shaarli, Delicious, Reddit Saved Posts, Wallabag, Unmark.it, and any other text with links in it!
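To illustrate the "any other text with links in it" case, here is a minimal sketch of pulling URLs out of arbitrary text. It is not ArchiveBox's own parser; the `extract_urls` name and the regular expression are assumptions made for this example only.

```python
import re

# Simple pattern for http(s) URLs. Illustrative only, NOT ArchiveBox's parser.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing punctuation that usually belongs to the sentence.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = "See https://archivebox.io, and (https://example.com/page)."
    for url in extract_urls(sample):
        print(url)
```

The resulting list could then be saved as a plain text file, one URL per line, which is one of the import formats mentioned above.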
Can save these things for each site:
- favicon.ico favicon of the site
- example.com/page-name.html wget clone of the site, with .html appended if not present
- output.pdf Printed PDF of site using headless Chrome
- screenshot.png 1440x900 screenshot of site using headless Chrome
- output.html DOM dump of the HTML after rendering using headless Chrome
- archive.org.txt A link to the saved site on archive.org
- warc/ the HTML plus a gzipped WARC file (.gz)
- media/ any mp4, mp3, subtitles, and metadata found using youtube-dl
- git/ clone of any repository for GitHub, Bitbucket, or GitLab links
- index.html & index.json HTML and JSON index files containing metadata and details
There's a Docker image, as well: https://github.com/pirate/ArchiveBox
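As a rough sketch of how the per-site outputs listed above could be checked, the snippet below reports which of them are present in one archived site's folder. The file and folder names come from the list; the `report_outputs` helper and the assumption that they all sit directly inside a single folder are hypothetical, not taken from ArchiveBox itself.

```python
from pathlib import Path

# Names taken from the list above. That they all live directly inside one
# per-site folder is an assumption made for this sketch.
EXPECTED_OUTPUTS = [
    "favicon.ico",
    "output.pdf",
    "screenshot.png",
    "output.html",
    "archive.org.txt",
    "warc",
    "media",
    "git",
    "index.html",
    "index.json",
]


def report_outputs(snapshot_dir):
    """Map each expected output name to True if it exists in snapshot_dir."""
    folder = Path(snapshot_dir)
    return {name: (folder / name).exists() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    # Hypothetical usage: check the current directory.
    for name, present in report_outputs(".").items():
        print(f"{name}: {'found' if present else 'missing'}")
```

The wget clone (example.com/page-name.html) is left out because its name depends on the archived URL.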
@heliostatic What is the progress on supporting ArchiveBox on Cloudron?
-
ArchiveBox was featured in a blog recently:
https://ostechnix.com/self-host-internet-archive-with-archivebox/
-
@loudlemur said in ArchiveBox -- Personal Internet Archive:
The last ArchiveBox update was April 2021
Precisely!
There have been no releases for 10 months:
https://github.com/ArchiveBox/ArchiveBox/tags
https://github.com/ArchiveBox/ArchiveBox/commits/dev
shows commits just 7 days ago, but IMO not much use without releases.
-
@RoundHouse1924 said in ArchiveBox -- Personal Internet Archive:
There have been no releases for 10 months
There have been 5 releases since August:
https://selfhosted.libhunt.com/bookmark-archiver-changelog
-
ArchiveBox is very popular. I hope Cloudron supports it.
Also consider ArchivesSpace:
https://forum.cloudron.io/topic/4121/archivesspace-archives-collection-management-system/1
-
I'm also looking for this app on Cloudron; there is a lot of content I'd like to preserve for the future.
-
@girish awesome!
Thanks a lot for the news!
-
Incidentally, this got packaged just last week. We just have to double check and publish it...
-
@LoudLemur https://git.cloudron.io/cloudron/archivebox-app/ is the repo and it supposedly already works. I haven't tested it out though (which is why it's not published yet).