

Cloudron Forum


ArchiveBox -- Personal Internet Archive

  • heliostatic wrote:

    https://archivebox.io
    "ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."

    Can import links from:

    • Pocket, Pinboard, Instapaper
    • RSS, XML, JSON, or plain text lists
    • Browser history or bookmarks (Chrome, Firefox, Safari, IE, Opera, and more)
    • Shaarli, Delicious, Reddit Saved Posts, Wallabag, Unmark.it, and any other text with links in it!

    Can save these things for each site:

    • favicon.ico: favicon of the site
    • example.com/page-name.html: wget clone of the site, with .html appended if not present
    • output.pdf: printed PDF of the site using headless Chrome
    • screenshot.png: 1440x900 screenshot of the site using headless Chrome
    • output.html: DOM dump of the HTML after rendering using headless Chrome
    • archive.org.txt: a link to the saved site on archive.org
    • warc/: the HTML plus a gzipped WARC file (.gz)
    • media/: any mp4, mp3, subtitles, and metadata found using youtube-dl
    • git/: clone of any repository for GitHub, Bitbucket, or GitLab links
    • index.html & index.json: HTML and JSON index files containing metadata and details

    There's a Docker image, as well: https://github.com/pirate/ArchiveBox

    LoudLemur wrote (#17):

    @heliostatic

    ArchiveBox was featured in a blog recently:

    https://ostechnix.com/self-host-internet-archive-with-archivebox/

      LoudLemur wrote (#18):

      @RoundHouse1924 said in ArchiveBox -- Personal Internet Archive:

      @loudlemur said in ArchiveBox -- Personal Internet Archive:

      The last ArchiveBox update was April 2021

      Precisely!
      There have been no releases for 10 months:-
      https://github.com/ArchiveBox/ArchiveBox/tags

      https://github.com/ArchiveBox/ArchiveBox/commits/dev
      shows commits just 7 days ago, but IMO not much use without releases.

      There have been 5 releases since August:
      https://selfhosted.libhunt.com/bookmark-archiver-changelog

        LoudLemur wrote (#19):

        @heliostatic

        ArchiveBox is very popular. I hope Cloudron supports it.

        Also consider ArchivesSpace:
        https://forum.cloudron.io/topic/4121/archivesspace-archives-collection-management-system/1

          LoudLemur wrote (#20):

          I thought I would bump this worthy request again.

          ArchiveBox is all about self-hosting. Has this request received so little love because there is some much better option, or because people don't feel the need to self-host archives?

            Sam_uk wrote (#21):

            +1 I'd use this

              A Former User wrote (#22):

              I'm also looking for this app on Cloudron 🙂. There is a lot of content I'd like to preserve for the future.

                girish (Staff) wrote (#23):

                Incidentally, this got packaged just last week. We just have to double check and publish it...

                  A Former User wrote (#24):

                  @girish awesome! 🙂
                  Thanks a lot for the news!

                    LoudLemur wrote (#25):

                    @girish said in ArchiveBox -- Personal Internet Archive:

                    Incidentally, this got packaged just last week. We just have to double check and publish it...

                    Fantastic! That is a brilliant start to the New Year!

                    Thank you!

                      LoudLemur wrote (#26):

                      I am ardently expecting this...

                        LoudLemur wrote (#27):

                        One thing I hope is tested is how well ArchiveBox can store its archive on Block Storage options available on Cloudron. Archiving can take up a lot of space.

                          girish (Staff) wrote (#28):

                          @LoudLemur https://git.cloudron.io/cloudron/archivebox-app/ is the repo and it supposedly already works. I haven't tested it out though (which is why it's not published yet).

                            nebulon (Staff) wrote (#29):

                            The initial package version is now published as unstable. I will lock this topic; the new forum section is at https://forum.cloudron.io/category/182/archivebox

                            • nebulon locked this topic
                            • nebulon marked this topic as a question
                            • nebulon marked this topic as solved