
ArchiveBox -- Personal Internet Archive

29 Posts 12 Posters 5.0k Views 13 Watching
    heliostatic wrote:

      https://archivebox.io
      "ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."

      Can import links from:

      • Pocket, Pinboard, Instapaper
      • RSS, XML, JSON, or plain text lists
      • Browser history or bookmarks (Chrome, Firefox, Safari, IE, Opera, and more)
      • Shaarli, Delicious, Reddit Saved Posts, Wallabag, Unmark.it, and any other text with links in it!

      Can save these things for each site:

      • favicon.ico favicon of the site
      • example.com/page-name.html wget clone of the site, with .html appended if not present
      • output.pdf Printed PDF of site using headless chrome
      • screenshot.png 1440x900 screenshot of site using headless chrome
      • output.html DOM Dump of the HTML after rendering using headless chrome
      • archive.org.txt A link to the saved site on archive.org
      • warc/ for the html + gzipped warc file .gz
      • media/ any mp4, mp3, subtitles, and metadata found using youtube-dl
      • git/ clone of any repository for github, bitbucket, or gitlab links
      • index.html & index.json HTML and JSON index files containing metadata and details

      There's a Docker image, as well: https://github.com/pirate/ArchiveBox

      LoudLemur wrote (#6):

      @heliostatic

      In nearly 3 years here ArchiveBox has only managed to receive 5 upvotes. This is depressing. There is a big need for self-hosted archiving and if this application had been supported 3 years ago, it would have been ready for all those people who had their websites shuttered.

      The last ArchiveBox update was April 2021. There is a Docker image. The code has received high quality ratings. Let's sort this!

        timconsidine (App Dev) wrote (#7):

        @loudlemur said in ArchiveBox -- Personal Internet Archive:

        There is a big need for self-hosted archiving

        That doesn't seem to reconcile with only 5 votes.

          LoudLemur wrote (#8):

          @timconsidine said in ArchiveBox -- Personal Internet Archive:

          @loudlemur said in ArchiveBox -- Personal Internet Archive:

          There is a big need for self-hosted archiving

          That doesn't seem to reconcile with only 5 votes.

          Hah! Yes, there seems to be something wrong with reality here! haa!

          Cloudron is brilliant for the updates, which just keep on coming and keep on working. What is the balance of time given to updating existing applications and introducing new ones?

          Something like this was suggested recently, but couldn't we try crowdfunding somebody with packaging skills to rattle off a few more applications and help clear the backlog?

          What could we hope for with a dedicated packager? 5 new packages a week? 10?

            RoundHouse1924 wrote (#9):

            @loudlemur said in ArchiveBox -- Personal Internet Archive:

            The last ArchiveBox update was April 2021

            Precisely!
            There have been no releases for 10 months:-
            https://github.com/ArchiveBox/ArchiveBox/tags

            https://github.com/ArchiveBox/ArchiveBox/commits/dev
            shows commits just 7 days ago, but IMO not much use without releases.

              timconsidine (App Dev) wrote (#10, last edited by timconsidine):

              @loudlemur Just to be clear, I'm also very keen to see new apps on Cloudron.

              I just understand that I need to wait, and for those I feel I cannot wait for, I take an alternative route:

              • Get another VPS and install CapRover. ArchiveBox is on CapRover if you have a pressing need, as are some others that are on the Cloudron wishlist. However, in my experience as many as 40-50% of apps available on CapRover don't install correctly. More people are packaging, but quality and reliability are much lower, which is why it can be worth waiting for a Cloudron app.
              • Get another VPS and install docker / docker-compose. Then there is no need for packaging, just installing apps which have already been dockerized.
              • Work through the tutorials about packaging. It's not easy, but equally it's not beyond reach. There are lots of gotchas on the way and patient experimentation is required. Personally I'm still learning, so it's for sure not a rapid solution to getting something packaged.

              All I'm saying is that it's about priorities. If it's urgent enough and high enough priority, then that might suggest it's worth biting the bullet for an extra server for your own Docker installs or your own CapRover or both.

              I have:

              • 1 x Cloudron
              • 1 x CapRover
              • 1 x "pure" Docker apps
              • 1 x Kasm
              • 1 x openEdX / Tutor

              I accept I'm unlikely to get all of what I need/want on Cloudron in time for when I need/want it.
                timconsidine (App Dev) wrote (#11):

                @loudlemur said in ArchiveBox -- Personal Internet Archive:

                What could we hope for with a dedicated packager? 5 new packages a week? 10?

                I would say one or two.
                There's a lot of work in packaging & in testing.

                  LoudLemur wrote (#12):

                  @timconsidine Thanks for those ideas.

                  It really is nice to know that if it is on Cloudron, it will work and be supported.

                    timconsidine (App Dev) wrote (#13):

                    @loudlemur said in ArchiveBox -- Personal Internet Archive:

                    It really is nice to know that if it is on Cloudron, it will work and be supported.

                    Yes. And you mention support. Support for CapRover apps is close to non-existent. No, actually, sorry, that's harsh. There is support, but not a patch on what is available here.

                    It may also be worth exploring Yunohost and your own Heroku-style server (e.g. Dokku). They don't work for me, but they may for you or others.

                      humptydumpty wrote (#14):

                      I use a Firefox extension called SingleFile that saves webpages as HTML. Works great.

                        turian wrote (#15):

                        I also love this software, and have been pushing PRs to it.

                          LoudLemur wrote (#16):

                          @heliostatic What is the progress on supporting ArchiveBox on Cloudron?

                            LoudLemur wrote (#17):

                            @heliostatic

                            ArchiveBox was featured in a blog recently:

                            https://ostechnix.com/self-host-internet-archive-with-archivebox/

                              LoudLemur wrote (#18):

                              @RoundHouse1924 said in ArchiveBox -- Personal Internet Archive:

                              @loudlemur said in ArchiveBox -- Personal Internet Archive:

                              The last ArchiveBox update was April 2021

                              Precisely!
                              There have been no releases for 10 months:-
                              https://github.com/ArchiveBox/ArchiveBox/tags

                              https://github.com/ArchiveBox/ArchiveBox/commits/dev
                              shows commits just 7 days ago, but IMO not much use without releases.

                              There have been 5 releases since August:
                              https://selfhosted.libhunt.com/bookmark-archiver-changelog

                                LoudLemur wrote (#19):

                                @heliostatic

                                ArchiveBox is very popular. I hope Cloudron supports it.

                                [Image: brave_KkjJdd9iqg.jpg]

                                Also consider ArchivesSpace:
                                https://forum.cloudron.io/topic/4121/archivesspace-archives-collection-management-system/1

                                  LoudLemur wrote (#20):

                                  I thought I would bump this worthy request again.

                                  ArchiveBox is all about self-hosting. Has this request received so little love because there is some much better option, or because people don't feel the need to self-host archives?

                                    Sam_uk wrote (#21):

                                    +1 I'd use this

                                      A Former User wrote (#22):

                                      I'm also looking for this app on Cloudron 🙂. There is a lot of content I'd like to preserve for the future.

                                        girish (Staff) wrote (#23):

                                        Incidentally, this got packaged just last week. We just have to double check and publish it...

                                          A Former User wrote (#24):

                                          @girish awesome! 🙂
                                          Thanks a lot for the news!

                                            LoudLemur wrote (#25):

                                            @girish said in ArchiveBox -- Personal Internet Archive:

                                            Incidentally, this got packaged just last week. We just have to double check and publish it...

                                            Fantastic! That is a brilliant start to the New Year!

                                            Thank you!
