To clarify, no archived data is "uploaded" to Archive.org. Only URLs are sent to them, and they only archive things that are publicly accessible on the web (which they could arguably find through other means). If the URL requires cookies or a login of any kind, they do not archive it or store the URL.
Here are more in-depth explanations for the reasoning behind this default:
- https://news.ycombinator.com/item?id=26866689
- https://github.com/ArchiveBox/ArchiveBox#archiving-private-content
- https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview
In summary, ArchiveBox in its default mode should not be used on anything other than public URLs. We cannot make private mode the default (which would imply it's safe for novice users) because of the major security risk of viewing archived JS, which is explained in several places in the docs: https://github.com/ArchiveBox/ArchiveBox#security-risks-of-viewing-archived-js
If you are bold enough to attempt archiving private content, there is a detailed process involving reading about the security risks, setting up HTTPS ingress and separate domains for content and the admin UI, changing UI permissions, setting up an admin user, changing several defaults, etc. `SAVE_ARCHIVE_DOT_ORG=False` is just a small piece of the threat model.
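To illustrate the "changing several defaults" part, here is a minimal sketch of a check a packager might run against an environment before calling it private-leaning. The helper `unhardened_settings` is hypothetical and not part of ArchiveBox. Only `SAVE_ARCHIVE_DOT_ORG` comes from this post; the `PUBLIC_*` option names are my assumption based on the ArchiveBox configuration docs and should be verified against the version you ship. It covers config flags only, not HTTPS ingress, separate domains, or admin users.

```python
# Hypothetical helper, NOT part of ArchiveBox.
# Option names other than SAVE_ARCHIVE_DOT_ORG are assumptions taken from the
# ArchiveBox configuration docs; check them against your installed version.
PRIVATE_MODE_SETTINGS = {
    "SAVE_ARCHIVE_DOT_ORG": "False",  # do not submit URLs to Archive.org
    "PUBLIC_INDEX": "False",          # assumed: require login to view the index
    "PUBLIC_SNAPSHOTS": "False",      # assumed: require login to view snapshots
    "PUBLIC_ADD_VIEW": "False",       # assumed: require login to add URLs
}


def unhardened_settings(env):
    """Return the sorted names of options that are not set to their private-mode value.

    A missing option is reported as well, because this sketch cannot know the
    effective default of the running instance.
    """
    return sorted(
        name
        for name, wanted in PRIVATE_MODE_SETTINGS.items()
        if str(env.get(name, "True")).strip().lower() != wanted.lower()
    )


if __name__ == "__main__":
    import os

    remaining = unhardened_settings(os.environ)
    if remaining:
        print("Still at public-archival values:", ", ".join(remaining))
    else:
        print("Config flags look private-leaning; the rest of the hardening still applies.")
```

Passing this check would only mean the flags are flipped. It says nothing about the archived-JS risk described in the docs linked above.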
I understand it's a controversial decision that alienates some users, but making it wide-open by default was an intentional choice so users are more immediately aware that it's designed for low-security public archival out-of-the-box, with involved configuration needed to change that.
If you insist on changing the default Cloudron config to be closer to private mode, I ask that you at least force users to read the two docs links I shared above at some point in the setup process, so they understand that more hardening is needed for safe archival of private data.