ArchiveBox default installation exposes private data and uploads to archive.org without user consent
-
Bug Report: ArchiveBox on Cloudron
Description:
In the standard installation of ArchiveBox on Cloudron, all content is publicly accessible by default, and archived content is automatically uploaded to archive.org. This behavior is unexpected and potentially harmful for users of a platform like Cloudron, which is often used for personal or sensitive data storage.
Steps to reproduce:
- Install ArchiveBox on Cloudron using the standard installation process.
- Add content to be archived.
- Observe that the content is publicly accessible and being uploaded to archive.org.
Expected behavior:
The default installation should prioritize user privacy and data protection. The following settings should be set as standard in ArchiveBox.conf:
[PRIVACY]
SAVE_ARCHIVE_DOT_ORG = False
PUBLIC_INDEX = False
PUBLIC_SNAPSHOTS = False
PUBLIC_ADD_VIEW = False
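To make the requested defaults concrete, here is a minimal Python sketch that checks an ArchiveBox.conf-style INI text for the four settings listed above. It assumes the file can be parsed with Python's standard configparser. It also scans every section, because the [PRIVACY] section name used in this report may not match how a given installation lays out the file. The helper name find_public_settings is hypothetical. A missing key is reported as "not explicitly private" rather than assumed to take any particular upstream default.

```python
import configparser

# The four settings this report asks to disable by default (names taken from the report).
PRIVACY_KEYS = (
    "SAVE_ARCHIVE_DOT_ORG",
    "PUBLIC_INDEX",
    "PUBLIC_SNAPSHOTS",
    "PUBLIC_ADD_VIEW",
)


def find_public_settings(conf_text: str) -> list[str]:
    """Return the privacy keys that are missing or not explicitly set to False.

    Hypothetical helper. Assumption: every section is scanned, since the
    section name used in the report may differ from the real file layout.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys upper-case instead of lower-casing them
    parser.read_string(conf_text)

    found = {}
    for section in parser.sections():
        for key in PRIVACY_KEYS:
            if parser.has_option(section, key):
                found[key] = parser.get(section, key)

    # Missing keys map to "" and are therefore reported as not explicitly private.
    return [
        key for key in PRIVACY_KEYS
        if found.get(key, "").strip().lower() != "false"
    ]
```

For example, a file containing exactly the block requested in this report yields an empty list. A file that only sets PUBLIC_INDEX = True reports all four keys.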
Actual behavior:
- All archived content is publicly accessible.
- Content is automatically uploaded to archive.org without user consent.
- Users must manually change privacy settings after installation.
Impact:
- Potential exposure of sensitive or private information.
- Unauthorized distribution of copyrighted or confidential material.
- Users may face difficulties in removing unintentionally uploaded content from archive.org.
Suggested fix:
Update the default installation configuration to include the privacy settings mentioned above. This will ensure that user data remains private by default, and no automatic uploads to archive.org occur without explicit user consent.
-
To clarify, no archived data is "uploaded" to Archive.org. Only URLs are sent to them, and they only archive things that are publicly accessible on the web (which they could arguably find through other means). If the URL requires cookies or a login of any kind, they do not archive it or store the URL.
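A minimal sketch of that distinction follows. It assumes the public Wayback Machine "Save Page Now" endpoint takes the form https://web.archive.org/save/<url>. The only thing such a request carries is the target URL string. The function name save_page_now_request is hypothetical and no network call is made. ArchiveBox's own extractor may also send extra options, so this illustrates the data flow rather than reproducing ArchiveBox's code.

```python
from urllib.parse import urlsplit

# Assumed public "Save Page Now" endpoint (illustrative; not ArchiveBox's exact request).
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_request(url: str) -> str:
    """Build the request URL that would ask the Wayback Machine to capture `url`.

    Hypothetical helper: only the target URL string goes into the request.
    No locally archived files, cookies, or credentials are attached.
    """
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        # Non-web URLs (e.g. local files) have nothing Archive.org could fetch.
        raise ValueError(f"not a web URL: {url!r}")
    return SAVE_ENDPOINT + url
```

Archive.org then fetches the page itself, as an anonymous visitor. That is why pages behind a login or cookie wall are not captured.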
Here are more in-depth explanations for the reasoning behind this default:
- https://news.ycombinator.com/item?id=26866689
- https://github.com/ArchiveBox/ArchiveBox#archiving-private-content
- https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview
In summary, ArchiveBox in its default mode should not be used on anything other than public URLs. We cannot make private mode the default (which would imply it's safe for novice users) because of this major security risk it incurs: https://github.com/ArchiveBox/ArchiveBox#security-risks-of-viewing-archived-js (as is explained in several locations in the docs).
If you are bold enough to attempt archiving private content, there is a detailed process involved: reading about the security risks, setting up HTTPS ingress and separate domains for the archived content and the admin UI, changing UI permissions, setting up an admin user, changing several defaults, etc.
SAVE_ARCHIVE_DOT_ORG=False is just a small piece of the threat model. I understand it's a controversial decision that alienates some users, but making it wide-open by default was an intentional choice: it makes users immediately aware that it's designed for low-security public archival out-of-the-box, with involved configuration needed to change that.
If you insist on changing the default cloudron config to be closer to private mode, I ask that you at least force users to read the two docs links I shared above at some point in the setup process to understand that more hardening is needed for safe archival of private data.
-
Personally, I think the defaults should be kept as they are.