Issue with garbage data on Scaleway
-
In the event of a problem with a backup, does Cloudron clean up multipart data?
https://www.scaleway.com/en/docs/s3-multipart-upload/#-Aborting-a-Multipart-Upload
I just had an issue where all of this accumulated, causing billing and quota issues, and it broke their system too.
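For anyone else hitting this, here's a rough sketch (not tested against Scaleway; the endpoint, region, bucket, and env var names are placeholders) of listing and aborting the incomplete multipart uploads yourself with the v2 aws-sdk:

```ts
import AWS from 'aws-sdk';

// Placeholder endpoint/credentials; adjust to your Scaleway project.
const s3 = new AWS.S3({
    endpoint: 'https://s3.fr-par.scw.cloud',
    region: 'fr-par',
    accessKeyId: process.env.SCW_ACCESS_KEY,
    secretAccessKey: process.env.SCW_SECRET_KEY,
    signatureVersion: 'v4'
});

// Walk every unfinished multipart upload in the bucket and abort it,
// which frees the billed-but-invisible parts.
async function abortStaleUploads(bucket: string): Promise<void> {
    let keyMarker: string | undefined;
    let uploadIdMarker: string | undefined;
    let truncated = true;

    while (truncated) {
        const res = await s3.listMultipartUploads({
            Bucket: bucket,
            KeyMarker: keyMarker,
            UploadIdMarker: uploadIdMarker
        }).promise();

        for (const u of res.Uploads || []) {
            await s3.abortMultipartUpload({
                Bucket: bucket,
                Key: u.Key!,
                UploadId: u.UploadId!
            }).promise();
            console.log(`aborted ${u.Key} (upload ${u.UploadId})`);
        }

        truncated = !!res.IsTruncated;
        keyMarker = res.NextKeyMarker;
        uploadIdMarker = res.NextUploadIdMarker;
    }
}
```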
-
@robi the cleanup is done automatically by the aws sdk module we use - https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3/ManagedUpload.html ("Handling Multipart Cleanup"). If that's not working with Scaleway, it might be a good idea to report the issue to them.
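To illustrate, this is roughly how that cleanup behaves in the v2 SDK (the bucket, key, and path below are made up). One caveat: the abort only runs when the SDK itself sees the failure, so a process that crashes mid-upload could still leave parts behind:

```ts
import fs from 'fs';
import AWS from 'aws-sdk';

const s3 = new AWS.S3({ /* endpoint and credentials as above */ });

// s3.upload() returns a ManagedUpload; on a part failure it aborts the
// whole multipart upload unless leavePartsOnError is set to true.
const upload = s3.upload({
    Bucket: 'cloudron-backups',       // placeholder bucket
    Key: 'snapshot/box.tar.gz',       // placeholder key
    Body: fs.createReadStream('/tmp/box.tar.gz')
}, {
    partSize: 10 * 1024 * 1024, // 10 MiB parts
    queueSize: 4,               // four parts in flight at once
    leavePartsOnError: false    // the default: abort (clean up) on error
});

upload.send((err, data) => {
    if (err) console.error('upload failed, parts aborted:', err);
    else console.log('uploaded to', data.Location);
});
```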
-
@girish when backups fail and retry, that seems to cause this, as evidenced by multiple "directories" in the object store with the SAME date but different session strings.
So I can't tell whether this is an SDK problem or a problem with how the SDK is used.
I am attempting to clean up tens of GB of these extra directories, but their system is not so great at deletions (plagued by timeouts and a poor browser-based client that sends chatty messages back and forth for every object).
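In case it helps others, a sketch of doing the cleanup through the API instead, batching up to 1000 keys per deleteObjects call, which is far less chatty than per-object deletes (bucket and prefix names are made up):

```ts
import AWS from 'aws-sdk';

const s3 = new AWS.S3({ /* Scaleway endpoint and credentials as above */ });

// Delete everything under a prefix in batches: listObjectsV2 returns up
// to 1000 keys per page, matching deleteObjects' per-request limit.
async function deletePrefix(bucket: string, prefix: string): Promise<void> {
    let continuationToken: string | undefined;
    do {
        const listed = await s3.listObjectsV2({
            Bucket: bucket,
            Prefix: prefix,
            ContinuationToken: continuationToken
        }).promise();

        const keys = (listed.Contents || []).map(o => ({ Key: o.Key! }));
        if (keys.length > 0) {
            await s3.deleteObjects({
                Bucket: bucket,
                Delete: { Objects: keys, Quiet: true }
            }).promise();
            console.log(`deleted ${keys.length} objects under ${prefix}`);
        }

        continuationToken = listed.NextContinuationToken;
    } while (continuationToken);
}

// Hypothetical layout: one "directory" per backup session.
// deletePrefix('cloudron-backups', '2021-05-01-123456-abcdef/');
```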
-
@girish no, they're pretty useless. Their web UI S3 console is such crap that it can't handle the chatty API requests and keeps timing out. Also, I may be wrong that the multiple directories are due to failures and restarts; it just looks like every changed app gets a new dir each day.
So I am attempting other workarounds, like creating a new bucket and just nuking the old one.
rsync isn't great for object store backups as it makes a ton of small files.
tgz isn't great as it contains a lot of repeated information. We need something hybrid that is the best of both.
Something like backing up to a local MinIO much more quickly, then doing an object-store-to-object-store transfer offsite, which is much more efficient. This may also offer an opportunity to dedupe and further optimize.
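As a rough sketch of that idea (all endpoints, buckets, and env vars here are hypothetical), streaming each object from a LAN MinIO into the offsite store without buffering whole objects in memory:

```ts
import AWS from 'aws-sdk';

// Hypothetical endpoints: a LAN MinIO holding the fresh backup and an
// offsite S3-compatible store.
const local = new AWS.S3({
    endpoint: 'http://minio.local:9000',
    s3ForcePathStyle: true, // MinIO typically needs path-style addressing
    accessKeyId: process.env.MINIO_ACCESS_KEY,
    secretAccessKey: process.env.MINIO_SECRET_KEY
});
const offsite = new AWS.S3({
    endpoint: 'https://s3.fr-par.scw.cloud',
    accessKeyId: process.env.SCW_ACCESS_KEY,
    secretAccessKey: process.env.SCW_SECRET_KEY
});

// Pipe every object under a prefix straight from MinIO into the
// offsite bucket, one page of listings at a time.
async function mirrorPrefix(bucket: string, prefix: string): Promise<void> {
    let token: string | undefined;
    do {
        const page = await local.listObjectsV2({
            Bucket: bucket, Prefix: prefix, ContinuationToken: token
        }).promise();

        for (const obj of page.Contents || []) {
            const body = local.getObject({ Bucket: bucket, Key: obj.Key! })
                .createReadStream();
            await offsite.upload({ Bucket: bucket, Key: obj.Key!, Body: body })
                .promise();
        }
        token = page.NextContinuationToken;
    } while (token);
}
```

The MinIO client's mc mirror does much the same thing if you'd rather not script it.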