Backup failing with unspecified error
-
First backup after restoring Cloudron is failing with the following error:
box:backups nextcloud.domain Unable to backup { BoxError: Backuptask crashed
    at /home/yellowtent/box/src/backups.js:872:29
    at f (/home/yellowtent/box/node_modules/once/once.js:25:25)
    at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:69:9)
    at ChildProcess.emit (events.js:198:13)
    at ChildProcess.EventEmitter.emit (domain.js:448:20)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:248:12)
  name: 'BoxError', reason: 'Internal Error', details: {}, message: 'Backuptask crashed' }
Jan 02 06:37:07 box:taskworker Task took 16626.103 seconds
Jan 02 06:37:07 box:tasks setCompleted - 1205: {"result":null,"error":{"stack":"BoxError: Backuptask crashed\n at /home/yellowtent/box/src/backups.js:872:29\n at f (/home/yellowtent/box/node_modules/once/once.js:25:25)\n at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:69:9)\n at ChildProcess.emit (events.js:198:13)\n at ChildProcess.EventEmitter.emit (domain.js:448:20)\n at Process.ChildProcess._handle.onexit (internal/child_process.js:248:12)","name":"BoxError","reason":"Internal Error","details":{},"message":"Backuptask crashed"}}
Jan 02 06:37:07 box:tasks 1205: {"percent":100,"result":null,"error":{"stack":"BoxError: Backuptask crashed\n at /home/yellowtent/box/src/backups.js:872:29\n at f (/home/yellowtent/box/node_modules/once/once.js:25:25)\n at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:69:9)\n at ChildProcess.emit (events.js:198:13)\n at ChildProcess.EventEmitter.emit (domain.js:448:20)\n at Process.ChildProcess._handle.onexit (internal/child_process.js:248:12)","name":"BoxError","reason":"Internal Error","details":{},"message":"Backuptask crashed"}}
I'm using rsync.
-
zjuhasz1 replied to girish on Jan 4, 2021, 11:06 PM (last edited by zjuhasz1 Jan 4, 2021, 11:08 PM)
@girish I forgot one more line:
Jan 04 06:16:13 box:shell backup-snapshot/app_dd36cce2-7674-4985-b671-a7dcd2d96a6c code: null, signal: SIGKILL
The rest of the log is just the normal uploading messages. I'd post the whole thing but there are a lot of file names I'd rather not share.
-
@zjuhasz1 said in Backup failing with unspecified error:
SIGKILL
Usually, this indicates that the process was killed because it ran out of memory. Under Backups -> Configure, there is a memory limit slider. Can you give it a lot more memory and try the backup again?
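For anyone reading along: the `code: null, signal: SIGKILL` pair in the log line above is how Node.js reports a child process that was terminated by a signal instead of exiting on its own. Here is a minimal sketch of reading that pair. `describeExit` is a hypothetical helper written for this post, not something from the Cloudron source, and SIGKILL on its own does not prove that the out-of-memory killer was the sender.

```javascript
// Hypothetical helper (not from the Cloudron source): interpret the
// (code, signal) pair that Node.js passes to a ChildProcess 'exit' handler.
function describeExit(code, signal) {
    if (signal !== null) {
        // The child was terminated by a signal, so there is no exit code.
        const hint = signal === 'SIGKILL'
            ? ' (cannot be caught by the process; the kernel OOM killer is one common sender)'
            : '';
        return `killed by ${signal}${hint}`;
    }
    return code === 0 ? 'exited cleanly' : `exited with code ${code}`;
}

console.log(describeExit(null, 'SIGKILL')); // the case from the log line
console.log(describeExit(0, null));
console.log(describeExit(1, null));
```

If the kernel was responsible, the system log (for example `dmesg` or `journalctl -k`) normally has a matching out-of-memory entry around the same timestamp, which is a quick way to confirm or rule out this explanation.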
-
@girish It was already at 40GB, but I increased it to 64GB and reduced the concurrent uploads. It does seem to be crashing when uploading very large files. Does it need to store the entire file it's uploading in memory, or does it break large files into parts? I don't understand whether the multipart upload size is used for rsync as well or just for tgz backups.
-
@zjuhasz1 and @girish I think this looks like the same issue as mine!
https://forum.cloudron.io/topic/4129/update-to-4-1-0-1-23-0-failed
-
@zjuhasz1 yes
-
@zjuhasz1 I think we (@imc67 and I) found the reason. Can you confirm that the backups fail when you take an app backup from the Apps view (as opposed to the Backups view)? If so, the fix is this:
- Open /home/yellowtent/box/src/apptaskmanager.js
- Line 76 or so is like this:
tasks.startTask(taskId, { logFile, timeout: 20 * 60 * 60 * 1000 /* 20 hours */, nice: 15 }, function (error, result) {
Change it to (just adds memoryLimit):
tasks.startTask(taskId, { logFile, timeout: 20 * 60 * 60 * 1000 /* 20 hours */, nice: 15, memoryLimit: 4 * 1024 * 1024 * 1024 }, function (error, result) {
- Run: systemctl restart box
- Now take a backup.
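To make the numbers in that options object easier to read, here is the same arithmetic spelled out. This is only a sketch: `HOUR_MS`, `GIB` and `taskOptions` are names made up for illustration, and treating `memoryLimit` as a byte count is an assumption based on the `4 * 1024 * 1024 * 1024` expression, not something confirmed in this thread.

```javascript
// Illustrative constants (hypothetical names, not from the Cloudron source).
const HOUR_MS = 60 * 60 * 1000;   // one hour in milliseconds
const GIB = 1024 * 1024 * 1024;   // one gibibyte in bytes

// Same values as the edited startTask() options above.
const taskOptions = {
    timeout: 20 * HOUR_MS,  // 20 hours
    nice: 15,
    memoryLimit: 4 * GIB    // assumed to be bytes, i.e. 4 GiB
};

// The failed task in the first post logged "Task took 16626.103 seconds".
const tookHours = 16626.103 / 3600;

console.log(taskOptions.timeout);     // 72000000
console.log(taskOptions.memoryLimit); // 4294967296
console.log(tookHours.toFixed(2));    // 4.62
```

At roughly 4.6 hours, the failed run in the first post was well inside the 20 hour task timeout, which fits the idea that a memory limit rather than a timeout is what this change addresses.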
-
@zjuhasz1 I suspect your issue is something else then. Can you contact us at support@cloudron.io? I will need access to the server to debug what is happening. Thanks
-
@zjuhasz1 What kind of timeout are you hitting? Do you have any logs/error message?
There are two high-level timeouts. The first applies to the individual operations during the backup process, such as upload and copy. I can tell you how to tweak those if that is where it's failing. The second is a limit on the whole backup, which can take a day at most. After a day, the backup is killed (because it's assumed we have probably hit some bug).
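As a rough illustration of the second kind of timeout (an overall deadline after which the process is killed), here is a small self-contained Node.js sketch. It is not the Cloudron implementation: `runWithDeadline` is a hypothetical helper, and it uses the synchronous `spawnSync` API with its `timeout` and `killSignal` options only to keep the example short.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper (not from the Cloudron source): run a command and
// kill it with SIGKILL if it is still running after deadlineMs.
function runWithDeadline(command, args, deadlineMs) {
    const result = spawnSync(command, args, {
        timeout: deadlineMs,   // wall-clock limit in milliseconds
        killSignal: 'SIGKILL'  // signal sent when the limit is exceeded
    });
    // On timeout, spawnSync sets result.error with code 'ETIMEDOUT'.
    const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
    return { timedOut, code: result.status, signal: result.signal };
}

// Demo: a child that would idle for 5 seconds, given a 200 ms deadline.
// A one-day limit would be 24 * 60 * 60 * 1000 instead.
const outcome = runWithDeadline(
    process.execPath,
    ['-e', 'setTimeout(() => {}, 5000)'],
    200
);
console.log(outcome);
```

Note that a child killed this way also reports a null exit code and the signal SIGKILL, so the exit status alone cannot tell a deadline kill apart from an out-of-memory kill. That is why the elapsed time and the surrounding log messages matter.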
-
Don't forget the network timeouts with optimized TCP settings, which can disconnect too early.