surfer put crashes with --delete option and Error: ENOENT: no such file or directory
-
Hey there! I ran into a weird issue earlier. There is this deploy pipeline which uses
+ surfer put --token surfertoken --delete --server example.com ./dist/* /
to put files on a surfer instance. Usually it does that:Using server https://example.com Removing /file1.js Removing /file2.js Uploading /200.html -> /200.html Uploading /index.html -> /index.html Done
But recently every other deployment "crashes" (it doesn't exit with an error code, it just stops) during the delete operation, like this:
Using server https://example.com
Removing /de/projects
Removing /de/projects/index.html
Removing /article
I've checked the surfer logs and found this:
2023-03-20T05:42:23.000Z DELETE /api/files/%2F123.js?access_token=xxx 200 38.077 ms - 14
2023-03-20T05:42:24.000Z [Error: ENOENT: no such file or directory, stat '/app/data/public/article'] {
2023-03-20T05:42:24.000Z   errno: -2,
2023-03-20T05:42:24.000Z   code: 'ENOENT',
2023-03-20T05:42:24.000Z   syscall: 'stat',
2023-03-20T05:42:24.000Z   path: '/app/data/public/article'
2023-03-20T05:42:24.000Z }
2023-03-20T05:42:24.000Z [Error: ENOENT: no such file or directory, stat '/app/data/public/current'] {
2023-03-20T05:42:24.000Z   errno: -2,
2023-03-20T05:42:24.000Z   code: 'ENOENT',
2023-03-20T05:42:24.000Z   syscall: 'stat',
2023-03-20T05:42:24.000Z   path: '/app/data/public/current'
2023-03-20T05:42:24.000Z }
2023-03-20T05:42:24.000Z HttpError: Unable to remove
2023-03-20T05:42:24.000Z     at /app/code/src/files.js:258:25 {
2023-03-20T05:42:24.000Z   status: 500,
2023-03-20T05:42:24.000Z   internalError: null,
2023-03-20T05:42:24.000Z   details: null
2023-03-20T05:42:24.000Z }
2023-03-20T05:42:24.000Z HttpError: Unable to remove
2023-03-20T05:42:24.000Z     at /app/code/src/files.js:258:25 {
2023-03-20T05:42:24.000Z   status: 500,
2023-03-20T05:42:24.000Z   internalError: null,
2023-03-20T05:42:24.000Z   details: null
2023-03-20T05:42:24.000Z }
Some more files get deleted, but the operation aborts way too early, which means the new files never get uploaded. This leaves the site incomplete and in a broken state, which obviously is not great. Any idea what's going on there? If the build process is triggered a second time and only a few files need deleting, it mostly works. Also, is there a way to upload the new files before deleting the obsolete ones?
-
The way it works is that the surfer CLI will first list the remote and local file trees, then calculate the diff and then issue commands to the remote server. The error here most likely indicates that between listing the file trees and issuing the commands, the remote filesystem has changed (the folders/files in question are already removed).
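To illustrate that sequence and the race window it opens, here is a minimal sketch of such a list/diff/apply loop (not surfer's actual code; the helpers listRemoteTree, listLocalTree, deleteRemote and uploadFile are hypothetical):

```js
// Minimal sketch of a list -> diff -> apply loop. The helper functions are
// hypothetical and passed in; this is not surfer's real implementation.
async function syncWithDelete({ listRemoteTree, listLocalTree, deleteRemote, uploadFile }, localDir, server) {
  const remote = await listRemoteTree(server);   // snapshot of remote paths
  const local = await listLocalTree(localDir);   // snapshot of local paths

  // Anything present remotely but missing locally is scheduled for deletion.
  const toDelete = remote.filter((p) => !local.includes(p));

  for (const p of toDelete) {
    // Race window: if something else removed `p` after the snapshot above,
    // the server stat()s a missing path, answers 500 "Unable to remove",
    // and the whole deployment aborts before any uploads happen.
    await deleteRemote(server, p);
  }

  for (const p of local) {
    await uploadFile(server, p); // uploads only run after all deletes succeed
  }
}
```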
So is it possible that there are two processes running in parallel here somehow? Maybe the pipeline is run twice at the same time?
-
@nebulon ok I think the issue for this was limited system resources and the server just couldn't cope with it. It's odd that Drone (using the package from @fbartels) didn't recognize the surfer error as a problem in the pipeline but thought it was successful.
Either way, this still leaves the problem with larger deployments: when removing the files takes 1-2 minutes, you have more or less a downtime between deployments, because the new/changed files are only uploaded afterwards.
Idea for a solution: be able to point the /public path at something like /release/<timestamp>/ via a symlink, so one could deploy the files and, once they're all up, switch the symlink to the latest release and keep the last 3 deployments as backups. Is that something that could be implemented? Or how is everyone else solving that problem?
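To make that idea concrete, here is a rough Node.js sketch of the symlink-swap pattern (purely illustrative; surfer has no such feature today, and the releases/ layout and activateRelease helper are assumptions):

```js
// Hypothetical atomic symlink-swap deployment: upload into a timestamped
// release directory first, then repoint a "public" symlink at it.
// Assumes `public` is already a symlink, not a real directory.
const fs = require('fs');
const path = require('path');

function activateRelease(dataDir, releaseName, keep = 3) {
  const releasesDir = path.join(dataDir, 'releases');
  const target = path.join(releasesDir, releaseName);
  const link = path.join(dataDir, 'public');
  const tmpLink = link + '.tmp';

  // Create the new symlink next to the old one, then rename it into place.
  // rename() is atomic on the same filesystem, so visitors always see either
  // the old release or the new one, never a half-deployed mix.
  fs.symlinkSync(target, tmpLink);
  fs.renameSync(tmpLink, link);

  // Keep only the last `keep` releases as rollback targets.
  const releases = fs.readdirSync(releasesDir).sort().reverse();
  for (const old of releases.slice(keep)) {
    fs.rmSync(path.join(releasesDir, old), { recursive: true, force: true });
  }
}
```

The older release directories then double as instant rollback targets, since switching back is just another symlink swap.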
-
Ok, the problem is not solved after all. I thought it was after moving to a dedicated VPS for the runners, but it still fails every other time it's triggered.
The way I see it, there are a few possible paths to go down:
- add a flag to ignore the error (rather than aborting) when a file cannot be removed because it's no longer there, so the rest of the deployment can continue (see the sketch after this list)
- transfer all files first, then compare and remove the files which are on target but not on source
- add a flag to delete * on target before uploading new files
Any other suggestion is welcome, but I'm running out of ideas tbh
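For the first option, a rough client-side sketch of what "continue on error" could look like (hypothetical; the ignoreMissing flag and removeRemoteFile helper are not part of surfer):

```js
// Hypothetical continue-on-error delete loop (not current surfer behaviour).
async function deleteObsolete(paths, removeRemoteFile, ignoreMissing) {
  for (const p of paths) {
    try {
      console.log(`Removing ${p}`);
      await removeRemoteFile(p);
    } catch (err) {
      // If the path vanished between listing and deleting, skip it instead
      // of aborting the whole deployment.
      if (ignoreMissing) {
        console.warn(`Skipping ${p}: already gone (${err.message})`);
        continue;
      }
      throw err; // default: keep today's fail-fast behaviour
    }
  }
}
```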
-
@msbt so I could add the "continue on error" mode, however I do wonder why files are deleted on the remote then. This suggests something else is operating on the files there, so I am not sure a continue-on-error mode wouldn't introduce other races here and result in an unexpected state.
-
@msbt alright, then maybe you can give me access to your Cloudron hosting the second test surfer instance, to debug this?
If so, please send a mail to support@cloudron.io with your dashboard domain and remote SSH enabled. Also let me know how exactly to reproduce this.
-