surfer put crashes with --delete option and Error: ENOENT: no such file or directory
-
Hey there! I ran into a weird issue earlier. There is this deploy pipeline which uses
+ surfer put --token surfertoken --delete --server example.com ./dist/* /
to put files on a surfer instance. Usually it does that:Using server https://example.com Removing /file1.js Removing /file2.js Uploading /200.html -> /200.html Uploading /index.html -> /index.html Done
But recently every other deployment "crashes" (it doesn't exit with an error code, it just stops) during the delete operation, like this:
Using server https://example.com
Removing /de/projects
Removing /de/projects/index.html
Removing /article
I've checked the surfer logs and found this:
2023-03-20T05:42:23.000Z DELETE /api/files/%2F123.js?access_token=xxx 200 38.077 ms - 14
2023-03-20T05:42:24.000Z [Error: ENOENT: no such file or directory, stat '/app/data/public/article'] {
2023-03-20T05:42:24.000Z   errno: -2,
2023-03-20T05:42:24.000Z   code: 'ENOENT',
2023-03-20T05:42:24.000Z   syscall: 'stat',
2023-03-20T05:42:24.000Z   path: '/app/data/public/article'
2023-03-20T05:42:24.000Z }
2023-03-20T05:42:24.000Z [Error: ENOENT: no such file or directory, stat '/app/data/public/current'] {
2023-03-20T05:42:24.000Z   errno: -2,
2023-03-20T05:42:24.000Z   code: 'ENOENT',
2023-03-20T05:42:24.000Z   syscall: 'stat',
2023-03-20T05:42:24.000Z   path: '/app/data/public/current'
2023-03-20T05:42:24.000Z }
2023-03-20T05:42:24.000Z HttpError: Unable to remove
2023-03-20T05:42:24.000Z     at /app/code/src/files.js:258:25 {
2023-03-20T05:42:24.000Z   status: 500,
2023-03-20T05:42:24.000Z   internalError: null,
2023-03-20T05:42:24.000Z   details: null
2023-03-20T05:42:24.000Z }
2023-03-20T05:42:24.000Z HttpError: Unable to remove
2023-03-20T05:42:24.000Z     at /app/code/src/files.js:258:25 {
2023-03-20T05:42:24.000Z   status: 500,
2023-03-20T05:42:24.000Z   internalError: null,
2023-03-20T05:42:24.000Z   details: null
2023-03-20T05:42:24.000Z }
Some more files get deleted, but the operation aborts way too early, which means the new files never get uploaded. This leaves the site incomplete and in a broken state, which obviously is not great. Any idea what's going on there? If the build process is triggered a second time and only a few files need deleting, it mostly works. Also, is there a way to upload the new files before deleting the obsolete ones?
-
The way it works is that the surfer CLI will first list the remote and local file trees, then calculate the diff and then issue commands to the remote server. The error here most likely indicates that between listing the file trees and issuing the commands, the remote filesystem has changed (the folders/files in question are already removed).
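To illustrate that sequence and the race window it opens, here is a minimal sketch of such a list/diff/apply loop (not surfer's actual code; the helpers listRemoteTree, listLocalTree, deleteRemote and uploadFile are hypothetical):

```js
// Minimal sketch of a list -> diff -> apply loop. The helper functions are
// hypothetical and passed in; this is not surfer's real implementation.
async function syncWithDelete({ listRemoteTree, listLocalTree, deleteRemote, uploadFile }, localDir, server) {
  const remote = await listRemoteTree(server);   // snapshot of remote paths
  const local = await listLocalTree(localDir);   // snapshot of local paths

  // Anything present remotely but missing locally is scheduled for deletion.
  const toDelete = remote.filter((p) => !local.includes(p));

  for (const p of toDelete) {
    // Race window: if something else removed `p` after the snapshot above,
    // the server stat()s a missing path, answers 500 "Unable to remove",
    // and the whole deployment aborts before any uploads happen.
    await deleteRemote(server, p);
  }

  for (const p of local) {
    await uploadFile(server, p); // uploads only run after all deletes succeed
  }
}
```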
So is it possible that there are two processes running in parallel here somehow? Maybe the pipeline is run twice at the same time?
-
@nebulon ok I think the issue for this was limited system resources and the server just couldn't cope with it. It's odd that Drone (using the package from @fbartels) didn't recognize the surfer error as a problem in the pipeline but thought it was successful.
Either way, this still leaves the problem with larger deployments: when removing the files takes 1-2 minutes, you have more or less a downtime between deployments, because the new/changed files are only uploaded afterwards.
Idea for a solution: be able to point the /public path at something like /release/<timestamp>/ via a symlink, so one could deploy the files and, once they're all up, switch the symlink to the latest release and keep the last 3 deployments as backups. Is that something that could be implemented? Or how is everyone else solving that problem?
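To make that idea concrete, here is a rough Node.js sketch of the symlink-swap pattern (purely illustrative; surfer has no such feature today, and the releases/ layout and activateRelease helper are assumptions):

```js
// Hypothetical atomic symlink-swap deployment: upload into a timestamped
// release directory first, then repoint a "public" symlink at it.
// Assumes `public` is already a symlink, not a real directory.
const fs = require('fs');
const path = require('path');

function activateRelease(dataDir, releaseName, keep = 3) {
  const releasesDir = path.join(dataDir, 'releases');
  const target = path.join(releasesDir, releaseName);
  const link = path.join(dataDir, 'public');
  const tmpLink = link + '.tmp';

  // Create the new symlink next to the old one, then rename it into place.
  // rename() is atomic on the same filesystem, so visitors always see either
  // the old release or the new one, never a half-deployed mix.
  fs.symlinkSync(target, tmpLink);
  fs.renameSync(tmpLink, link);

  // Keep only the last `keep` releases as rollback targets.
  const releases = fs.readdirSync(releasesDir).sort().reverse();
  for (const old of releases.slice(keep)) {
    fs.rmSync(path.join(releasesDir, old), { recursive: true, force: true });
  }
}
```

The older release directories then double as instant rollback targets, since switching back is just another symlink swap.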
-
Ok, the problem is not solved after all. I thought it was after moving to a dedicated VPS for the runners, but it still fails every other time it's triggered.
The way I see it, there are a few possible paths to go down:
- add a flag to ignore the error (rather than aborting) when a file cannot be removed because it's no longer there, so the rest of the deployment can continue (see the sketch after this list)
- transfer all files first, then compare and remove the files which are on target but not on source
- add a flag to delete * on target before uploading new files
Any other suggestion is welcome, but I'm running out of ideas tbh
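For the first option, a rough client-side sketch of what "continue on error" could look like (hypothetical; the ignoreMissing flag and removeRemoteFile helper are not part of surfer):

```js
// Hypothetical continue-on-error delete loop (not current surfer behaviour).
async function deleteObsolete(paths, removeRemoteFile, ignoreMissing) {
  for (const p of paths) {
    try {
      console.log(`Removing ${p}`);
      await removeRemoteFile(p);
    } catch (err) {
      // If the path vanished between listing and deleting, skip it instead
      // of aborting the whole deployment.
      if (ignoreMissing) {
        console.warn(`Skipping ${p}: already gone (${err.message})`);
        continue;
      }
      throw err; // default: keep today's fail-fast behaviour
    }
  }
}
```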
-
@msbt so I could add the "continue on error" mode, however I do wonder why files are deleted on the remote then. This suggests something else is operating on the files there, so I am not sure a continue-on-error mode wouldn't introduce other races here and result in an unexpected state.
-
@msbt alright, then maybe you can give me access to your Cloudron hosting the second test surfer instance, to debug this?
If so, please send a mail to support@cloudron.io with your dashboard domain and remote SSH enabled. Also let me know how exactly to reproduce this.
-