-
We have GitLab installed as a Cloudron app on our VPS. Until recently, our gitlab-runner.service was working fine. On Tuesday 3rd June we started getting unexpected job failures from the gitlab-runner. Upon further testing, it appears that the first 29 jobs pass as expected, and then the 30th job and all subsequent jobs fail until the service is stopped and restarted. The error seems to happen when the pipeline job is cloning a repository. This is an example of a log from one of the failed jobs:
Running with gitlab-runner 17.11.0 (v17.11.0)
on default_mydevice_somenumber A-number, system ID: id_number
Preparing the "shell" executor 00:00
Using Shell (bash) executor...
Preparing environment
Running on mydevice...
Getting source from Git repository
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /var/lib/private/gitlab-runner/builds/A-number/0/OrgName/mainRepo/.git/
Checking out 130d8ca7 as detached HEAD (ref is refs/merge-requests/294/head)...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:03
$ export repo2=$(mktemp -d)
$ GIT_TRACE=1 GIT_FLUSH=1 git -c core.progress=false clone https://gitlab-ci-token:$CI_JOB_TOKEN@my-self-hosted-git.com/OrgName/repo2 $repo2
09:27:40.005972 git.c:476 trace: built-in: git clone https://gitlab-ci-token:[MASKED]@my-self-hosted-git.com/OrgName/repo2 /tmp/tmp.NDTArMtQAp
Cloning into '/tmp/tmp.NDTArMtQAp'...
09:27:40.009784 run-command.c:667 trace: run_command: git remote-https origin https://gitlab-ci-token:[MASKED]@my-self-hosted-git.com/OrgName/repo2
09:27:40.009805 run-command.c:759 trace: start_command: /nix/store/805a5wv1cyah5awij184yfad1ksmbh9f-git-2.49.0/libexec/git-core/git remote-https origin https://gitlab-ci-token:[MASKED]@my-self-hosted-git.com/OrgName/repo2
09:27:40.011889 git.c:772 trace: exec: git-remote-https origin https://gitlab-ci-token:[MASKED]@my-self-hosted-git.com/OrgName/repo2
09:27:40.011944 run-command.c:667 trace: run_command: git-remote-https origin https://gitlab-ci-token:[MASKED]@my-self-hosted-git.com/OrgName/repo2
09:27:40.011965 run-command.c:759 trace: start_command: /nix/store/805a5wv1cyah5awij184yfad1ksmbh9f-git-2.49.0/libexec/git-core/git-remote-https origin https://gitlab-ci-token:[MASKED]@my-self-hosted-git.com/OrgName/repo2
warning: redirecting to https://my-self-hosted-git.com/OrgName/repo2.git/
09:27:40.411977 run-command.c:667 trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 1061622 on mydevice' --check-self-contained-and-connected
09:27:40.412021 run-command.c:759 trace: start_command: /nix/store/805a5wv1cyah5awij184yfad1ksmbh9f-git-2.49.0/libexec/git-core/git index-pack --stdin --fix-thin '--keep=fetch-pack 1061622 on mydevice' --check-self-contained-and-connected
09:27:40.415115 git.c:476 trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 1061622 on mydevice' --check-self-contained-and-connected
fatal: write error: No space left on device
fatal: fetch-pack: invalid index-pack output
Running after_script
Running after script...
$ rm -rf $repo2
Cleaning up project directory and file based variables
ERROR: Job failed: exit status 1
The device on which the gitlab-runner is running has plenty of space, as does the Linode server. We tried increasing the RAM on the Linode server, but still found the 30th job onwards failed, after the first 29 passed. This behaviour suggests something is building up somewhere and then preventing further jobs from succeeding, but we are not sure what. We are looking for any help or pointers you might be able to give us to solve this issue.
I wonder if the failure may have been caused by an update to the GitLab app. Perhaps someone else has reported similar problems.
Thanks in advance.
-
The logs mention that the system ran out of disk space:
... fatal: write error: No space left on device ....
I'm not sure how the server that hosts the runner is set up (it should not be on Cloudron), but since it is the runner that is failing, this is a bit out of scope for Cloudron itself.
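One thing that may be worth checking on the runner machine: "No space left on device" is reported both when a filesystem runs out of bytes and when it runs out of inodes, and /tmp is often a separate (sometimes small, memory-backed) filesystem from the root partition. The following is only a rough sketch, assuming a Linux host with GNU coreutils `df`; `report_space` is a made-up helper name, and the two paths are simply the ones that appear in the job log earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: report free space and free inodes for the filesystems a job writes to.
# Either one being exhausted produces "No space left on device".
report_space() {
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      # Missing, or not visible to the current user (e.g. root-only directories).
      echo "== $path == (skipped: not found or not accessible)"
      continue
    fi
    echo "== $path =="
    LC_ALL=C df -h "$path"   # free bytes on the filesystem holding $path
    LC_ALL=C df -i "$path"   # free inodes on the same filesystem
  done
}

# Paths taken from the job log above; adjust them for your runner.
report_space /tmp /var/lib/private/gitlab-runner/builds
```

If the service runs with its own private /tmp, the numbers seen from a normal login shell may differ from what the job sees, so running the same two `df` commands inside a job script would be the more reliable check.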
-
Thank you, Nebulon, for your reply.
The runners are on a different machine.
I don't think that the machine with the runners is running out of space, because it has a very large amount available on all partitions (the root partition has 300GB available). I think the error is originating from the Cloudron machine (although I may be wrong).
To try to confirm the location of the error, I will set up another GitLab runner on a different machine and see if it has the same problem. Hopefully that will isolate the error either to the Cloudron server or to the GitLab runner machines.
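To see whether something really is building up between jobs, the free space on a suspect filesystem can be compared before and after a job on whichever machine is being tested. This is a minimal sketch, assuming a POSIX `df` and `awk`; `avail_kb` is a hypothetical helper name, and /tmp is used only because that is where the failing clone in the log was writing.

```shell
#!/usr/bin/env bash
# Sketch: measure how much free space a filesystem loses across one job run.
avail_kb() {
  # df -P prints one line per filesystem; column 4 is the available space in KiB.
  LC_ALL=C df -Pk "$1" | awk 'NR==2 {print $4}'
}

before=$(avail_kb /tmp)
# ... trigger one pipeline job here and wait for it to finish ...
after=$(avail_kb /tmp)

# A negative number means the filesystem has less free space than before the job.
echo "Change in free space on /tmp: $((after - before)) KiB"
```

If the change is consistently negative and roughly the same size after each job, dividing the initial free space by that amount should land near the number of jobs that pass before the failures start.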