celery beat keeps restarting
-
Hi,
This is something I just noticed in my logs, but I have not yet found a permanent solution for it. The following log messages keep repeating:
Nov 05 15:49:28 celery beat v4.4.7 (cliffs) is starting.
Nov 05 15:49:29 2020-11-05 14:49:29,169 INFO exited: celery-beat (exit status 73; not expected)
Nov 05 15:49:30 [pid: 230|app: 0|req: 2/2] 172.18.0.1 () {34 vars in 444 bytes} [Thu Nov 5 14:49:30 2020] GET /healthz/ => generated 2 bytes in 15 msecs (HTTP/1.1 200) 10 headers in 535 bytes (1 switches on core 0)
Nov 05 15:49:30 2020-11-05 14:49:30,027 INFO spawned: 'celery-beat' with pid 281
Nov 05 15:49:30 172.18.0.1 - - [05/Nov/2020:14:49:30 +0000] "GET /healthz/ HTTP/1.1" 200 2 "-" "Mozilla (CloudronHealth)"
Nov 05 15:49:31 2020-11-05 14:49:31,029 INFO success: celery-beat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Nov 05 15:49:33 ERROR: Pidfile (/run/celery/beat.pid) already exists.
Nov 05 15:49:33 Seems we're already running? (pid: 83)
Nov 05 15:49:33 celery beat v4.4.7 (cliffs) is starting.
Nov 05 15:49:33 2020-11-05 14:49:33,970 INFO exited: celery-beat (exit status 73; not expected)
Nov 05 15:49:34 2020-11-05 14:49:34,974 INFO spawned: 'celery-beat' with pid 303
Nov 05 15:49:35 2020-11-05 14:49:35,976 INFO success: celery-beat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Nov 05 15:49:38 ERROR: Pidfile (/run/celery/beat.pid) already exists.
Nov 05 15:49:38 Seems we're already running? (pid: 83)
Nov 05 15:49:38 celery beat v4.4.7 (cliffs) is starting.
Nov 05 15:49:39 2020-11-05 14:49:39,217 INFO exited: celery-beat (exit status 73; not expected)
My workaround for now was to stop celery beat with `supervisorctl stop celery-beat`, remove the pidfile, and then start the service again with `supervisorctl start celery-beat`. The log was fine for a while afterwards, but in one out of two restarts of the app the log messages came back and I had to intervene manually again.

Likely related, and something I am still looking into, is the following message in the Weblate performance report:
The Celery tasks queue is too long, either the worker is not running, or is too slow.
The only queue that has more than 0 entries is the memory queue, but I have not yet found a way to inspect it manually.
-
@fbartels said in celery beat keeps restarting:
ERROR: Pidfile (/run/celery/beat.pid) already exists
I guess we have to remove this file on startup. We do this, for example, for apache. I think supervisor also comes with a program called pidproxy, which forwards signals. This way the file gets cleaned up properly when the app is restarted etc. (cc @nebulon)
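As a rough illustration of the "remove this file on startup" idea, here is a minimal sketch of what a start script could do before supervisor launches celery-beat. The function name `remove_leftover_pidfile` and the `PIDFILE` variable are made up for this example; only the path `/run/celery/beat.pid` comes from the log above. It assumes the script runs once at app start, before any celery-beat process exists, so removing the file unconditionally is safe at that point.

```shell
#!/bin/sh
# Hypothetical start-script fragment (not the actual package code):
# clear a pidfile left over from a previous run before supervisor starts.

# Path taken from the error message in the log; overridable for testing.
PIDFILE="${PIDFILE:-/run/celery/beat.pid}"

# Remove the given pidfile if it exists; do nothing otherwise.
remove_leftover_pidfile() {
    if [ -e "$1" ]; then
        echo "Removing leftover pidfile $1"
        rm -f "$1"
    fi
}

remove_leftover_pidfile "$PIDFILE"
```

This only helps if it runs before supervisor spawns the program. Running it while celery-beat is up would delete the pidfile of a live process.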
-
Oddly, I am not able to reproduce this issue, even if I manually create the exact same pid file while celery-beat is stopped. celery-beat still starts up normally after that. I am not sure what I am missing here and why it ignores and replaces the existing pid file.
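One possible explanation, stated as an assumption rather than something confirmed in this thread: celery seems to treat a pidfile as stale, and replace it, when the recorded pid no longer belongs to a running process. It would then only refuse to start when that pid is alive, even if the process is unrelated. The sketch below checks which case applies. The function name `pidfile_status` is hypothetical, and `kill -0` is used here as a stand-in for whatever check celery performs internally.

```shell
#!/bin/sh
# Hypothetical diagnostic: report whether the pid recorded in a pidfile
# currently belongs to a running process.

pidfile_status() {
    if [ ! -f "$1" ]; then
        echo "missing"
        return 0
    fi
    pid="$(cat "$1")"
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    # Note: it also fails if the process exists but belongs to another user.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
    else
        echo "stale"
    fi
}

# Path taken from the error message in the log.
pidfile_status "${PIDFILE:-/run/celery/beat.pid}"
```

If this prints "alive" while celery-beat is stopped, the pid in the file (83 in the log above) has presumably been reused by some other process in the container, which would be consistent with the error only appearing after some restarts.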
-
Ah, /tmp is indeed mounted to the host and not a tmpfs.
I just wanted to check whether there is an indicator or error message when celery-beat starts for the first time. I downloaded the "full log" from the log view, but strangely these error messages do not appear in that file at all.