Backup task crashes when a ClickHouse app deletes a temp merge dir mid-snapshot (Cannot read properties of null (reading 'sort'))
Summary
A full backup task aborts entirely when it reaches a ClickHouse-backed app (in my case Langfuse). The rsync syncer walks the app's data tree, but ClickHouse deletes a background-merge temp directory (data/clickhouse/store/*/tmp_merge_*) between enumeration and read. readTree gets null for that directory's contents and throws on .sort(), which propagates up and kills the whole task, not just the one app.
Impact
This is the important part: one racy app aborts the entire backup run. In the run below the task died at app 31 of 72, so all 41 remaining apps were silently left on the previous night's snapshot. The failure is timing-dependent (it only triggers if ClickHouse happens to be merging during the snapshot window), so backups appear intermittently broken with no config change on the user's side.
Environment
- Cloudron version: 9.2
- Server OS: 9.2.0
- Ubuntu version: Ubuntu 24.04.2 LTS
- Linux: 6.8.0-124-generic
- Backup format: rsync, encrypted (encrypted: true)
- Backup storage: Hetzner Storage Box over SSH (port 23)
- Affected app: Langfuse (uses ClickHouse)
What happens
The same vanishing-temp-dir condition shows up twice in one snapshot. First it's caught harmlessly by the precondition du, which just warns and continues:
    du: cannot access '/home/yellowtent/appsdata/5edfb175-…/data/clickhouse/store/f0a/f0a84db1-…/tmp_merge_202606_23202_23297_19': No such file or directory
Then it hits the rsync syncer, which does not tolerate it and crashes the task:
    backupupload: upload completed. error: TypeError: Cannot read properties of null (reading 'sort')
        at readTree (file:///home/yellowtent/box/src/syncer.js:31:47)
        at traverse (file:///home/yellowtent/box/src/syncer.js:130:30)
        at traverse (file:///home/yellowtent/box/src/syncer.js:136:17)
        … (recursion)
        at Object.sync (file:///home/yellowtent/box/src/syncer.js:159:5)
        at sync (file:///home/yellowtent/box/src/backupformat/rsync.js:166:63)
        at Object.upload (file:///home/yellowtent/box/src/backupformat/rsync.js:336:18)
        at async Object.upload (file:///home/yellowtent/box/src/backuptask.js:101:37)
    tasks: setCompleted - 4798: {"result":null,"error":{"message":"Cannot read properties of null (reading 'sort')","reason":"External Error"},"percent":100}
Root cause
readTree (box/src/syncer.js:31) reads a directory's entries and sorts them. When a subdirectory is removed between the initial find/enumeration and the per-directory read, the read returns null and null.sort() throws. The exception unwinds through traverse → sync → rsync.js → backuptask.js, so the whole task is marked failed rather than the single directory being skipped.
This is expected, normal ClickHouse behaviour, not app misbehaviour: ClickHouse continuously creates and renames/deletes tmp_merge_* (and tmp_insert_*, tmp_fetch_*) directories under store/<uuid>/ as background merges complete. Any tool that enumerates the store and then reads it will occasionally find a directory gone. ClickHouse upstream has hit and patched the equivalent race in their own tooling, e.g. https://github.com/ClickHouse/ClickHouse/pull/44874, so accommodating it on the reader side is the standard approach.
Suggested fix
Make readTree resilient to a directory disappearing mid-traversal instead of propagating null:
- Guard against a null/undefined entry list before .sort().
- On ENOENT (or a null read) for a directory that vanished after enumeration, treat it as empty / skip it and continue the walk, rather than aborting.
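As a rough illustration of the two points above, here is a minimal Node.js sketch of a walk that treats a vanished directory as empty and keeps going. It is not the code in box/src/syncer.js: listDirSorted, walk and the onDir hook are hypothetical names invented for this example, and it uses plain node:fs rather than safetydance. The onDir hook exists only to simulate ClickHouse removing a temp dir between enumeration and read.

```javascript
// Sketch only: listDirSorted and walk are hypothetical helpers, not the
// actual readTree/traverse from box/src/syncer.js.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Return the sorted entry names of dirPath, or [] if the directory has
// disappeared since its parent was enumerated.
function listDirSorted(dirPath) {
    try {
        return fs.readdirSync(dirPath).sort();
    } catch (error) {
        if (error.code === 'ENOENT') {
            console.warn(`skipping vanished directory: ${dirPath}`);
            return [];
        }
        throw error; // other failures (e.g. permissions) should still surface
    }
}

// Recursively collect file paths under root. onDir runs just before each
// directory is read; it is only here so the demo can simulate the race.
function walk(root, onDir = () => {}) {
    const files = [];
    (function visit(dirPath) {
        onDir(dirPath);
        for (const name of listDirSorted(dirPath)) {
            const fullPath = path.join(dirPath, name);
            let stat;
            try {
                stat = fs.lstatSync(fullPath);
            } catch (error) {
                if (error.code === 'ENOENT') continue; // entry vanished too
                throw error;
            }
            if (stat.isDirectory()) visit(fullPath);
            else files.push(fullPath);
        }
    })(root);
    return files;
}

// Demo: one stable part directory and one temp merge directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-demo-'));
const keepDir = path.join(root, 'all_1_1_0');
const tmpDir = path.join(root, 'tmp_merge_demo');
fs.mkdirSync(keepDir);
fs.mkdirSync(tmpDir);
fs.writeFileSync(path.join(keepDir, 'data.bin'), 'x');
fs.writeFileSync(path.join(tmpDir, 'data.bin'), 'y');

// Remove the temp dir after it was enumerated but before it is read.
const found = walk(root, (dirPath) => {
    if (dirPath === tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true });
});
console.log(found.map((p) => path.relative(root, p)));

fs.rmSync(root, { recursive: true, force: true });
```

The walk logs a warning for tmp_merge_demo and still returns the file under all_1_1_0, instead of throwing out of the whole traversal.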
This matches how du already behaves in the same task (warn and continue) and would make every ClickHouse-based package back up reliably.
Why this matters beyond Langfuse
Any package built on ClickHouse is affected the same way — Plausible, PostHog, SigNoz, Langfuse, etc. As more analytics apps adopt ClickHouse, more users will see intermittent, hard-to-diagnose full-backup aborts where the visible symptom (a stale backup on an unrelated app) is far from the actual cause.
EDIT:
Workarounds in the meantime
- Exclude the ClickHouse app from automatic backups so it can't abort the whole run, and back it up separately.
- Note: stopping the app and then running cloudron backup create --app <fqdn> does not work, because Cloudron does not back up a stopped app. A quiesced backup therefore has to be a filesystem-level snapshot taken outside the platform. Otherwise, because the failure is a timing collision, simply re-running the scheduled backup usually clears it.
-
Following up on our above report, confirming this still reproduces on Cloudron 9.2.0, with the exact spot in the source in case it helps.
We hit it on a real app backup just now:
    App backup error: Backup failed: Cannot read properties of null (reading 'sort')
It reproduced on the second backup attempt under active ClickHouse merge load, so the race window isn't narrow.
9.2.0 does add a guard, but it sits after the .sort(). In box/src/syncer.js, readTree is:
    const names = safe.fs.readdirSync(dirPath).sort(); // line 31
    if (!names) return [];                             // line 32
@cloudron/safetydance's readdirSync returns null on a vanished directory (the ENOENT when a ClickHouse tmp_merge_* part is renamed/removed mid-snapshot). So line 31 evaluates null.sort() and throws before the line-32 if (!names) guard can run. The guard is dead code for this crash, and the whole-server backup run still aborts.
The fix is to null-check before .sort():
    const names = safe.fs.readdirSync(dirPath);
    if (!names) return [];
    names.sort();
(or (safe.fs.readdirSync(dirPath) || []).sort()). The same file's readCache (around lines 22–23) already does it in this order (assign, null-check, then use), so this just makes readTree consistent with the existing pattern in the file.
Happy to test a patch against a live ClickHouse-bundling app under merge load.
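For anyone who wants to see the ordering problem in isolation, the following self-contained sketch may help. It does not import safetydance: safeReaddirSync is a hypothetical stand-in that only mimics the null-on-error behaviour described above, and readNamesSortFirst / readNamesGuardFirst are made-up names for the two orderings.

```javascript
// Sketch only: safeReaddirSync is a hypothetical stand-in that mimics the
// null-on-error behaviour described for safe.fs.readdirSync.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

function safeReaddirSync(dirPath) {
    try {
        return fs.readdirSync(dirPath);
    } catch {
        return null; // swallow the error (e.g. ENOENT) and signal it with null
    }
}

// Ordering quoted above: .sort() runs before the null check.
function readNamesSortFirst(dirPath) {
    const names = safeReaddirSync(dirPath).sort();
    if (!names) return []; // never reached when the directory is gone
    return names;
}

// Proposed ordering: assign, null-check, then sort.
function readNamesGuardFirst(dirPath) {
    const names = safeReaddirSync(dirPath);
    if (!names) return [];
    return names.sort();
}

// A path that is never created, standing in for a removed tmp_merge_* dir.
const vanished = path.join(os.tmpdir(), `tmp_merge_demo_${process.pid}_${Date.now()}`);

let sortFirstError = null;
try {
    readNamesSortFirst(vanished);
} catch (error) {
    sortFirstError = error;
}

console.log(sortFirstError instanceof TypeError); // true
console.log(readNamesGuardFirst(vanished));       // []
```

With the guard first, the missing directory simply yields an empty list; with the sort first, the TypeError escapes before the guard is evaluated.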
-
@loudlemur thanks for the report. Fixed in https://git.cloudron.io/platform/box/-/commit/33f3ca39996ad5b3af2b5dd2320d1552dd8952c0