App logs filling disk

sparkwise

Last Friday, we found that our Cloudron apps stopped working due to one app (Etherpad-Lite) filling 100% of /dev/root with >100GB of new app logs. We deleted the logs, rebooted, and chalked it up as a one-time issue. Earlier today, this same thing happened a second time: Disk filled with another 100GB of extra platformdata. Running cloudron-support --troubleshoot and rebooting Cloudron resolved the issue, but without a clear understanding of why this happened, I'm concerned that this could happen again.

This leaves me with a few follow-up questions:

After having already deleted the excess log files, is there anything we can look at to track down the root cause, or is the "evidence" now all gone?
Do we need to be actively managing log rotation for apps? Should I be setting up app-specific logrotate config files? Recommendations on how to do this?
Any recommendations on how to monitor or configure alerts on file size or disk usage on Cloudron? (Server is running on AWS EC2, so perhaps I just do this with AWS Cloudwatch tools.)
I'm seeing a number of recent disk-full threads on Cloudron and they didn't seem to be localized on Etherpad-Lite, so wondering if there are any platform issues that might be related.

Some specifics on our situation:

saw that our externally-hosted uptime server reported that multiple Cloudron-hosted apps were unresponsive.
visited Cloudron dashboard and found /dev/root at 100% capacity, with "platformdata" filling up disk usage chart. (160GB disk)
connected via SSH (AWS EC2 "Session Manager" option failed, presumably due to filled disk)
found that Etherpad-Lite app had created >100GB of log files in the past day
tracked down the troubleshooting instructions (idea: link to Troubleshooting page from the "/dev/root at 100% capacity" error dialog)
ran sudo cloudron-support --troubleshoot, still saw DNS failures, but after rebooting everything started normal again.

nebulon

Sounds like etherpad has an issue there then if it creates that many logs. I just checked and it seems to dump logs at DEBUG level. Will check if we can reduce this in the package.

nebulon

We've published a new app version with a reduced loglevel to INFO now.

sparkwise

Thank you!

sparkwise

Looks like logrotate was configured to time out after 90 seconds, and so the large Etherpad log files failed to rotate, and disk filled. Looks like it's time to extend the timeouts and generally review the logrotation config.

canadaduane

Oh wow, what an interesting (and slightly frightening) failure mode! "Your log files are so big that we've given up on managing them."

Cloudron makes it easy to run web apps like WordPress, Nextcloud, GitLab on your server. Find out more or install now.

Cloudron Forum

App logs filling disk