out of space error leading to missing certs

roofboard

I ran out of space last night while a backup was running. I tried removing a few unused or underused utilities to free up space, but when I restarted it would not load. After running journalctl -u nginx -fa I got the error listed below.

Any ideas?

Jun 03 17:26:36 my.draglabs.com nginx[125911]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 17:26:36 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 17:26:36 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 17:26:36 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.

From what I can tell nginx cannot find my cert files. When I go to that folder and try to cat my certs it comes up blank...
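A quick way to confirm the "blank" files is to list every zero-byte file in the cert folder. This is only a sketch: the function name list_empty_files is made up for illustration, and the folder you pass in is up to you (for example the cert folder from the error message above).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cloudron): print every regular file
# directly inside the given directory that has a size of 0 bytes.
list_empty_files() {
    local dir="$1"
    # -maxdepth 1: only look at the directory itself, not subfolders
    # -type f -size 0: regular files whose size is exactly zero
    find "$dir" -maxdepth 1 -type f -size 0 | sort
}
```

Calling list_empty_files /home/yellowtent/platformdata/nginx/cert would print one path per empty file, or nothing if every file has content.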

I think I found the nginx.conf file in /home/yellowtent/platformdata/nginx
it looks pretty normal....

user www-data;

# detect based on available CPU cores
worker_processes  auto;

# this is 4096 by default. See /proc/<PID>/limits and /etc/security/limits.conf
# usually twice the worker_connections (one for upstream, one for downstream)
# see also LimitNOFILE=16384 in systemd drop-in
worker_rlimit_nofile 8192;

pid /run/nginx.pid;

events {
    # a single worker has these many simultaneous connections max
    worker_connections  4096;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    # required for long host names
    server_names_hash_bucket_size 128;

    access_log /var/log/nginx/access.log combined;

    sendfile        on;

    # timeout for client to finish sending headers
    client_header_timeout 30s;

    # timeout for reading client request body (successive read timeout and not whole body!)
    client_body_timeout 60s;

    # keep-alive connections timeout in 65s. this is because many browsers timeout in 60 seconds
    keepalive_timeout  65s;

    # zones for rate limiting
    limit_req_zone $binary_remote_addr zone=admin_login:10m rate=10r/s; # 10 request a second

    include applications/*.conf;
}

roofboard

@roofboard
Correction - after using SCP to download my nginx folder I can see that there are certs for all domains, and keys for all domains. The only thing which appears to be missing is the key for _.draglabs.com. Is it possible to regenerate it? Or can I find it in the backups?

roofboard

@roofboard
Still unresolved. After renaming nginx.conf to old.nginx the error changed, but it is still not working. I am now restoring the old nginx.conf file.

-- The job identifier is 2014.
Jun 03 19:46:55 my.draglabs.com nginx[14813]: nginx: [emerg] open() "/etc/nginx/nginx.conf" failed (2: No such file or directory)
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- An ExecStart= process belonging to unit nginx.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 1.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit nginx.service has entered the 'failed' state with result 'exit-code'.
Jun 03 19:46:55 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit nginx.service has finished with a failure.
-- 
-- The job identifier is 2014 and the job result is failed.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Automatic restarting of the unit nginx.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Jun 03 19:46:55 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
-- Subject: A stop job for unit nginx.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A stop job for unit nginx.service has finished.
-- 
-- The job identifier is 2084 and the job result is done.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit nginx.service has entered the 'failed' state with result 'exit-code'.
Jun 03 19:46:55 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit nginx.service has finished with a failure.
-- 
-- The job identifier is 2084 and the job result is failed.
Jun 03 19:46:56 my.draglabs.com systemd[11909]: var-lib-docker-volumes-a5ffd80f\x2d5d66\x2d47ab\x2db651\x2d2bff60681a53\x2dlocalstorage-_data.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit UNIT has successfully entered the 'dead' state.
Jun 03 19:46:56 my.draglabs.com systemd[1]: var-lib-docker-volumes-a5ffd80f\x2d5d66\x2d47ab\x2db651\x2d2bff60681a53\x2dlocalstorage-_data.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit var-lib-docker-volumes-a5ffd80f\x2d5d66\x2d47ab\x2db651\x2d2bff60681a53\x2dlocalstorage-_data.mount has successfully entered the 'dead' state.
Jun 03 19:47:04 my.draglabs.com kernel: Packet dropped: IN=ens3 OUT= MAC=00:00:50:d1:ef:23:fe:00:50:d1:ef:23:08:00 SRC=72.167.32.184 DST=80.209.239.35 LEN=40 TOS=0x00 PREC=0x00 TTL=238 ID=57427 PROTO=TCP SPT=56603 DPT=3389 WINDOW=1024 RES=0x00 SYN URGP=0 
Jun 03 19:47:04 my.draglabs.com systemd[1]: systemd-timedated.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--

jdaviescoates

Have you checked you've got a green light for everything under services? When you run out of space unbound often needs to be restarted.

roofboard

@jdaviescoates
Looks like unbound was not running. Now it is, but the nginx.conf is still not repopulating, and nginx will not start.

root@my:/home/yellowtent/platformdata/nginx/old# systemctl status unbound
● unbound.service - Unbound DNS Resolver
     Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2022-06-03 20:26:54 UTC; 6min ago
   Main PID: 1028 (code=exited, status=1/FAILURE)

Jun 03 20:26:54 my.draglabs.com systemd[1]: unbound.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:26:54 my.draglabs.com systemd[1]: Stopped Unbound DNS Resolver.
Jun 03 20:26:54 my.draglabs.com systemd[1]: unbound.service: Start request repeated too quickly.
Jun 03 20:26:54 my.draglabs.com systemd[1]: unbound.service: Failed with result 'exit-code'.
Jun 03 20:26:54 my.draglabs.com systemd[1]: Failed to start Unbound DNS Resolver.
root@my:/home/yellowtent/platformdata/nginx/old# unbound-anchor -a /var/lib/unbound/root.key
root@my:/home/yellowtent/platformdata/nginx/old# systemctl restart unbound
root@my:/home/yellowtent/platformdata/nginx/old# systemctl status unbound
● unbound.service - Unbound DNS Resolver
     Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-06-03 20:34:49 UTC; 14s ago
   Main PID: 14789 (unbound)
      Tasks: 1 (limit: 19105)
     Memory: 6.0M
     CGroup: /system.slice/unbound.service
             └─14789 /usr/sbin/unbound -d

Jun 03 20:34:49 my.draglabs.com systemd[1]: Starting Unbound DNS Resolver...
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] notice: init module 0: subnet
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] notice: init module 1: validator
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] notice: init module 2: iterator
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] info: start of service (unbound 1.9.4).
Jun 03 20:34:49 my.draglabs.com systemd[1]: Started Unbound DNS Resolver.
root@my:/home/yellowtent/platformdata/nginx/old# systemctl status nginx
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
     Active: failed (Result: exit-code) since Fri 2022-06-03 20:33:30 UTC; 1min 56s ago
       Docs: http://nginx.org/en/docs/
    Process: 14491 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=1/FAILURE)

Jun 03 20:33:30 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:33:30 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:33:30 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 20:33:30 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:33:30 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
root@my:/home/yellowtent/platformdata/nginx/old# systemctl restart nginx
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
root@my:/home/yellowtent/platformdata/nginx/old# cd ..
root@my:/home/yellowtent/platformdata/nginx# ls
applications  cert  mime.types  old
root@my:/home/yellowtent/platformdata/nginx# re

girish

@roofboard So, you just have to delete the app config files in /etc/nginx/applications and then run systemctl restart nginx and systemctl restart box.

When you restart box, it will re-generate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to Location section of each app and click save. This will regenerate nginx config of the app.

/etc/nginx/nginx.conf should be:

user www-data;

# detect based on available CPU cores
worker_processes  auto;

# this is 4096 by default. See /proc/<PID>/limits and /etc/security/limits.conf
# usually twice the worker_connections (one for upstream, one for downstream)
# see also LimitNOFILE=16384 in systemd drop-in
worker_rlimit_nofile 8192;

pid /run/nginx.pid;

events {
    # a single worker has these many simultaneous connections max
    worker_connections  4096;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    # required for long host names
    server_names_hash_bucket_size 128;

    access_log /var/log/nginx/access.log combined;

    sendfile        on;

    # timeout for client to finish sending headers
    client_header_timeout 30s;

    # timeout for reading client request body (successive read timeout and not whole body!)
    client_body_timeout 60s;

    # keep-alive connections timeout in 65s. this is because many browsers timeout in 60 seconds
    keepalive_timeout  65s;

    # zones for rate limiting
    limit_req_zone $binary_remote_addr zone=admin_login:10m rate=10r/s; # 10 request a second

    include applications/*.conf;
}

girish

Also, the nginx configs (and certs) are "throw away". The code is written so that they can be re-generated from the configs in the database.

roofboard

@girish
Ok, I restored the nginx.conf file in the yellowtent folder, then moved everything in /etc/nginx/applications into a new folder called old. Did a restart and it is still not getting there...

root@my:/etc/nginx/applications# ls
0fa72b5f-441d-4bef-bee3-665f4d85dc3e.conf  4b5dbf96-42b4-4a13-9b9f-15d5228dce9c.conf  a1c46e70-b09e-419f-8461-3e8e40da3870.conf  b3cbed12-eecc-42f2-93ba-b0834a3b3f5b.conf  default.conf
1a907fb3-616a-4b71-930d-c132adc14357.conf  4eaa7fe2-9c72-46c7-946e-f7ed41891a72.conf  a9948920-c8d0-4e14-9139-45ce8a78b549.conf  b892da04-793f-4449-a6d4-ed8564455d46.conf  e67529c6-edb3-47a5-890f-580adc2d7c61.conf
3d520625-8452-4e93-87c7-e03f89e4286b.conf  9cbc7dcd-5202-4e5f-9730-9491d8dc4077.conf  abfd70d6-750a-4621-9072-82da26e9df8f.conf  bdfaef04-4f9d-433e-aaf7-44e6146acb01.conf  my.draglabs.com.conf
root@my:/etc/nginx/applications# sudo mv *.conf old/
root@my:/etc/nginx/applications# ls
old

@roofboard
When I try to start nginx in one tab and have journalctl -u nginx -fa running in another tab, this is the error that I am getting.

Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 20:58:21 my.draglabs.com nginx[22106]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.

girish

@roofboard said in out of space error leading to missing certs:

Jun 03 20:58:21 my.draglabs.com nginx[22106]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)

Some nginx config file is loading this key file (the config is under /etc/nginx/applications/*, you can temporarily move all files there somewhere else). Can you please check which one? That conf needs to be deleted and then nginx has to be restarted. The reason nginx is not starting is most likely that the key is a 0 byte file.
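One way to check which config refers to the key is a recursive fixed-string search. The sketch below is an illustration only; find_conf_using is a made-up name, and both arguments are whatever folder and key path you want to check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files under conf_dir that mention the
# given path. The path is matched as a fixed string, not as a regex.
find_conf_using() {
    local conf_dir="$1" needle="$2"
    # -r: recurse into subfolders, -l: print only file names,
    # -F: treat the needle literally (dots are not wildcards)
    grep -rlF -- "$needle" "$conf_dir" | sort
}
```

For this thread the call would be find_conf_using /etc/nginx/applications /home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key. Note that the search also descends into subfolders, whereas the include applications/*.conf line in nginx.conf should only pick up .conf files sitting directly in that folder.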

roofboard

@girish
Some config file located in the folder /etc/nginx/applications?

If yes, can it see into the old folder where I put all the old config files?

roofboard

@girish hmmmm
I moved the conf files all the way out of the nginx folder into /old, deleted the app config files in /etc/nginx/applications, and ran systemctl restart nginx and systemctl restart box.

Then it momentarily started but Cloudron would not load. I rebooted and tried to start nginx using the command systemctl restart nginx

and below is the output from journalctl -u nginx -fa

Jun 03 21:15:12 my.draglabs.com nginx[12053]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 1.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:12 my.draglabs.com nginx[12062]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 2.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12068]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 3.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12070]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12072]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.

roofboard

@girish said in out of space error leading to missing certs:

/etc/nginx/applications/

The key being referred to is definitely a zero byte file. Also, draglabs.com is the main domain I log in to. Is it possible that the conf is being regenerated with pointers to /home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key?

roofboard

@girish said in out of space error leading to missing certs:

re-generate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to Location section of each app and click save. This will regenerate nginx config of the app

FIXED!!!

It is difficult to tell whether deleting the conf files from the folder /etc/nginx/applications, restarting unbound (per the instructions), and then running systemctl restart nginx and systemctl restart box helped.

I say that because unbound definitely was not working at one point.
And as I remember nginx did start momentarily.

However the solution came when I deleted the corrupted zero byte private key from the folder /home/yellowtent/platformdata/nginx/cert/

When that file was deleted I was able to log in without SSL using Firefox. Once in, under the domains and certs section of Cloudron, I was able to click on Renew All Certs. That fixed SSL, and I was able to go into each program and reassign the DNS settings by clicking save.
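For anyone repeating this, a slightly more cautious variant is to move empty key files aside instead of deleting them. The sketch below is an assumption-laden illustration: quarantine_empty_keys is a made-up name, and the backup folder is whatever location you choose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move zero-byte *.key files out of cert_dir into
# backup_dir instead of deleting them, and print the name of each moved file.
quarantine_empty_keys() {
    local cert_dir="$1" backup_dir="$2" f
    mkdir -p "$backup_dir"
    for f in "$cert_dir"/*.key; do
        # skip the unexpanded pattern when no *.key file exists
        [ -f "$f" ] || continue
        # -s is true for non-empty files, so ! -s selects the empty ones
        if [ ! -s "$f" ]; then
            mv -- "$f" "$backup_dir"/
            basename "$f"
        fi
    done
}
```

After moving the empty key out of /home/yellowtent/platformdata/nginx/cert, the remaining steps are the ones already described in this thread: systemctl restart nginx, systemctl restart box, then Renew All Certs from the dashboard.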

scooke

@roofboard Phew! Good work persisting, and thanks for sharing the solution.

subven

@girish there is no way to trigger certificate renewal over the (SSH) console?

I had a bug (a couple of months ago) I never reported where stopped apps did not get a new cert and nginx failed to launch because of outdated/non-valid certs, making Cloudron break (no nginx --> no dashboard) on system reboot. I fixed it by just copying over current cert files from working (non-stopped) apps. They were obviously not valid for those stopped apps, but I was able to start nginx, start the stopped apps and renew their certs.

So in short: Would be nice to have a way to trigger cert renewal over console command and/or extend the troubleshoot guide with cert related stuff.

roofboard

@girish

Also this whole issue was caused by running out of space - I took a look at some of the other posts on out of space crashes and can tell it is a difficult problem to solve.

Supposedly there is a running out of space warning, but I never got that warning.

I was thinking that a good solution for the running out of space error would involve taking the cron job which calculates remaining space every 'n' minutes and integrating its readings over 'x' hours to arrive at a time to disk full.

This could relatively accurately predict if an out of space crash is pending or imminent, and if so, do things like stop processes, prevent backups (if backing up to the local filesystem), etc.

Essentially

predict the crash with a pinch of calculus.
send a warning to the administrator.
follow a contingency to protect the server.
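The prediction step can be sketched as a simple linear extrapolation from two free-space readings (strictly a rate-of-change estimate rather than an integral). Everything here is an assumption for illustration: the name minutes_until_full, the integer units, and the idea that the readings come from something like df.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate minutes until the disk is full from two
# free-space samples (same unit, e.g. MB) taken interval_min minutes apart.
# Prints the estimate, or "inf" if free space is not shrinking.
minutes_until_full() {
    local free_prev="$1" free_now="$2" interval_min="$3"
    local consumed=$(( free_prev - free_now ))
    if [ "$consumed" -le 0 ]; then
        echo "inf"
        return 0
    fi
    # linear extrapolation: remaining space divided by consumption rate
    echo $(( free_now * interval_min / consumed ))
}
```

For example, going from 1000 MB free to 900 MB free over 10 minutes gives an estimate of 90 minutes. A real checker would probably smooth over more than two samples, since a single backup burst would otherwise look like an imminent crash.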

I can imagine many ways this could happen, and my example is ONLY one of them. Any program can crash Cloudron: I could have been copying video files, it could have been Nextcloud, or a spam attack on a mail server.

girish

@roofboard yes, agreed. I don't like the way it currently is, where filling up disk space brings everything down. Currently, we have a simple cron checker which will give alerts if it's nearing some amount of disk space but this fails in many cases because it runs only every 6 hours or so (it's not run too often to prevent disk churn).

I think a good long term solution is to figure out how to limit disk usage of apps. I think in another thread there is an idea that maybe all appdata can be stored in an XFS partition. We can then enforce quotas on apps.

girish

@subven said in out of space error leading to missing certs:

nginx failed to launch because of outdated/non-valid certs making Cloudron break (no nginx --> no dashboard) on system reboot.

Yes, indeed, this is a bug. As @roofboard also found out, the code checks if a cert file exists but not if it's corrupt. I will get this fixed, so at the very least, restarting the box code will get the dashboard back up.
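To illustrate the difference between "exists" and "looks usable", here is a minimal sketch of a slightly stronger check. It is not Cloudron's actual code; looks_like_private_key is a made-up name, and the test is only a light heuristic based on the "no start line" wording in the nginx error (it does not parse the key).

```shell
#!/usr/bin/env bash
# Hypothetical check (not Cloudron's code): succeed only if the file is
# non-empty and contains a PEM private-key start line.
looks_like_private_key() {
    local key_file="$1"
    # -s fails for missing files and for zero-byte files
    [ -s "$key_file" ] || return 1
    # matches e.g. "-----BEGIN PRIVATE KEY-----" or "-----BEGIN RSA PRIVATE KEY-----"
    grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$key_file"
}
```

A zero-byte key like the one in this thread fails the first test. A stricter variant could ask openssl to actually parse the key, but that is left out here to keep the sketch dependency-free.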

robi

@girish said in out of space error leading to missing certs:

only every 6 hours or so

The predictive aspect of @roofboard's suggestion is also a good one: track the rate of change a bit, perhaps checking more frequently as we approach higher thresholds (>80%) and less frequently when out of the danger zone (<80%).
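As a rough sketch of that idea, the check interval could be derived from the current usage. The function name, the 95% tier and the 5 and 30 minute intervals are all made-up values for illustration; only the 80% threshold and the 6 hour baseline come from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical policy: map current disk usage (integer percent) to a check
# interval in minutes. Thresholds and intervals are illustrative only.
check_interval_minutes() {
    local used_pct="$1"
    if [ "$used_pct" -ge 95 ]; then
        echo 5
    elif [ "$used_pct" -ge 80 ]; then
        echo 30
    else
        # baseline: the "every 6 hours or so" check mentioned in this thread
        echo 360
    fi
}
```

This keeps disk churn low while the disk is mostly empty and only checks often once the danger zone is reached.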

Combining this with an email to the admin, which is more likely to be seen than a UI notification, would be great, until we add the external mobile notification integration via external messaging services, which is in the pipeline.

mehdi

@girish said in out of space error leading to missing certs:

@roofboard yes, agreed. I don't like the way it currently is, where filling up disk space brings everything down. Currently, we have a simple cron checker which will give alerts if it's nearing some amount of disk space but this fails in many cases because it runs only every 6 hours or so (it's not run too often to prevent disk churn).

I think a good long term solution is to figure out how to limit disk usage of apps. I think in another thread there is an idea that maybe all appdata can be stored in an XFS partition. We can then enforce quotas on apps.

A good shorter term solution would be to allow configuring the level below which the alert is sent. Depending on whether you use your server for storing text files or for downloading video, your "low disk" tolerance will be wildly different.
