Cloudron and Baserow staying down after AWS EC2 simulated crash
-
Hey there,
With a DevOps colleague, we turned off the AWS EC2 instance on which Cloudron (and Baserow) is deployed to see if it would boot back up. It didn't. Both my.domain.com (Cloudron dashboard) and data.domain.com (Baserow) remain inaccessible several hours after our test.
Could someone please:
- Explain why this happened (happy to provide more details if need be)?
- Comment on whether Cloudron should reboot automatically or not?
- If not, detail what steps I should take to guarantee the system remains stable even in case of a crash?
Thank you for your help
-
This is mostly just a wild guess, but isn't this more to do with AWS EC2 not rebooting/recovering than Cloudron?
I'm pretty sure if I turned one of my Hetzner servers off and then back on again the whole thing would boot up again, including Cloudron and all my apps.
I know nothing about AWS EC2, but have you looked at stuff like this?
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-recover.html
-
@gabrielle said in Cloudron and Baserow staying down after AWS EC2 simulated crash:
Comment on whether Cloudron should reboot automatically or not?
Yes, definitely, it should all come up automatically.
As for why it's inaccessible, we need more information:
- Can you ping the server? Can you ssh into the server? This will rule out server issues.
- After that, double check if my.domain.com resolves correctly to the IP of the server.
- Then check nginx and box code. See https://docs.cloudron.io/troubleshooting/#unreachable-dashboard
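The first two checks above can be sketched as a small script run from your own machine. This is only a rough sketch: it assumes `ping`, `nc` and `dig` are installed, and `SERVER_IP`, `DASHBOARD` and the function names are placeholders made up for illustration (203.0.113.10 is a documentation address, not a real server).

```shell
#!/usr/bin/env bash
# Triage sketch for an unreachable dashboard.
# SERVER_IP and DASHBOARD are placeholders; replace them with your own values.
SERVER_IP="${SERVER_IP:-203.0.113.10}"
DASHBOARD="${DASHBOARD:-my.domain.com}"

# Check 1a: does the server answer ping?
server_pings() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Check 1b: is the SSH port (22) accepting connections?
ssh_port_open() {
    nc -z -w 5 "$1" 22 >/dev/null 2>&1
}

# Check 2: print the IPv4 addresses the dashboard domain resolves to.
resolved_ips() {
    dig +short A "$1"
}

# Run all checks and print one line per result.
triage() {
    if server_pings "$SERVER_IP"; then
        echo "ping: ok"
    else
        echo "ping: FAILED"
    fi
    if ssh_port_open "$SERVER_IP"; then
        echo "ssh port: ok"
    else
        echo "ssh port: FAILED"
    fi
    echo "DNS A records for $DASHBOARD:"
    resolved_ips "$DASHBOARD"
}

# Nothing runs until you call: triage
```

Compare the printed A records with the server's public IP by eye. The third check happens on the server itself, over SSH, with systemctl status nginx and systemctl status box.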
-
Thank you for your reply
So the instance managed to restart and Docker as well. It looks like it's either Cloudron or Baserow that didn't reboot.
I can SSH into the server.
I double checked if my.domain.com resolves correctly and it doesn't.
I checked the status of nginx and it says it's active and running since we restarted the instance.
I don't understand what you mean with "box code"?
I entered systemctl restart docker, but nothing happened... Any clue what I could be missing?
-
@gabrielle said in Cloudron and Baserow staying down after AWS EC2 simulated crash:
I double checked if my.domain.com resolves correctly and it doesn't.
You mean "it does"?
I don't understand what you mean with "box code"?
systemctl status box. Does it say active?
-
Also, the steps I mentioned are the ones in https://docs.cloudron.io/troubleshooting/#unreachable-dashboard. If you can reach out to support@cloudron.io, we can check directly as well.
-
@girish thx.
So I entered systemctl status box. It said box.service - Cloudron Admin is active. Below it gives me the error: Yellowtent : unable to resolve host ip-XXX-XX-XX-XXX: Name or service not known (the IP being my IPv4 address for ens5).
I entered systemctl restart box but nothing happened.
I ran /home/yellowtent/box/setup/start.sh and it seemed to have done what it was supposed to.
I ran systemctl status box again. It said box.service - Cloudron Admin is active. Below it gives me the same error: Yellowtent : unable to resolve host ip-XXX-XX-XX-XXX: Name or service not known (the IP being my IPv4 address for ens5).
my.domain.com still remains unreachable.
Will be emailing you at support@cloudron.io shortly
-
@gabrielle said in Cloudron and Baserow staying down after AWS EC2 simulated crash:
unable to resolve host ip-XXX-XX-XX-XXX: Name or service not known
This error is probably only because your hostname is not my.domain.com (i.e. your dashboard domain). This is not a fatal error. If you don't want to see that, you can change it with hostnamectl via SSH.
-
The issue here was DNS related. my.domain.com was not pointing to the public IP.
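For anyone who wants to catch this kind of mismatch quickly, here is a minimal sketch that compares a domain's A records with the IP you expect. It assumes `dig` is installed; `dns_points_to` is a made-up helper name, and the expected IP is whatever you look up yourself (for example the public IPv4 address shown for the instance in the EC2 console). As far as I know, an EC2 instance's auto-assigned public IPv4 address usually changes across a stop/start unless an Elastic IP is attached, which may be what happened here, but the thread does not confirm that.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) only if one of the domain's A records
# is exactly the expected IPv4 address.
#   $1 = domain to look up, $2 = IP you expect it to point to
dns_points_to() {
    # -x matches whole lines only, -F treats the IP as a literal string
    dig +short A "$1" | grep -qxF -- "$2"
}

# Example with placeholder values:
#   if dns_points_to my.domain.com 203.0.113.10; then
#       echo "DNS ok"
#   else
#       echo "DNS mismatch: update the A record"
#   fi
```

If the check fails, update the A record at your DNS provider to the current public IP and allow some time for caches to expire.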
-