Cloudron server restore is "Waiting for DNS of my.<hostname>"
-
@girish - thanks, yes, those are my main points.
Regarding point 2. I understand that you want to keep it simple for those who don't want to think too hard, this makes sense.
What I don't understand is why that requires the DNS reassignment to happen before, and not after the basic restore of the software and its state. Is there a reason for that?
If the DNS reassignment happened after the restore, presumably this would
- Still allow the simple case, where the whole process proceeds (semi-) automatically, but also
- Help ensure the restore fails safely (before the DNS gets clobbered - reassigned to a broken host)
- Allow the final step to be aborted anyway, should the user just be testing the restore works (omitting the DNS step)
- Possibly also make it easier to restore to another domain at the final step (which would be necessary for a full verification that the whole restore process works, in the absence of tweaking /etc/hosts)
I do think this is an important scenario, so I may open a feature request about this.
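To make the ordering in the list above concrete, here is a minimal sketch of a restore flow in which the DNS switch is the last, optional step. It is not how Cloudron's restore is implemented; `run_restore`, the step names, and the return strings are all hypothetical and exist only to illustrate the proposal.

```python
from typing import Callable, List, Tuple

# A restore step is a (name, action) pair; the action returns True on success.
# Names and actions are hypothetical placeholders, not Cloudron internals.
Step = Tuple[str, Callable[[], bool]]


def run_restore(steps: List[Step],
                switch_dns: Callable[[], bool],
                apply_dns: bool = True) -> str:
    """Run all restore steps first, and touch DNS only at the very end."""
    for name, step in steps:
        if not step():
            # Fail safely: the DNS records still point at the old server.
            return f"failed at {name}; DNS untouched"

    if not apply_dns:
        # Test run: the restore is verified, DNS is deliberately left alone.
        return "restored; DNS untouched"

    if switch_dns():
        return "restored; DNS switched"
    return "restored; DNS switch failed"
```

Calling `run_restore(steps, switch_dns, apply_dns=False)` corresponds to the "just testing that the restore works" case: every restore step runs, but `switch_dns` is never called.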
-
@wu-lee At a high level, I agree with your concerns/suggestions. The Cloudron restore flow is built under the assumption the previous server instance is down/not working. With that assumption, there are no real DNS concerns (since it's not pointing to anything valid anyway).
To take a step back, one main case where an old instance is still "active" is I guess when you want to attempt a migration to a new server/provider and make sure it's all good before you turn it off. I think https://git.cloudron.io/cloudron/box/-/issues/602 was one similar issue in the past (and this post but that didn't get a follow up.)
-
Just wanted to add here that in such a scenario there is a time gap between backing up the old server and restoring it elsewhere, during which the old server stays active and the switch to the new server only happens later. This means, for example, that if the old server receives emails or its apps are actively used during that gap, the newly created server misses that data, which might lead to more problems.
The safest way here is to accept the downtime of apps for the period of restore to ensure data is not lost mid-way.
Now I can see a scenario where the restore is purely made on a recurring basis, just to ensure the backups are valid and one can restore correctly, in which case that data inconsistency is irrelevant.
-
Thanks for answering / ticketing.
@nebulon - I agree that in a server-switch scenario it may be sensible to put the original into "maintenance mode" prior to making the final back-up, to prevent inconsistency between the original and the restored server.
Even so, I'd suggest it would still be simpler and safer if the DNS reassignment is the final step, executed only when I'm ready. If an abort is warranted for any reason, then it would merely be a matter of turning off maintenance mode on the original server, and wouldn't need a step where I or the cloudron installer reverted the DNS state, which could be slow and/or another potential point of failure.
-
@girish I posted an update in the git issue you created, as I ran into this situation today: I wanted to test something without switching DNS records, but couldn't find a way past this step, so the restore was stuck in the "Waiting for DNS propagation" state. I went to follow the git issue, but I see you closed it just a couple of weeks ago, and I'm not certain why. Not sure if you want to continue this conversation in the git issue or in this forum post, but I'm letting you know just in case, as this is the first time I've commented on a git issue for Cloudron.
-
The documentation states "It is recommended to not delete the old server until migration to new server is complete and you have verified that all data is intact (instead, just power it off)." but this implies that the restore should effectively complete before switching to it, yet the restore requires DNS changes first. So this seems like a contradictory workflow to me.
-
@d19dotca I have to admit, I don't remember why I closed that issue with a cryptic message. I have re-opened it. There is no workaround that I can immediately think of to try to restore a Cloudron with the existing one still running. FWIW, you can always restore apps easily using the import functionality.
-
@girish Thank you for re-opening that. In my particular use case, I was trying to follow the steps from my VPS provider for extending my disk size. It isn't quite as straightforward as DigitalOcean, for example, where I can just hit a button and it's done; it's a bit more "old school", with lots to do in their UI and at the command line, but it's certainly doable. So I wanted to test this out on a new VPS to make sure Cloudron would see the new disk size, etc., and to walk through the full process that I intend to carry out at a scheduled time later on, after which I'd make a floating IP switch and be done with it. However, this test wasn't really completed because of the "Waiting on DNS" step in the restore process.
So one thing I could do as a workaround is create a VM snapshot of my current server, use that on a new volume, and make it the boot disk of a new VPS. However, this takes a long time, and I'd be worried about what to do with emails and such that arrive at the old Cloudron before I've switched everything over to the new one, because there would be an inconsistency after the hour or so it takes to get it all set up.
It's okay though, we can take up that last part in the other forum post I made for it, but that was just my use-case and reason why I was expecting not to have to wait for DNS to propagate so I could actually complete my test and get into the dashboard and verify it sees the extended disk size, etc.
-
Is it possible this can be considered for 6.2, @girish? I really feel this is a critical component for testing backups, setting up new instances without flipping the DNS to them yet, etc. Without it, it is hard to do proper testing before migrating a Cloudron to a new server. I'd like to do some test runs before the real one, since I have a dozen+ clients relying on it (some of them fairly critical to their business, which can't survive much downtime or flipping from one server to the other and back if, for example, performance on the new server is not as good as expected).
I realize there are some workarounds available in certain scenarios, but they aren't accessible to everyone at any time (i.e. switching my.<hostname>.<tld> to new server to test) if users are needing access to the Cloudron Dashboard, etc.
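For reference, the /etc/hosts tweak mentioned earlier in the thread is a per-machine override: it changes name resolution only on the machine whose hosts file is edited, so other users keep reaching the old server. The sketch below just builds such an entry; `hosts_entry`, the address `203.0.113.10` (a documentation-range IP), and `my.example.com` are illustrative assumptions, and nothing in this thread confirms that such an override gets a restore past the DNS wait.

```python
import ipaddress


def hosts_entry(ip: str, hostname: str) -> str:
    """Build one /etc/hosts line mapping hostname to ip (hypothetical helper)."""
    # ip_address() raises ValueError if the address is malformed.
    ipaddress.ip_address(ip)
    return f"{ip}\t{hostname}"


# Example values only: a documentation-range IP and a placeholder domain.
line = hosts_entry("203.0.113.10", "my.example.com")
```

The resulting line would be appended to the hosts file of the machine used for testing, and removed again once the test is finished.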