Cloudron server restore is "Waiting for DNS of my.<hostname>"
-
The original installation instructions were fairly clear, I agree, and I've already followed them successfully. The documentation overall seems fairly good so far - just perhaps some unusual cases are not dealt with.
What I'm trying to do now is test restoring a back-up of my original installation, which has been used and contains useful content, onto another machine. The documentation I have read says in the section Move Cloudron to another server:
It is recommended to not delete the old server until migration to new server is complete and you have verified that all data is intact (instead, just power it off).
A good recommendation, if a bit vague, but one which suggests to me that the old server will stay intact until the migration is "complete", which I was expecting might then conclude with a step to re-assign the DNS A records, either manually (in my case) or programmatically (if I had been able to use one of the supported domain service providers).
What you seem to suggest is that if I had been using a supported DNS service, the cloudron installer would go right ahead and re-configure the DNS records to resolve to the new host. Although obviously this wouldn't work immediately, because of the propagation delay... Leaving me with two installations, and the DNS records resolving unpredictably to one or the other...
Or worse, if the restore failed at some point after the DNS reassignment, I could be left with the DNS records unpredictably resolving to a broken instance.
Surely that isn't really what anyone would want?
In the case of a manually configured DNS like mine, the restore halts, and waits for some action to be performed for it.
The documentation doesn't warn about this. Instead I encounter a page saying "Waiting for DNS of <whatever>", which despite knowing that at some point I would need to manually re-assign the DNS, leaves me guessing as to what it's expecting. I don't actually want it to switch the DNS out from the old instance yet, and it's a bit alarming that this might have been done without allowing me to check that all is correct, first. This is what made me wonder if perhaps it wasn't the manual DNS reassignment it was waiting for, because surely that would be silly... And so I came here to ask for clarification.
So to sum up: I agree, it absolutely would be useful to be able to fully restore to another instance before switching the DNS. In fact I would argue that's essential for a safe restore process.
I would be interested to know whether this is possible or not - and in any case what I should expect the restore process to do, in what order? @mehdi, is this something you can speak authoritatively on?
-
@wu-lee said in Cloudron server restore is "Waiting for DNS of my.<hostname>":
What you seem to suggest is that if I had been using a supported DNS service, the cloudron installer would go right ahead and re-configure the DNS records to resolve to the new host.
Yes.
Although obviously this wouldn't work immediately, because of the propagation delay...
I believe cloudron configures the DNS with a short TTL, so that propagation does not take long, to avoid these issues. Not 100% sure though.
Leaving me with two installations, and the DNS records resolving unpredictably to one or the other...
The DNS would resolve to the new one, it would not be unpredictable. Why would it?
Or worse, if the restore failed at some point after the DNS reassignment, I could be left with the DNS records unpredictably resolving to a broken instance.
Surely that isn't really what anyone would want?
However, if you turn the old instance back on, it would re-configure the DNS on itself, so there would be no problem.
I would be interested to know whether this is possible or not - and in any case what I should expect the restore process to do, in what order?
I do not know of any way of doing what you are asking for, but you can mitigate most of these problems by setting a low TTL time, which would basically remove all DNS propagation problems. Then you can restore to the new instance with minimal downtime.
@mehdi, is this something you can speak authoritatively on?
I cannot speak authoritatively on anything, I'm just another cloudron user
-
@mehdi said:
The DNS would resolve to the new one, it would not be unpredictable. Why would it?
What I mean is, for user A, their DNS server might switch quickly. For user B, it may take the full TTL. Who sees what is unpredictable for me, the admin. I would need to warn users to stay off the instance for the TTL period. In fact I'd probably want to shut off the original entirely.
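To make the user A / user B point concrete, here is a minimal sketch of how long a caching resolver can keep handing out the old address after the record is changed. It assumes resolvers honour the record's TTL exactly, which is not guaranteed in practice. The function name and the timestamps are made up for illustration.

```python
def seconds_still_stale(cached_at: int, ttl: int, switched_at: int) -> int:
    """Seconds after the DNS switch during which a resolver that cached the
    old record at `cached_at` keeps serving the old address.

    Assumption: the resolver drops the cached answer exactly `ttl` seconds
    after fetching it.
    """
    expires_at = cached_at + ttl
    return max(0, expires_at - switched_at)


# The record is changed at t = 10000 s, with a TTL of one hour.
switched_at = 10_000
ttl = 3600

user_a = seconds_still_stale(cached_at=6_500, ttl=ttl, switched_at=switched_at)
user_b = seconds_still_stale(cached_at=9_990, ttl=ttl, switched_at=switched_at)
print(user_a, user_b)  # 100 3590

# The same two resolvers with a 5 minute TTL: user A's cache has already
# expired before the switch, user B is stale for a little under 5 minutes.
print(seconds_still_stale(6_500, 300, switched_at))  # 0
print(seconds_still_stale(9_990, 300, switched_at))  # 290
```

The worst case is always just under one TTL, which is why a short TTL shrinks the window but does not remove it.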
I do not know of any way of doing what you are asking for, but you can mitigate most of these problems by setting a low TTL time, which would basically remove all DNS propagation problems. Then you can restore to the new instance with minimal downtime.
Ok, it would - but as I say, I would much prefer to know this in advance from reading the documentation.
I didn't see any option to set the TTL to something short, either in the original install or the restore, but then I have to use manual mode. It's possible the installer does this by default, I don't know.
Anyway, thanks, I think that is as much as we can say, unless @girish or @nebulon can comment? @girish was quick to correct the documents on the last question I asked about. This is the weekend of course, so I can understand they may be doing other things.
-
@wu-lee From what I understood, there are two broad concerns:
1. In the restore flow, there is no indication anywhere that the IP needs to be switched to the new IP for non-programmatic DNS. That's a good point, I will add this to docs and maybe to the UI as well.
2. You want a restore mechanism which works with the old Cloudron still being active. This is currently not implemented. The general use case is to restore when the old Cloudron is not active/died. While it's technically possible to implement something that has two Cloudrons active at the same time with the same domains, it will just create a lot of confusion from a UX point of view, since we have to add docs to tell people to adjust /etc/hosts to really test it. You can open a separate feature request for this and we can see if there is interest.
Thanks for reporting!
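For readers unfamiliar with the /etc/hosts approach mentioned above, a rough sketch of what such an override looks like follows. The IP address and hostnames are placeholders, and the helper name is invented for this example. An override like this only affects the machine whose hosts file is edited. Nothing in this thread establishes that it gets a restore past the "Waiting for DNS" step.

```python
from typing import Iterable


def hosts_lines(new_ip: str, hostnames: Iterable[str]) -> str:
    """Build hosts-file lines that point each hostname at `new_ip`.

    Hypothetical helper: the output is meant to be appended by hand to
    /etc/hosts on the workstation used for testing, then removed afterwards.
    """
    return "".join(f"{new_ip}\t{name}\n" for name in hostnames)


# Placeholder values: 203.0.113.10 stands in for the new server's IP.
print(hosts_lines("203.0.113.10", ["my.example.com", "blog.example.com"]), end="")
```

Every subdomain you want to test needs its own entry, since hosts files do not support wildcards.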
-
For point 1, I have added a note in https://docs.cloudron.io/backups/#restore-cloudron
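With manually managed DNS, a quick way to see whether the A record has been switched yet is to check what the dashboard hostname resolves to from your own machine. The sketch below uses only the Python standard library. The hostname and IP are placeholders, and `points_at` is a name made up here. It reports what your local resolver sees, which may differ from what the new server sees while caches expire.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def points_at(
    hostname: str,
    expected_ip: str,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> bool:
    """True if hostname resolves, and only to expected_ip."""
    try:
        addresses = resolver(hostname)
    except socket.gaierror:
        # The name does not resolve at all (yet).
        return False
    return bool(addresses) and all(addr == expected_ip for addr in addresses)


if __name__ == "__main__":
    # Placeholders: substitute your dashboard hostname and the new server's IP.
    print(points_at("my.example.com", "203.0.113.10"))
```

The `resolver` parameter exists only so the check can be exercised without network access.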
-
@girish - thanks, yes, those are my main points.
Regarding point 2. I understand that you want to keep it simple for those who don't want to think too hard, this makes sense.
What I don't understand is why that requires the DNS reassignment to happen before, and not after the basic restore of the software and its state. Is there a reason for that?
If the DNS reassignment happened after the restore, presumably this would:
- Still allow the simple case, where the whole process proceeds (semi-)automatically, but also
- Help ensure the restore fails safely (before the DNS gets clobbered, i.e. reassigned to a broken host)
- Allow the final step to be aborted anyway, should the user just be testing that the restore works (omitting the DNS step)
- Possibly also make it easier to restore to another domain at the final step (which would be necessary for a full verification that the whole restore process works, in the absence of tweaking /etc/hosts)
I do think this is an important scenario, so I may open a feature request about this.
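The ordering proposed above can be sketched as a small control flow. This is purely illustrative and is not how Cloudron's restore is implemented. All three callables and the `dry_run` flag are hypothetical stand-ins for "restore the backup", "check the data", and "reassign the DNS".

```python
from typing import Callable, List


def restore_then_switch(
    restore: Callable[[], None],
    verify: Callable[[], bool],
    switch_dns: Callable[[], None],
    dry_run: bool = False,
) -> List[str]:
    """Run the restore first and touch DNS only as the last step."""
    log: List[str] = []

    restore()  # if this raises, DNS has not been touched
    log.append("restored")

    if not verify():
        log.append("verification failed; DNS untouched")
        return log
    log.append("verified")

    if dry_run:
        # Testing that backups restore correctly: stop before the DNS step.
        log.append("dry run; DNS untouched")
        return log

    switch_dns()
    log.append("dns switched")
    return log


# A test restore that never reassigns DNS.
print(restore_then_switch(lambda: None, lambda: True, lambda: None, dry_run=True))
```

The point of the shape is that every early exit leaves the old server's DNS records as they were, so aborting needs no revert step.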
-
@wu-lee At a high level, I agree with your concerns/suggestions. The Cloudron restore flow is built under the assumption the previous server instance is down/not working. With that assumption, there are no real DNS concerns (since it's not pointing to anything valid anyway).
To take a step back, one main case where an old instance is still "active" is I guess when you want to attempt a migration to a new server/provider and make sure it's all good before you turn it off. I think https://git.cloudron.io/cloudron/box/-/issues/602 was one similar issue in the past (and this post but that didn't get a follow up.)
-
I have opened https://git.cloudron.io/cloudron/box/-/issues/737
-
Just wanted to add here that in such a scenario there is a time gap: you back up the old server, restore it elsewhere while the old server stays active, and only later switch to the new server. This means, for example, that if the old server receives emails or apps are actively used in the meantime, the newly created one misses that data, which might lead to more problems.
The safest way here is to accept the downtime of apps for the period of restore to ensure data is not lost mid-way.
Now I can see a scenario where the restore is purely made on a recurring basis, just to ensure the backups are valid and one can restore correctly, in which case that data inconsistency is irrelevant.
-
Thanks for answering / ticketing.
@nebulon - I agree that in a server-switch scenario it may be sensible to put the original into "maintenance mode" prior to making the final back-up, to prevent inconsistency between the original and the restored server.
Even so, I'd suggest it would still be simpler and safer if the DNS reassignment is the final step, executed only when I'm ready. If an abort is warranted for any reason, then it would merely be a matter of turning off maintenance mode on the original server, and wouldn't need a step where I or the cloudron installer reverted the DNS state, which could be slow and/or another potential point of failure.
-
@girish I made an update in that git issue you created, as I ran into this situation today: I wanted to test something and didn't want to switch DNS records, but couldn't find a way past this, so it was stuck in the "Waiting for DNS propagation" state. I went to follow the git issue you created, but I see you closed it just a couple of weeks ago, and I'm not certain why exactly. Not sure if you want to continue this conversation in the git issue or this forum post, but I'm letting you know just in case, as this is also the first time I've commented on a git issue for Cloudron.
-
The documentation states "It is recommended to not delete the old server until migration to new server is complete and you have verified that all data is intact (instead, just power it off)." but this implies that the restore should effectively complete before switching to it, yet the restore requires DNS changes first. So this seems like a contradictory workflow to me.
-
@d19dotca I have to admit, I don't remember why I closed that issue with a cryptic message. I have re-opened it. There is no workaround that I can immediately think of to try to restore a Cloudron with the existing one still running. FWIW, you can always restore apps easily using the import functionality.
-
@girish Thank you for re-opening that. In my particular use-case, I was trying to follow the steps from my VPS provider for extending my disk size (it isn't quite as straightforward as DigitalOcean for example where I can just hit a button and it's done), it's a bit more "old school" with lots of stuff to do in their UI and also at the command line level, but it's certainly do-able. So I wanted to test this out on a new VPS to make sure Cloudron would see the new disk size, etc and walk through the full process that I intended to do at a scheduled time later on where I'd then make a floating IP switch and be done with it. However this test wasn't really completed because of the "Waiting on DNS" stuff in the restore process.
So one thing I could do as a workaround is actually create a VM snapshot of my current one, then use that on a new volume and then make it the boot disk of a new VPS. However, this takes a long time to do, and I'd be worried about what to do for emails and such that arrive at the old Cloudron until I've switched it all over to the new one, because there'd be an inconsistency after the hour or so it takes me to get it all set up and done.
It's okay though, we can take up that last part in the other forum post I made for it, but that was just my use-case and reason why I was expecting not to have to wait for DNS to propagate so I could actually complete my test and get into the dashboard and verify it sees the extended disk size, etc.
-
Is it possible this can be considered for 6.2, @girish? I really feel like this is a critical component in order to test backups, set up new instances without flipping the DNS to them yet, etc. Without it, it is hard to do proper testing before migrating a Cloudron to a new server. I'd like to do some test runs before the real one, when I have a dozen+ clients relying on it (some of which are fairly critical to their business and can't survive too much downtime, or flipping from one server to the other and back if performance is not as good as expected on the new server, for example).
I realize there are some workarounds available in certain scenarios, but they aren't accessible to everyone at any time (i.e. switching my.<hostname>.<tld> to new server to test) if users are needing access to the Cloudron Dashboard, etc.
-
@d19dotca OK, I have scheduled https://git.cloudron.io/cloudron/box/-/issues/737 for 6.2, let's see.