Migration from one server to another with a floating IP and minimizing downtime
-
@girish Ah, you called it. Sorry for forgetting the `sudo` part, can't believe I forgot that, lol. Been trying too many things today I guess, haha.
Here's the command that seems to work and its stats output. The numbers seem maybe incorrect to me, given the size of the data and how much needs to be created, considering the backup I restored from is only 8 hours old or so. However, as a percentage of the total number of files, what needs to be deleted or created is fairly low, so I guess maybe it makes sense still.
Command:
sudo rsync --stats --human-readable --delete-delay --archive --compress --rsync-path="sudo -u yellowtent rsync" -e 'ssh -p {port}' {user}@{IP}:/home/yellowtent/ /home/yellowtent/ --dry-run
Number of files: 555,244 (reg: 512,959, dir: 42,208, link: 77)
Number of created files: 2,308 (reg: 2,213, dir: 95)
Number of deleted files: 702 (reg: 690, dir: 12)
Number of regular files transferred: 11,365
Total file size: 92.88G bytes
Total transferred file size: 18.64G bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 14.17M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 286.61K
Total bytes received: 24.61M
sent 286.61K bytes received 24.61M bytes 1.42M bytes/sec
total size is 92.88G speedup is 3,731.42 (DRY RUN)
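To put those stats in proportion, here is a minimal sketch that turns the counts into percentages. The `pct` helper is a hypothetical name introduced only for illustration; the numbers are copied from the dry-run output above.

```shell
#!/bin/sh
# pct PART TOTAL: print PART as a percentage of TOTAL with two decimals.
# "pct" is a hypothetical helper name, not part of rsync.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

# Counts and sizes taken from the --stats output of the dry run.
echo "created files:     $(pct 2308 555244)% of all files"
echo "deleted files:     $(pct 702 555244)% of all files"
echo "transferred files: $(pct 11365 555244)% of all files"
echo "transferred size:  $(pct 18.64 92.88)% of total size"
```

By file count the changes stay small (about 2% of files would be transferred), while by size the transfer is roughly 20% of the data, which is the part that looks surprising for an 8-hour-old backup.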
-
Ah, I figured out why it's so many. When I used the `--itemize-changes` flag, it showed that a lot of these files are being brought over because of their timestamps. The timestamps on the source are older than those on the destination, because some of these files on Server B got the time of the backup restoration as their timestamp. Since I'm using the `--archive` flag, rsync tries to keep all of that in sync so that the timestamps on the destination match the source. I believe that's why it's picking up so many more changes than I expected.
-
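The timestamp explanation in the post above can be checked by counting the itemized lines by type. This is a rough sketch, not a definitive tool: `summarize_itemized` is a hypothetical helper name, and it assumes the default `--itemize-changes` format, where each line starts with a flag string of the form `YXcstpoguax` (position 4 is the size flag, position 5 the time flag) and deletions are reported as `*deleting`.

```shell
#!/bin/sh
# summarize_itemized: read `rsync --dry-run --itemize-changes` output on stdin
# and count entries by kind. Hypothetical helper, assuming the default
# "YXcstpoguax" flag layout described in the rsync man page.
summarize_itemized() {
    awk '
        { flags = $1 }
        # Entries that exist only on the destination and would be removed.
        flags == "*deleting"          { deleted++; next }
        # Newly created items show "+" in every attribute position.
        substr(flags, 3, 1) == "+"    { created++; next }
        # Regular file transfers where the size matches but the time differs.
        substr(flags, 1, 2) == ">f" && substr(flags, 4, 1) == "." && substr(flags, 5, 1) == "t" { time_only++; next }
        # Any other regular file transfer (for example, size changed).
        substr(flags, 1, 2) == ">f"   { other++; next }
        END { printf "created=%d deleted=%d time_only=%d other=%d\n", created, deleted, time_only, other }
    '
}
```

To use it, add `--itemize-changes` to the dry-run command from the first post and pipe its output into `summarize_itemized`. A large `time_only` count would support the explanation that restored files differ from the source only in their timestamps.
-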
Okay, I've migrated servers today and flipped the switch. Things seem to be running steady, although while combing through the logs I do see a few concerns, and I'm not sure whether they are related to the migration or not. By the way, I'll write up a more detailed description of the steps I took for those who want to attempt this in the future.
I'm seeing this somewhat often in my Mail logs:
Nov 23 17:34:25[ERROR] [224C5AD2-AF35-4EE2-A3F3-05980A9896E2.1] [limit] conn_concur_decr:Error: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
I assume that's the issue from the changelog mentioned here, which I hadn't realized was related?
@girish said in Email Event Log loading very slowly, seems tied to overall Email domain list health checks:
This seems related to the redis issue. Think it gets fixed with https://git.cloudron.io/cloudron/box/-/commit/e64182d79134e8828c2fa953c676a8f6b08247b7
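The MISCONF error means Redis's last background RDB save failed. As a hedged sketch, the state can be read from the `rdb_last_bgsave_status` field of `INFO persistence`. The `bgsave_status` helper is a hypothetical name, and the thread does not say which Redis instance the mail service uses, so the host and port are left as placeholders.

```shell
#!/bin/sh
# bgsave_status: read `INFO persistence` output on stdin and print the value
# of rdb_last_bgsave_status ("ok" or "err"). Hypothetical helper name.
bgsave_status() {
    # INFO output uses CRLF line endings, so strip carriage returns first.
    tr -d '\r' | awk -F: '$1 == "rdb_last_bgsave_status" { print $2 }'
}
```

Usage would look like `redis-cli -h {host} -p {port} INFO persistence | bgsave_status`. A result of `err` matches the state described in the log message; the Redis logs should then say why the snapshot could not be written.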
One issue I did run into, btw, was with MongoDB: it refused to start up and kept complaining about possible corruption. No matter what I did it wouldn't work, so I ended up backing it up, removing the files in the `/home/yellowtent/platformdata/mongodb/` directory, running rsync again to bring over the files from the working server, and restarting the MongoDB service. All went well again after that.
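The recovery described above can be sketched as a script. This is an illustration under assumptions, not the exact commands that were run: the stop and start commands (`docker stop mongodb`, `docker start mongodb`), the backup location, and the helper names `run` and `resync_mongodb` are all hypothetical. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: back up, clear, and re-sync the MongoDB data directory.
MONGO_DIR="/home/yellowtent/platformdata/mongodb/"
MONGO_STOP_CMD="docker stop mongodb"    # assumption: how the service is stopped
MONGO_START_CMD="docker start mongodb"  # assumption: how the service is started
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# resync_mongodb SRC PORT BACKUP_DIR
#   SRC is user@host of the working server, PORT its SSH port,
#   BACKUP_DIR where the current (broken) data is copied first.
resync_mongodb() {
    src="$1"; port="$2"; backup="$3"
    # Each step runs only if the previous one succeeded.
    run $MONGO_STOP_CMD &&
    run sudo cp -a "$MONGO_DIR" "$backup" &&
    run sudo find "$MONGO_DIR" -mindepth 1 -delete &&
    run sudo rsync --archive --compress --delete-delay \
        --rsync-path="sudo -u yellowtent rsync" \
        -e "ssh -p $port" \
        "$src:$MONGO_DIR" "$MONGO_DIR" &&
    run $MONGO_START_CMD
}
```

A call such as `resync_mongodb {user}@{IP} {port} /root/mongodb-backup` prints the five steps in order. Chaining with `&&` means the data directory is not cleared unless the backup copy succeeded.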
-
@girish I'm also seeing this in my logs repeatedly for box logs, seems related perhaps to the email hosting part:
Nov 23 17:39:31box:server no such route: GET eventlog?page=1&per_page=20&search=&types=&access_token=<redacted> [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
    at new NodeError (node:internal/errors:399:5)
    at ServerResponse.setHeader (node:_http_outgoing:645:11)
    at ServerResponse.header (/home/yellowtent/box/node_modules/express/lib/response.js:794:10)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:174:12)
    at ServerResponse.json (/home/yellowtent/box/node_modules/express/lib/response.js:278:15)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:162:21)
    at /home/yellowtent/box/node_modules/connect-lastmile/lib/index.js:80:28
    at Layer.handle_error (/home/yellowtent/box/node_modules/express/lib/router/layer.js:71:5)
    at trim_prefix (/home/yellowtent/box/node_modules/express/lib/router/index.js:326:13)
    at /home/yellowtent/box/node_modules/express/lib/router/index.js:286:9
Nov 23 17:39:31box:server no such route: GET solr_config?access_token=<redacted> [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
    at new NodeError (node:internal/errors:399:5)
    at ServerResponse.setHeader (node:_http_outgoing:645:11)
    at ServerResponse.header (/home/yellowtent/box/node_modules/express/lib/response.js:794:10)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:174:12)
    at ServerResponse.json (/home/yellowtent/box/node_modules/express/lib/response.js:278:15)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:162:21)
    at /home/yellowtent/box/node_modules/connect-lastmile/lib/index.js:80:28
    at Layer.handle_error (/home/yellowtent/box/node_modules/express/lib/router/layer.js:71:5)
    at trim_prefix (/home/yellowtent/box/node_modules/express/lib/router/index.js:326:13)
    at /home/yellowtent/box/node_modules/express/lib/router/index.js:286:9
Nov 23 17:39:34box:server no such route: GET usage?domain={domain}&access_token=<redacted>
Any suggestions on this one?
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption.
Yeah, this is why the rsync solution is not entirely recommended. The databases probably hold state in memory as well. When we do a live rsync while the databases are running, it's possible that we are copying semi-baked stuff.
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
Any suggestions on this one?
Fix for this is coming next release, you can ignore it.
edit: fixed in https://git.cloudron.io/cloudron/box/-/commit/a056bcfdfe6c7bcb6d2f1cea2017c54f2ba6750f
-
@girish said in Migration from one server to another with a floating IP and minimizing downtime:
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption.
Yeah, this is why the rsync solution is not entirely recommended. The databases probably hold state in memory as well. When we do a live rsync while the databases are running, it's possible that we are copying semi-baked stuff.
It seemed to work fine; it just meant I had to delete the MongoDB files and bring them back over from the source server. Everything was good after that. But yeah, not the simplest migration process.
It worked though. I’ve been running fully on the new server since Friday and so far have seen no issues, beyond discovering some known ones I hadn’t seen earlier, which you’ve confirmed have fixes coming soon. The downtime was maybe 15-20 minutes for the apps using MongoDB (I only had 2 of those), and most of that was spent restarting things while trying to figure out the MongoDB errors. Once that was solved, the two apps depending on it came back up again. Next time it would be much less, now that I know how to avoid the MongoDB problem. For the apps that didn’t rely on MongoDB it was maybe 5-10 minutes. That’s much better than the 2+ hours of downtime a normal migration needs due to backup and restore times with object storage. I’m going to try to write up some more notes on my experience for others it may help in similar situations.
-
I just remembered that @fbartels had once published a blog post for this topic:
https://blog.9wd.eu/posts/cloudron-migration/ referenced in https://forum.cloudron.io/topic/5601/moving-cloudron-to-new-server-stop-apps/ and https://forum.cloudron.io/topic/4895/trigger-full-backup-from-cli-scripted-migration/
I don't know if it's still up to date, though.
-
@necrevistonnezr Ah yes, that has some good stuff too, thanks for sharing that! I think that article's steps still involve some long downtime (at least when using object storage, since it's so slow), but at least it's a bit more automated, which is really cool.