Migration from one server to another with a floating IP and minimizing downtime
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
Maybe this is the issue, that some of these files/directories are owned by something called "tss" instead?
Usernames inside containers show up as different usernames on the host. That `tss` is possibly uid 1000 or something like that on the host (check /etc/passwd).
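If you want to confirm which uid is behind that name, a quick sketch (the path is just an example; point it at whichever file was showing the odd owner):

```bash
# Which uid/gid does the host map the "tss" name to?
getent passwd tss

# Numeric and named owner of the path that looked odd
sudo stat -c '%u:%g %U:%G %n' /home/yellowtent/platformdata
```
-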
@girish Ah you called it - sorry for forgetting the `sudo` part, can't believe I forgot that, lol. Been trying too many things today I guess, haha. Here's the command that seems to work and its stats output, although the numbers seem a bit high to me given that the backup I restored from is only about 8 hours old. As a percentage of the total number of files, though, what needs to be created or deleted is fairly low, so I guess it still makes sense.
Command:
sudo rsync --stats --human-readable --delete-delay --archive --compress --rsync-path="sudo -u yellowtent rsync" -e 'ssh -p {port}' {user}@{IP}:/home/yellowtent/ /home/yellowtent/ --dry-run
Number of files: 555,244 (reg: 512,959, dir: 42,208, link: 77)
Number of created files: 2,308 (reg: 2,213, dir: 95)
Number of deleted files: 702 (reg: 690, dir: 12)
Number of regular files transferred: 11,365
Total file size: 92.88G bytes
Total transferred file size: 18.64G bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 14.17M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 286.61K
Total bytes received: 24.61M

sent 286.61K bytes  received 24.61M bytes  1.42M bytes/sec
total size is 92.88G  speedup is 3,731.42 (DRY RUN)
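For reference, the actual sync is presumably just the same command with `--dry-run` removed (double-check the flags against your own setup first, since this one really will delete and overwrite files on the destination):

```bash
# Same as the dry run above, minus --dry-run, so it actually writes and deletes on the destination
sudo rsync --stats --human-readable --delete-delay --archive --compress \
  --rsync-path="sudo -u yellowtent rsync" \
  -e 'ssh -p {port}' \
  {user}@{IP}:/home/yellowtent/ /home/yellowtent/
```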
-
Ah, I figured out why it's so many: when I used the `--itemize-changes` flag, it showed that a lot of the files are coming over because of timestamps. The timestamps on the source are older than those on the destination, because some files on Server B got their timestamps at the time of the backup restore. Since I'm using the `--archive` flag, rsync tries to keep the destination timestamps in sync with the source, and I believe that's why it's picking up so many more changes than I expected.
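For anyone curious, the itemized output marks a timestamp-only difference with a `t` in the change flags. A quick sketch of how to see it (the example path is made up):

```bash
# Add -i (--itemize-changes) to a dry run to see why each file would transfer
sudo rsync -ain --rsync-path="sudo -u yellowtent rsync" -e 'ssh -p {port}' \
  {user}@{IP}:/home/yellowtent/ /home/yellowtent/

# A line like this means a regular file whose modification time differs (the "t"):
#   >f..t...... platformdata/someapp/example.conf
```
-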
Okay, I've migrated servers today and flipped the switch. Things seem to be running steadily, although while combing through the logs I do see a few concerns; I'm not sure whether they're related to the migration or not. By the way, I'll write up the steps I took in more detail for anyone who wants to attempt this in the future.
I'm seeing this somewhat often in my Mail logs:
Nov 23 17:34:25[ERROR] [224C5AD2-AF35-4EE2-A3F3-05980A9896E2.1] [limit] conn_concur_decr:Error: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
I assume this is perhaps the issue I hadn't realized was related, the one mentioned in the changelog here?
@girish said in Email Event Log loading very slowly, seems tied to overall Email domain list health checks:
This seems related to the redis issue . Think it gets fixed with https://git.cloudron.io/cloudron/box/-/commit/e64182d79134e8828c2fa953c676a8f6b08247b7
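In case it helps anyone hitting the same MISCONF error before that fix lands, a rough way to check why the RDB save is failing (a sketch; the container name filter is an assumption, adjust for your own setup):

```bash
# Find the Redis container(s) Cloudron is running
docker ps --format '{{.Names}}' | grep -i redis

# Look for the actual bgsave/RDB error in that container's logs
docker logs --tail 100 <redis-container-name> 2>&1 | grep -i -E 'rdb|bgsave|misconf'

# MISCONF usually means the background save failed, often because the disk is full
df -h
```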
One issue I did run into, by the way, was with MongoDB: it refused to start up and kept complaining about possible corruption. No matter what I did it wouldn't work, so I ended up backing it up, removing the files in the `/home/yellowtent/platformdata/mongodb/` directory, running the rsync again to bring over the files from the working server, and restarting the MongoDB service, and all was well again.
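Roughly what that recovery looked like as commands (a sketch based on the description above; the `mongodb` container name is an assumption, so verify it with `docker ps` and keep a copy of the old data first):

```bash
# Keep a copy of the (possibly corrupted) MongoDB data, just in case
sudo mv /home/yellowtent/platformdata/mongodb /home/yellowtent/platformdata/mongodb.bak
sudo mkdir /home/yellowtent/platformdata/mongodb

# Bring over a fresh copy of the MongoDB files from the working source server
sudo rsync --archive --compress --rsync-path="sudo -u yellowtent rsync" -e 'ssh -p {port}' \
  {user}@{IP}:/home/yellowtent/platformdata/mongodb/ /home/yellowtent/platformdata/mongodb/

# Restart the MongoDB service container (name is an assumption - check `docker ps`)
docker restart mongodb
```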
-
@girish I'm also seeing this repeatedly in my box logs; it seems perhaps related to the email hosting part:
Nov 23 17:39:31 box:server no such route: GET eventlog?page=1&per_page=20&search=&types=&access_token=<redacted>
[ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
    at new NodeError (node:internal/errors:399:5)
    at ServerResponse.setHeader (node:_http_outgoing:645:11)
    at ServerResponse.header (/home/yellowtent/box/node_modules/express/lib/response.js:794:10)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:174:12)
    at ServerResponse.json (/home/yellowtent/box/node_modules/express/lib/response.js:278:15)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:162:21)
    at /home/yellowtent/box/node_modules/connect-lastmile/lib/index.js:80:28
    at Layer.handle_error (/home/yellowtent/box/node_modules/express/lib/router/layer.js:71:5)
    at trim_prefix (/home/yellowtent/box/node_modules/express/lib/router/index.js:326:13)
    at /home/yellowtent/box/node_modules/express/lib/router/index.js:286:9

Nov 23 17:39:31 box:server no such route: GET solr_config?access_token=<redacted>
[ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
    at new NodeError (node:internal/errors:399:5)
    at ServerResponse.setHeader (node:_http_outgoing:645:11)
    at ServerResponse.header (/home/yellowtent/box/node_modules/express/lib/response.js:794:10)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:174:12)
    at ServerResponse.json (/home/yellowtent/box/node_modules/express/lib/response.js:278:15)
    at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:162:21)
    at /home/yellowtent/box/node_modules/connect-lastmile/lib/index.js:80:28
    at Layer.handle_error (/home/yellowtent/box/node_modules/express/lib/router/layer.js:71:5)
    at trim_prefix (/home/yellowtent/box/node_modules/express/lib/router/index.js:326:13)
    at /home/yellowtent/box/node_modules/express/lib/router/index.js:286:9

Nov 23 17:39:34 box:server no such route: GET usage?domain={domain}&access_token=<redacted>
Any suggestions on this one?
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption.
yeah, this is why the rsync solution is not entirely recommended. The databases probably hold state in memory as well. When we do a live rsync while the databases are running, it's possible that we are copying semi-baked stuff.
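For anyone repeating this, one way to reduce that risk is to quiesce the databases on the source right before the final rsync so nothing is writing while you copy. A rough sketch (the `box` unit name and the container names here are assumptions based on a typical Cloudron install; list yours with `docker ps` and adjust):

```bash
# On the source server: stop the Cloudron management service so it doesn't restart things underneath you
sudo systemctl stop box

# Stop the addon database containers so their on-disk files are in a consistent state
# (container names are assumptions - check `docker ps` first)
docker stop mongodb mysql postgresql

# ...now run the final rsync from the destination server...

# Afterwards, bring everything back up
docker start mongodb mysql postgresql
sudo systemctl start box
```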
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
Any suggestions on this one?
A fix for this is coming in the next release; you can ignore it.
edit: fixed in https://git.cloudron.io/cloudron/box/-/commit/a056bcfdfe6c7bcb6d2f1cea2017c54f2ba6750f
-
@girish said in Migration from one server to another with a floating IP and minimizing downtime:
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption.
yeah, this is why the rsync solution is not entirely recommended. The databases probably hold state in memory as well. When we do a live rsync while the databases are running, it's possible that we are copying semi-baked stuff.
It seemed to work fine; it just meant I had to delete the MongoDB files and bring them back over from the source server. Everything was good after that. But yeah, not the simplest migration process.
It worked, though. I've been running fully on the new server since Friday and so far have seen no issues beyond discovering some known ones I hadn't noticed earlier, which you've confirmed have bug fixes coming soon. The downtime was maybe 15-20 minutes, and mostly only for the apps using MongoDB (I only had two of those); most of that time went to restarts while I figured out the fix for the MongoDB errors. Once that was solved, the two dependent apps came back up, and next time it would be much less downtime now that I know how to avoid the MongoDB issue. For the apps that didn't rely on MongoDB it was maybe 5-10 minutes. That's much better than the 2+ hours of downtime a normal migration would need due to backup and restore times with object storage. I'm going to try to write up some more notes on my experience for others in similar situations.
-
I just remembered that @fbartels had once published a blog post for this topic:
https://blog.9wd.eu/posts/cloudron-migration/ referenced in https://forum.cloudron.io/topic/5601/moving-cloudron-to-new-server-stop-apps/ and https://forum.cloudron.io/topic/4895/trigger-full-backup-from-cli-scripted-migration/
I don't know if it's still up to date, though.
-
@necrevistonnezr Ah yes, that has some good stuff too, thanks for sharing it! I think that article's steps still involve some long downtime (at least when using object storage, since it's so slow), but at least it's a bit more automated, which is really cool.