Migration from one server to another with a floating IP and minimizing downtime
-
@girish Interestingly, I'm getting close I think but I am getting a lot of these kind of errors in an rsync dry run and this seems to then conclude in a "code 23" error.
The command I'm running is this:
rsync --progress --human-readable --delete-delay --archive --compress --rsync-path="sudo rsync" -e 'ssh -p {port}' {user}@{IP}:/home/yellowtent/ /home/yellowtent/ --dry-run
rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/mysql/dfe0390f47991409/wp_usermeta.MYD": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/mysql/dfe0390f47991409/wp_usermeta.MYI": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/mysql/dfe0390f47991409/wp_usermeta_2014.sdi": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/mysql/dfe0390f47991409/wp_users.MYD": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/mysql/dfe0390f47991409/wp_users.MYI": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/mysql/dfe0390f47991409/wp_users_2015.sdi": Permission denied (13) [...] rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/PG_VERSION": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/pg_hba.conf": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/pg_ident.conf": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/postgresql.auto.conf": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/postgresql.conf": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/postmaster.opts": Permission denied (13) rsync: [generator] recv_generator: failed to stat "/home/yellowtent/platformdata/postgresql/14/main/postmaster.pid": Permission denied (13) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1865) [generator=3.2.7]
I assume this has to do with needing to use the yellowtent user or something, but still working on it. Figure I'd give a quick update in case there was some suggestions to get past it.
-
If I modify the command to this, I get a slightly different set of errors but ultimately still permission denied.
Command:
rsync --progress --human-readable --delete-delay --archive --compress --rsync-path="sudo -u yellowtent rsync" -e 'ssh -p {port}' {user}@{IP}:/home/yellowtent/ /home/yellowtent/ --dry-run
rsync: [sender] opendir "/home/yellowtent/platformdata/mysql/e895bea3fd021766" failed: Permission denied (13) rsync: [sender] opendir "/home/yellowtent/platformdata/mysql/f426f5f438dd9395" failed: Permission denied (13) rsync: [sender] opendir "/home/yellowtent/platformdata/mysql/fcd9711342a78f4d" failed: Permission denied (13) rsync: [sender] opendir "/home/yellowtent/platformdata/mysql/mysql" failed: Permission denied (13) rsync: [sender] opendir "/home/yellowtent/platformdata/mysql/performance_schema" failed: Permission denied (13) rsync: [sender] opendir "/home/yellowtent/platformdata/mysql/sys" failed: Permission denied (13) rsync: [sender] opendir "/home/yellowtent/platformdata/postgresql/14/main" failed: Permission denied (13) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1865) [generator=3.2.7]
-
Btw, this looks wrong to me somehow... shouldn't this all be owned by yellowtent?
ubuntu@my:~$ ls -alh /home/yellowtent/platformdata/ total 112K drwxr-xr-x 21 yellowtent yellowtent 4.0K Oct 12 08:17 . drwxr-xr-x 7 yellowtent yellowtent 4.0K Nov 13 10:01 .. -rw-r--r-- 1 yellowtent yellowtent 5 Nov 19 2022 CRON_SEED -rw-r--r-- 1 yellowtent yellowtent 1.3K Nov 13 10:01 INFRA_VERSION -rw-r--r-- 1 yellowtent yellowtent 5 Nov 13 10:01 VERSION drwxr-xr-x 2 yellowtent yellowtent 4.0K Nov 22 10:10 acme drwxr-xr-x 3 yellowtent yellowtent 4.0K Oct 12 08:17 addons drwxr-xr-x 2 yellowtent yellowtent 4.0K Nov 25 2022 backup drwxr-xr-x 2 yellowtent yellowtent 4.0K Nov 25 2022 cifs drwxr-xr-x 2 yellowtent yellowtent 4.0K Nov 19 2022 collectd -rw-r--r-- 1 yellowtent yellowtent 826 Nov 19 2022 dhparams.pem -rw-r--r-- 1 yellowtent yellowtent 6.6K Nov 23 03:53 diskusage.json -rw-r--r-- 1 yellowtent yellowtent 245 Nov 23 21:42 features-info.json drwxr-xr-x 2 yellowtent yellowtent 4.0K Dec 30 2022 firewall drwxr-xr-x 3 tss sgx 4.0K Apr 4 2023 graphite drwxr-xr-x 2 root root 4.0K Nov 22 06:00 logrotate.d drwxr-xr-x 75 yellowtent yellowtent 4.0K Nov 19 17:30 logs drwxr-xr-x 8 tss root 4.0K Nov 23 22:25 mongodb drwxr-xr-x 33 tss sgx 4.0K Nov 23 05:22 mysql drwxr-xr-x 4 yellowtent yellowtent 4.0K Nov 23 10:10 nginx drwxr-xr-x 2 yellowtent yellowtent 4.0K Jul 5 00:37 oidc drwxr-xr-x 3 root root 4.0K Apr 4 2023 postgresql drwxr-xr-x 25 root root 4.0K Nov 19 17:30 redis drwxr-xr-x 3 yellowtent yellowtent 4.0K Nov 19 2022 sftp drwxr-xr-x 2 yellowtent yellowtent 4.0K Nov 19 2022 sshfs drwxr-xr-x 2 yellowtent yellowtent 4.0K Dec 1 2022 tls drwxr-xr-x 2 yellowtent yellowtent 4.0K Nov 13 08:34 update
Maybe this is the issue, that some of these files/directories are owned by something called "tss" instead?
-
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
Maybe this is the issue, that some of these files/directories are owned by something called "tss" instead?
The usernames in containers appear differently as usernames in host. This
tss
is possibly uid 1000 or something like that on host (check /etc/shadow) -
@girish Ah you called it - sorry for forgetting the
sudo
part, can't believe I forgot that, lol. Been trying too many things today I guess, haha.Here's the command the seems to work and it's stats output, although this seems maybe incorrect to me given the size of the data and how much needs to be created and such considering the backup I restored from is only 8 hours old or so. However as a percentage compared to how many files exist total it's fairly low what needs to be deleted or created so I guess maybe it makes sense still.
Command:
sudo rsync --stats --human-readable --delete-delay --archive --compress --rsync-path="sudo -u yellowtent rsync" -e 'ssh -p {port}' {user}@{IP}:/home/yellowtent/ /home/yellowtent/ --dry-run
Number of files: 555,244 (reg: 512,959, dir: 42,208, link: 77) Number of created files: 2,308 (reg: 2,213, dir: 95) Number of deleted files: 702 (reg: 690, dir: 12) Number of regular files transferred: 11,365 Total file size: 92.88G bytes Total transferred file size: 18.64G bytes Literal data: 0 bytes Matched data: 0 bytes File list size: 14.17M File list generation time: 0.001 seconds File list transfer time: 0.000 seconds Total bytes sent: 286.61K Total bytes received: 24.61M sent 286.61K bytes received 24.61M bytes 1.42M bytes/sec total size is 92.88G speedup is 3,731.42 (DRY RUN)
-
Ah I figured out why it's so many, when I used the
--itemized-changes
flag, it shows that a lot of the reasons it's bringing over so many files is due to the timestamp. It seems it's because the timestamp in the source is older than the timestamp in the destination due to the timestamps created on some of these files were the time of the backup restoration time on Server B. Since I'm using the--archive
flag, it is trying to keep all of that in sync with each other so that the timestamps on the destination match the source. I believe that's why it's picking up so many more changes than I expected. -
Okay, I've migrated servers today and flipped the switch, things seem to be running steady although since I've been combing through the logs I do see a few concerns but not sure if this is related to the migration or not. I'll write up more descriptive tasks I took and details for those who want to attempt this in the future by the way.
I'm seeing this somewhat often in my Mail logs:
Nov 23 17:34:25[ERROR] [224C5AD2-AF35-4EE2-A3F3-05980A9896E2.1] [limit] conn_concur_decr:Error: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
I assume that's the issue perhaps that I hadn't realized was related to the changelog mentioned here?
@girish said in Email Event Log loading very slowly, seems tied to overall Email domain list health checks:
This seems related to the redis issue . Think it gets fixed with https://git.cloudron.io/cloudron/box/-/commit/e64182d79134e8828c2fa953c676a8f6b08247b7
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption. No matter what I did it wouldn't work so I just ended up backing it up, then removing the files in the `/home/yellowtent/platformdata/mongodb/``` directory and then running an rsync again to bring over the files from the working server, and restarted the MongoDB service and all went well again.
-
@girish I'm also seeing this in my logs repeatedly for box logs, seems related perhaps to the email hosting part:
Nov 23 17:39:31box:server no such route: GET eventlog?page=1&per_page=20&search=&types=&access_token=<redacted> [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client at new NodeError (node:internal/errors:399:5) at ServerResponse.setHeader (node:_http_outgoing:645:11) at ServerResponse.header (/home/yellowtent/box/node_modules/express/lib/response.js:794:10) at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:174:12) at ServerResponse.json (/home/yellowtent/box/node_modules/express/lib/response.js:278:15) at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:162:21) at /home/yellowtent/box/node_modules/connect-lastmile/lib/index.js:80:28 at Layer.handle_error (/home/yellowtent/box/node_modules/express/lib/router/layer.js:71:5) at trim_prefix (/home/yellowtent/box/node_modules/express/lib/router/index.js:326:13) at /home/yellowtent/box/node_modules/express/lib/router/index.js:286:9 Nov 23 17:39:31box:server no such route: GET solr_config?access_token=<redacted> [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client at new NodeError (node:internal/errors:399:5) at ServerResponse.setHeader (node:_http_outgoing:645:11) at ServerResponse.header (/home/yellowtent/box/node_modules/express/lib/response.js:794:10) at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:174:12) at ServerResponse.json (/home/yellowtent/box/node_modules/express/lib/response.js:278:15) at ServerResponse.send (/home/yellowtent/box/node_modules/express/lib/response.js:162:21) at /home/yellowtent/box/node_modules/connect-lastmile/lib/index.js:80:28 at Layer.handle_error (/home/yellowtent/box/node_modules/express/lib/router/layer.js:71:5) at trim_prefix (/home/yellowtent/box/node_modules/express/lib/router/index.js:326:13) at /home/yellowtent/box/node_modules/express/lib/router/index.js:286:9 Nov 23 17:39:34box:server no such route: GET usage?domain={domain}&access_token=<redacted>
Any suggestions on this one?
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption.
yeah, this is why the rsync solution is not entirely recommended. The databases probably hold state in memory as well . When we do a live rsync when the databases as running, it's possible that we are copying semi-baked stuff.
-
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
Any suggestions on this one?
Fix for this is coming next release, you can ignore it.
edit: fixed in https://git.cloudron.io/cloudron/box/-/commit/a056bcfdfe6c7bcb6d2f1cea2017c54f2ba6750f
-
@girish said in Migration from one server to another with a floating IP and minimizing downtime:
@d19dotca said in Migration from one server to another with a floating IP and minimizing downtime:
One issue I did run into btw was around MongoDB, where it refused to startup and it kept complaining about possible corruption.
yeah, this is why the rsync solution is not entirely recommended. The databases probably hold state in memory as well . When we do a live rsync when the databases as running, it's possible that we are copying semi-baked stuff.
It seemed to work fine but just meant I had to delete the MongoDB files and bring back over from the source server. Everything was good after that. But yeah not the simplest migration process.
It worked though. I’m running fully on the new server since Friday and so far have seen no issues beyond discovering some known ones I hadn’t seen earlier which you’ve confirmed are bug fixes coming soon. The amount of down time was maybe 15-20 minutes (and mostly just for the apps using MongoDB of which I only had 2 apps using it), and that was mostly due to restarting while trying to figure out the solution to the MongoDB errors which once solved the two apps depending on it came back up again and would be this much less downtime next time now that I know how to avoid the MongoDB stuff. For the apps that didn’t rely on MongoDB it was maybe 5-10 minutes. Much better than the 2+ hours of downtime needed doing a normal migration due to backup and restore times with object storage. Going to try and write up some more notes on my experience for others who it may help in similar situations.
-
I just remembered that @fbartels had once published a blog post for this topic:
https://blog.9wd.eu/posts/cloudron-migration/ referenced in https://forum.cloudron.io/topic/5601/moving-cloudron-to-new-server-stop-apps/ and https://forum.cloudron.io/topic/4895/trigger-full-backup-from-cli-scripted-migration/
I don't know if it's still up to date, though. -
@necrevistonnezr Ah yes, that has some good stuff too, thanks for sharing that! I think that article's steps still has some long downtime (at least when using object storage since it's so slow) but at least it's a bit more automated which is really cool.