

Scaling / High Availability Cloudron Setup



  • I use Cloudron to host multiple blogs for some customers. Given what a powerful solution Cloudron is, one thing I feel is really missing from the whole package is some way of providing a degree of high availability for the Cloudron server.

    When I need to resize my server (to increase CPU, disk space or RAM) or reboot for security updates, there is downtime. Additionally, even though migration between Cloudron instances gets good feedback on the forum, an unexpected migration from backup after some sort of failure would incur significant downtime.

    I really don't want any downtime and would like some immediate disaster recovery options if the active server was lost, disconnected or corrupted.

    It would also be nice to know the system could handle increased load if needed, or to have the ability to scale out over time rather than running a single server with ever-increasing specs (RAM, CPU, disk) to handle more applications / more traffic.

    It would be amazing if Cloudron had some high availability features that could include:

    • Ability to use floating IPs
    • Load balancing
    • Hot backup / standby server
    • Ability to connect to managed databases for backend data?
    • Ability to scale based on the number of applications running / resources needed - adding additional Cloudron nodes?

    I know these are all very complex features, and I'm not even sure if it's possible at all given the way Cloudron works, but I was interested to know your thoughts and whether this has been discussed or is on your roadmap.

    I'm no expert on the above so apologies in advance, I know I'm covering quite a lot here!



  • This is something I've been batting around implementation ideas for a little while now. There's a ton of provider-to-provider variability to account for when automating some of it, so I was leaning toward more capable cluster managers that are already available off the shelf. Easily the most capable is Kubernetes, but it comes with a lot of added complexity. Given the way Cloudron works, some of this is distinctly possible, and I've been sketching out a number of different ideas. Nothing is formally roadmapped right now, afaik.

    That said, when thinking about options, it would be helpful to know what changes in your experience you see these sorts of ideas enabling. Adding an additional node, which seems to be the focus of 3 of your 5 ideas (load balancing, hot standby, and auto-scaling), may or may not be the best approach to minimizing downtime, depending on how "normal" use would pan out. It's already not all that difficult to keep an alternate machine restored from backups as a standby, but given the way the system handles app-level failures, it's hard to say in which cases that would be useful; there's also added difficulty in reversing that failover and keeping the standby real-time.

    Ultimately, what probably makes the most sense for closing the gap on those goals, without messing too much with the underlying architecture and existing packaging, is some sort of coordinated cluster manager that keeps the single-container approach but lets the system reallocate those app containers across k different servers. Something short of Kubernetes could achieve this fairly easily in principle, but it will need a lot of work to pull off in practice. For these reasons, I've started looking more at HashiCorp's Nomad as a potential solution for the cluster management side of things, but I'm still in the very early stages of working out what a Cloudron implementation would look like. At its full potential, this could enable things like multi-region and even multi-provider deployments. Ideally, the details of managing this would be hidden away behind the Cloudron interface, but I haven't yet begun spiking out an actual implementation.
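
    To make "reallocate those app containers across k different servers" concrete, here is a hypothetical Python sketch of what happens when one node fails: its apps are moved onto the surviving nodes. Everything here is an assumption for illustration (the Node class, a fixed memory budget per node, memory in MB as the only resource); it uses a simple greedy heuristic (largest app first, onto the node with the most free memory) and is not how Nomad or Kubernetes actually schedule, since their schedulers weigh far more than memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member with a fixed memory budget for app containers."""
    name: str
    capacity_mb: int
    apps: dict[str, int] = field(default_factory=dict)  # app name -> memory (MB)

    def free_mb(self) -> int:
        return self.capacity_mb - sum(self.apps.values())


def reschedule(nodes: list[Node], failed: str) -> dict[str, str]:
    """Move every app container off the failed node onto the survivors.

    Greedy heuristic: place the largest apps first, each on the surviving
    node that currently has the most free memory. If an app cannot be
    placed, raise; apps already moved stay moved (no rollback).
    """
    lost = next(n for n in nodes if n.name == failed)
    survivors = [n for n in nodes if n.name != failed]
    if not survivors:
        raise RuntimeError("no surviving nodes")
    placement: dict[str, str] = {}
    for app, mem in sorted(lost.apps.items(), key=lambda kv: kv[1], reverse=True):
        target = max(survivors, key=Node.free_mb)
        if target.free_mb() < mem:
            raise RuntimeError(f"not enough capacity left for {app} ({mem} MB)")
        target.apps[app] = mem
        placement[app] = target.name
    lost.apps.clear()
    return placement


if __name__ == "__main__":
    nodes = [
        Node("node-a", 4096, {"wordpress": 512, "nextcloud": 1024}),
        Node("node-b", 4096, {"gitlab": 2048}),
        Node("node-c", 4096, {"blog": 1536}),
    ]
    # nextcloud goes to node-c (2560 MB free), then wordpress to node-b
    print(reschedule(nodes, "node-a"))
```

    Even in this toy form, it shows why normal operating headroom matters: the move only succeeds if the survivors have enough free capacity left over.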

    I'd love some more thoughts and feedback on the approach generally though!



  • I would also love to see an HA setup for larger installations (which, in my opinion, often also need some kind of identity provider solution such as Shibboleth or FreeIPA for external apps). The Nomad solution looks very promising and could possibly be offered as a paid premium add-on for larger installations.

    I was personally thinking about a very simple active-passive setup with just two instances, using the snapshotted backups. The backups could be replicated (via incremental rsync) to a passive instance that would store them locally for a very quick restore. Incremental syncs would not require much bandwidth or downtime, and restoring locally stored backups would be fairly quick.
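
    The incremental replication idea can be sketched in Python. This is a hypothetical illustration of the principle only (transfer just what changed since the last sync), not how rsync works internally: rsync's default quick check compares file size and modification time and sends only the changed parts of a file, whereas this sketch compares whole-file SHA-256 digests, copies whole files, and never deletes files that disappeared from the source. The function names and directory layout are assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large backups need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_sync(src: Path, dst: Path) -> list[str]:
    """Copy files that are new or changed in src to dst.

    Returns the relative paths that were copied. Files deleted from src are
    left in dst (unlike `rsync --delete`).
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        unchanged = (
            target.is_file()
            and target.stat().st_size == path.stat().st_size
            and file_digest(target) == file_digest(path)
        )
        if unchanged:
            continue  # nothing to transfer for this file
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

    Running it twice in a row copies nothing the second time; after one backup file changes, only that file is copied again. On TB-scale data, hashing every file on every run would be slow, which is why size-and-mtime checks like rsync's are the usual shortcut.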

    Switching back from the formerly passive instance to the previously failed or newly set up instance would most likely have to be done manually. A fully automated cluster with recovery would require at least three hosts (for quorum) and might be too much overhead for smaller installations.
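
    The "at least three hosts" point follows from the usual strict-majority quorum rule (assumed here; specific cluster tools may differ in details): n voting nodes need n // 2 + 1 of them to agree, so a two-node pair cannot lose either node without losing quorum, while three nodes can lose one. A minimal Python sketch:

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n voting nodes."""
    if n < 1:
        raise ValueError("need at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can fail while the survivors still form a majority."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} node(s): quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

    Going from three to four nodes buys no extra failure tolerance (both survive one failure), which is why odd cluster sizes are common.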

    Would this be something that could be considered in future developments?



  • @NCKNE Pieces of that definitely could be. I wonder about the appetite in the community for a hot/cold HA standby setup versus an active-active clustered sort of approach. Personally, I'm not a big fan of paying for servers to sit there "just in case"; I'd rather utilize a little less across more machines and have normal operating headroom with some ability to absorb failures. That's just me, though, so the more input we can get on this topic to inform what everyone values, the better!
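
    A back-of-the-envelope way to compare the two approaches, assuming N identical nodes of capacity C with the total load L spread evenly (real workloads won't divide this cleanly): each node runs at utilization u, and for the surviving N - 1 nodes to absorb one failed node's share,

$$
u = \frac{L}{N C}, \qquad L \le (N-1)\,C \;\Longleftrightarrow\; u \le \frac{N-1}{N}
$$

    With N = 2, each node can run at no more than 50%, the same reserve as a hot/cold standby pair; with N = 3 the ceiling rises to about 67%, and with N = 5 to 80%, so spreading across more machines shrinks the share of capacity held "just in case".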

    Insofar as external app SSO goes, I very much agree that it is an important addition for the future, and I have a somewhat simplified solution that I'm working on (as opposed to the beast that is Shibboleth; I've looked at packaging it for Cloudron and been...put off by the effort). Part of my motivation has been the thought that an app letting you leverage Cloudron as an IdP would need the same sort of flexibility that makes the rest of the system so strong. In essence, I'm aiming for a multi-protocol IdP app that would support SAML, OAuth2, and potentially CAS exchanges for a start. I think it would be great to get RADIUS into the mix as well, though that may be better served as its own app. There are some outstanding challenges with the way the Cloudron LDAP system is currently set up, especially with respect to groups and some profile fields, that will need to be sorted out before it reaches its full potential, but hopefully we can make a proof of concept available in the near future.



  • @jimcavoli said in Scaling / High Availability Cloudron Setup:

    @NCKNE Pieces of that definitely could be. I wonder about the appetite in the community for a hot/cold HA standby setup versus an active-active clustered sort of approach. Personally, I'm not a big fan of paying for servers to sit there "just in case"; I'd rather utilize a little less across more machines and have normal operating headroom with some ability to absorb failures. That's just me, though, so the more input we can get on this topic to inform what everyone values, the better!

    I am absolutely with you here; having passive servers just sitting there bored and wasting energy is nothing to aim for. I was just spinning ideas in my head for a quick restore in case of a failure, and a simple solution could be to back up to a remote server's disk. Having lots of data (TBs) in apps like Nextcloud, with only full backups available, is what made me wish for incremental backups to a standby location.

    Insofar as external app SSO goes, I very much agree that it is an important addition for the future, and I have a somewhat simplified solution that I'm working on (as opposed to the beast that is Shibboleth; I've looked at packaging it for Cloudron and been...put off by the effort). Part of my motivation has been the thought that an app letting you leverage Cloudron as an IdP would need the same sort of flexibility that makes the rest of the system so strong. In essence, I'm aiming for a multi-protocol IdP app that would support SAML, OAuth2, and potentially CAS exchanges for a start. I think it would be great to get RADIUS into the mix as well, though that may be better served as its own app. There are some outstanding challenges with the way the Cloudron LDAP system is currently set up, especially with respect to groups and some profile fields, that will need to be sorted out before it reaches its full potential, but hopefully we can make a proof of concept available in the near future.

    Wow! That would be awesome and, in my opinion, a HUGE step toward making Cloudron enterprise-ready. Together with high availability / load-balanced clustering, Cloudron could easily be used in larger environments as well.


  • Staff

    @tkd said in Scaling / High Availability Cloudron Setup:

    Ability to use floating IPs

    Note that this is already possible. Get a floating IP, then go to the Network view and enter the IP there. Cloudron will then use that IP for DNS. Many users already use it this way with Elastic IPs as well.

    Ability to scale based on the number of applications running / resources needed - adding additional Cloudron nodes?

    This is on our radar and definitely doable, but the biggest challenge for us has been justifying the implementation of these features, as we haven't found customers willing to pay for complex features like these. If you are in the enterprise/medium business bracket and willing to work with us on this, please contact us at support@cloudron.io.

