@nebulon said in Backup fails due to long runtime - backup hangs:
@chymian I am not sure about the specific issue and why the upload stalls at some point. This is strange indeed. Is there any throttling happening for incoming connections over time with your latest storage product?
no, it's working very well. I ran some heavy load tests and it stayed very performant.
For the incremental backup, this is only supported using the "rsync" strategy. Depending on the backend, it either uses hardlinks or server side copying to avoid duplicate uploads.
I've had some not-so-good experiences with rsync to S3; it's more suited to a real FS. or do you have other experiences? what works best with rsync & hardlinks?
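(for context, the classic hardlink-based incremental pattern looks roughly like the sketch below – paths and snapshot names are just placeholders, not necessarily how Cloudron lays it out)

```sh
# minimal sketch of hardlink-based incrementals with rsync;
# paths and dates are placeholders, not Cloudron's actual layout
PREV=/backups/2021-09-01      # last completed snapshot
DEST=/backups/2021-09-02      # new snapshot directory
rsync -a --delete --link-dest="$PREV" /var/data/ "$DEST"/
# files unchanged since $PREV become hardlinks; only changed files take new space
```

which is also why it doesn't map well onto S3-style object storage: there are no hardlinks there, so the backend has to fall back to server-side copies.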
Generally our aim is to rather upload more than optimize for storage space as such.
to be on the safe side, I see. but that also puts more load on the server, which is not really necessary.
We have had various discussions already about using systems like borg backup and such, but so far have always decided that we will not use them, since we are no experts on those systems and we are talking about backups, where it is sometimes required to have deep knowledge about how exactly backups are stored on the backend in order to restore from broken setups (which is of course one of the main use-cases for backups). The problem is, if anything gets corrupted in the state of a more complex backup system, it is very hard or impossible to recover.
I see your point, but at some point a system architect/admin always has to go into trust mode and test some new software to develop further; besides that, restic, for example, is battle-proven.
and with the overall check & repair functions, this could enhance the whole backup security/reliability.
Already the encrypted tarball backup has a similar drawback, where say a few blocks of that tarball are corrupted, it is impossible to recover the rest,
that could be seen as a call to find another solution.
as mentioned, I have no experience with borg, but restic and bareos have a validation function – which tarballs don't have – and that gives an extra layer of reliability.
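for illustration, verification in restic is a single built-in command (the repository path here is just a placeholder):

```sh
# check the repository structure; --read-data additionally downloads and
# verifies every data blob against its checksum
# (restic will prompt for the repository password if it isn't set via env)
restic -r /srv/backup/repo check --read-data
```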
so from our perspective the simpler to understand and recover, the better, with the drawback of maybe using more space or slower backups overall. It is a tradeoff.
philosophers could lock horns about that – for sure.
there will be a time when the old system just cannot keep up with the development and a change is needed.
one point is the sheer amount of data which has to be backed up.
Regarding btrfs and zfs, we actually started out with btrfs long ago.…
However in the end we had to step away from it, since in our experience while it works 99% of the time, if it does not, you might end up with a corrupted filesystem AND corrupted backup snapshots.
I had the same experiences with the early versions of BTRFS, but that was long ago.
meanwhile even Proxmox, which is definitely more on the conservative side of system setups, is using it in PVE 7.x.
with ext4, we will never know about bitrot!
Which is the worst case for a backup system of course. Problem is also that those corruptions might be unnoticed for a long time.
a nightly/weekly scrub can easily find them and, where possible, repair them.
but compared to zfs, a sysadmin has to set up these cron jobs manually, which most people don't do or even know about – me included; that would have saved me some trouble back then…
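something like this in cron would cover it (mount point and schedule are only examples):

```
# /etc/cron.d/btrfs-scrub – monthly scrub of the root filesystem
# -B: stay in the foreground so cron can capture/mail errors, -d: per-device stats
0 3 1 * * root /usr/bin/btrfs scrub start -Bd /
```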
Further with regards to filesystems, we have seen VPS provider specific characteristics, which is why we essentially only rely on ext4 as the most stable one.
don't know what you are referring to?
but maybe xfs, which also gained snapshot features lately and has a much better reputation than BTRFS, would be a choice.
is Cloudron still strictly tied to ext4, or can it be installed on xfs or btrfs at one's own responsibility?
Having said all this, I guess we have to debug your system in order to see why it doesn't work for you.
I figured out at least one point so far:
the server is hosted with ssdnodes.com and they tend to oversubscribe their systems. after monitoring the HD throughput for a few days, I realized there are times when it goes down.
support fixed the issue for me yesterday, and at least last night the backup ran without problems for the first time in a week.
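for reference, something as simple as iostat from the sysstat package is enough to spot that kind of pattern (the interval is arbitrary):

```sh
# extended per-device stats every 60s; high await/%util combined with low MB/s
# read/write rates points at the host's storage being the bottleneck
iostat -dxm 60
```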
that leaves the question why – let's say on a not-so-performant system – the backup stalls completely for hours after all data has been transferred, until it runs into the timeout?
the log is here