Spam filter doesn't always learn

imc67

Hi,

For weeks now I've noticed that some spam arriving in my Cloudron mailbox keeps landing in the Inbox, even though I've trained the filter on it (by moving it to Spam in Roundcube).

This same mailbox is also connected to FreeScout, and there the spam detection (simply by sender address) works.
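For comparison, sender-address filtering like FreeScout's is simple to express. Here's a minimal Python sketch, assuming a hypothetical list of glob patterns (`BLOCKED_SENDERS`, `sender_address` and `is_blocked` are made-up names). It matches in the spirit of SpamAssassin's `blacklist_from` directive (renamed `blocklist_from` in SpamAssassin 4.0); it is not FreeScout's actual code.

```python
from email.utils import parseaddr
from fnmatch import fnmatchcase

# Hypothetical patterns; glob syntax similar to SpamAssassin's blacklist_from.
BLOCKED_SENDERS = ["*@spam-domain.example", "offers@*.example.net"]


def sender_address(from_header: str) -> str:
    """Return the bare, lowercased address from a From: header value."""
    _display_name, address = parseaddr(from_header)
    return address.lower()


def is_blocked(from_header: str, patterns=BLOCKED_SENDERS) -> bool:
    """True if the sender address matches any glob pattern (case-insensitive)."""
    address = sender_address(from_header)
    if not address:
        return False
    return any(fnmatchcase(address, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    print(is_blocked('"Deals" <Promo@Spam-Domain.example>'))  # True
    print(is_blocked("friend@example.org"))                   # False
```

The trade-off is the usual one: an address rule is exact and immediate, but spammers rotate sender addresses, which is why a content-based filter like SpamAssassin's Bayes is still needed.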

Is there something I can see/do in Haraka to optimize this?

d19dotca

Unfortunately better spam control isn't here yet, but I sure hope it'll arrive in 6.0 or 6.1.

In the meantime, though: the Cloudron docs say it takes about 50+ messages before SpamAssassin has enough data to "start" learning. So before it gets "good", I'd hazard a guess that it could take a couple hundred messages (depending on how much spam you get). I get a lot (~10-30/day), and thankfully, a few months after switching to Cloudron, most spam now lands in my junk folder, but the first few months were bad. lol. This may also have improved in Cloudron updates since then.

So I guess my first question to you is: how many messages are currently in your spam folder? If it's not many, that may be why; you'll need more for it to learn from.
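If you can run `sa-learn` against the Bayes database, `sa-learn --dump magic` reports how many spam (`nspam`) and ham (`nham`) messages have been learned. SpamAssassin's documented defaults for `bayes_min_spam_num` and `bayes_min_ham_num` are 200 each (Bayes stays inactive below that); Cloudron's configuration may differ, so the thresholds are parameters here. A minimal Python sketch, assuming the usual four-column `--dump magic` layout; the sample numbers are made up, and `bayes_counts`/`bayes_active` are hypothetical helper names.

```python
import re

# Made-up sample in the layout printed by `sa-learn --dump magic`.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         37          0  non-token data: nspam
0.000          0        412          0  non-token data: nham
0.000          0      51234          0  non-token data: ntokens
"""

# The third column holds the value; the label follows "non-token data:".
MAGIC_LINE = re.compile(r"^\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)$")


def bayes_counts(dump: str) -> dict:
    """Extract how many spam and ham messages the Bayes DB has learned."""
    counts = {}
    for line in dump.splitlines():
        match = MAGIC_LINE.match(line.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_active(counts: dict, min_spam: int = 200, min_ham: int = 200) -> bool:
    """Mirror SpamAssassin's bayes_min_spam_num / bayes_min_ham_num gate."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


if __name__ == "__main__":
    counts = bayes_counts(SAMPLE_DUMP)
    print(counts)                # {'nspam': 37, 'nham': 412}
    print(bayes_active(counts))  # False: not enough spam learned yet
```

To use real data, feed it the output of `subprocess.run(["sa-learn", "--dump", "magic"], capture_output=True, text=True).stdout`, run wherever the Bayes database actually lives (I haven't checked where that is on Cloudron).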

girish

Ideally, SA should be catching these. But the fact of the matter is that SA gives kinda low visibility into its learning or how it categorizes mail. I think something like rspamd may be better here, but it requires more resources to run and also has more packaging requirements than plain SA.

@imc67 if you look into the raw mail headers, it will tell you how it assigned the scoring. Is there anything obvious there?
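For reading those headers: SpamAssassin's default `X-Spam-Status` header lists the verdict, the total `score`, the `required` threshold and the `tests` (rules) that fired, and a long rule list is usually folded over several lines. A minimal Python sketch that pulls those fields out of a raw message, assuming the header follows SpamAssassin's default format (Haraka's plugin may write it differently); the sample message and the `spam_status` name are made up.

```python
import re
from email import message_from_string

# Made-up message with a folded SpamAssassin-style X-Spam-Status header.
RAW = (
    "From: sender@example.com\n"
    "Subject: Limited offer\n"
    "X-Spam-Status: No, score=3.9 required=5.0 tests=BAYES_50,\n"
    "\tHTML_MESSAGE,RDNS_NONE autolearn=no version=3.4.6\n"
    "\n"
    "Body text\n"
)


def spam_status(raw_message: str) -> dict:
    """Pull verdict, score, threshold and rule names out of X-Spam-Status."""
    header = message_from_string(raw_message).get("X-Spam-Status", "")
    # Unfold continuation lines, then drop the spaces folding leaves after commas.
    flat = re.sub(r",\s+", ",", " ".join(header.split()))
    score = re.search(r"score=(-?[\d.]+)", flat)
    required = re.search(r"required=(-?[\d.]+)", flat)
    tests = re.search(r"tests=(\S+)", flat)
    return {
        "spam": flat.startswith("Yes"),
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
        "tests": tests.group(1).split(",") if tests else [],
    }


if __name__ == "__main__":
    print(spam_status(RAW))
```

A `BAYES_50` hit like the one in this sample means the Bayes filter had no opinion either way, which is a hint that it hasn't learned enough yet.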

MooCloud_Matt

@girish said in Spam filter doesn't always learn:

rspamd is better here but it does require more resources

rspamd requires less training to work and a less complicated configuration setup.
But without ClamAV it's pretty much equal to SpamAssassin with a little training and tweaking.

Anti-spam systems in general are really heavy to run and complicated; if you take the Plesk or cPanel anti-spam out of the box, it is similar to Cloudron's.
The only solution is to implement a more complete stack overall, with blacklists, signatures and DNSBLs working collaboratively, so that every Cloudron can report spam and especially ham.

@imc67 if you want to improve things manually, you can start by adding the Barracuda DNSBL (b.barracudacentral.org) (this guide should work; back up your config file before editing it).
Cloudron's issue is mostly resources: an efficient anti-spam takes too much RAM and CPU, so we need to find an alternative solution, and my company is trying to help Cloudron improve on this.
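For context on what adding a DNSBL does: the mail server reverses the octets of the connecting IPv4 address, appends the list's zone, and does a DNS A lookup; an answer (normally in 127.0.0.x) means "listed", no answer means "not listed". A minimal Python sketch of that lookup (`dnsbl_query_name` and `is_listed` are hypothetical names). As far as I know, Barracuda only answers queries from registered resolvers, and some DNSBLs refuse queries from big public resolvers, so an empty answer is not proof that an IP is clean.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = "b.barracudacentral.org") -> bool:
    """A 127.0.0.x answer means 'listed'; no answer is treated as 'not listed'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN, but also resolver errors, land here in this simple sketch.
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.10", "b.barracudacentral.org"))
    # -> 10.2.0.192.b.barracudacentral.org
```

The lookup is cheap (one DNS query per connection), which is why DNSBLs are a good first step on a resource-constrained server.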

d19dotca

Out of curiosity, is Cloudron configuring SpamAssassin for BAYES training with sa-learn commands? I tried searching the Cloudron Git but couldn't find any references to it. Perhaps this may be helpful for improving spam filtering with regards to BAYES and such?

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/BayesInSpamAssassin

MooCloud_Matt

@d19dotca
From my understanding it does, but a single server doesn't have enough data to make a significant difference.

d19dotca

@MooCloud_Matt I see the BAYES rules applied, but I often have to move spam from the inbox to spam or vice versa, and it doesn't seem to learn very well at all. It makes me wonder whether this can be improved. For example, is the sa-learn command running as a cron job to force learning on the inbox and archive folders (rather than only the inbox) as ham, and then on the junk folder as spam? Anecdotally, it just seems very ineffective, unfortunately.
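A cron-driven retraining job along those lines could look roughly like this minimal Python sketch. It only builds and (optionally) runs `sa-learn --ham <dirs>` and `sa-learn --spam <dirs>`; the Maildir paths, folder names and helper names are hypothetical, and I haven't verified where Cloudron keeps mailboxes or Bayes data. I left the Inbox out of the ham list on purpose: learning it as ham would also teach any spam still sitting there as ham.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical Maildir++ layout for one mailbox; Cloudron's real paths may differ.
MAILDIR_ROOT = Path("/path/to/maildir/user@example.org")
HAM_FOLDERS = [".Archive/cur"]  # only folders you have actually reviewed
SPAM_FOLDERS = [".Spam/cur"]


def build_commands(root, ham_folders, spam_folders, dbpath=None):
    """Return one sa-learn invocation for ham and one for spam."""
    base = ["sa-learn"]
    if dbpath:
        base.append(f"--dbpath={dbpath}")  # per-user Bayes DB, if you keep one
    commands = []
    for flag, folders in (("--ham", ham_folders), ("--spam", spam_folders)):
        paths = [str(Path(root) / folder) for folder in folders]
        if paths:
            commands.append(base + [flag] + paths)
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_commands(MAILDIR_ROOT, HAM_FOLDERS, SPAM_FOLDERS))
```

Run it by hand first (it only prints by default); once the commands look right, switch to `dry_run=False` and schedule it, e.g. with a crontab line like `30 3 * * * python3 /path/to/learn_folders.py` (path hypothetical). sa-learn skips messages it has already learned, so re-running it nightly over the same folders should be cheap.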

MooCloud_Matt

@d19dotca
We saw improvement once we were feeding about 500 mails per day to our ML filter. Bayes normally needs less data, but if you don't have enough fresh data, it will always be too late.

sponch

Is the spam filter also trained when moving mails in the Apple Mail app, e.g. into the spam folder, or does this only work via MailCube?

d19dotca

I've been noticing an increase in spam messages lately and have been trying to determine what else I can do. I've tightened up the rules a bit so less ends up in the inbox (I'll post the updated rules in the other thread soon), but I also want to force SpamAssassin learning on archive folders as ham and on the junk folder as spam, to get a more accurate BAYES score for individual users.

d19dotca

@sponch said in Spam filter doesn't always learn:

is the spam filter also trained when moving mails on Apple Mail App e.g. into the spam folder or does this only work via MailCube?

I believe so, yes. Any mail moved from the inbox to the spam folder should be learned as spam. I think the issue is that this doesn't happen reliably, but I don't think it has to do with the mail client in particular; it's more on the server side.

nebulon

I also had some pretty annoying spam that just wouldn't get marked as spam, despite my marking the very same mail as spam more than once. In my case tweaking the Bayes scores worked very well, but your mileage may vary (a lot of my spam is in German).

You can set custom SpamAssassin rules (https://docs.cloudron.io/email/#custom-spam-filtering-rules) and raise the Bayes scores there, like this:

score BAYES_999 2.0
score BAYES_99 4.5

These values will probably have to be adjusted to your case, though, and I wouldn't bump up the weights too much, to avoid marking ham as spam too often. I hope we can improve on this in the future, as having to tweak such rules manually is really not a good strategy.
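To see what those two score lines do to the verdict: SpamAssassin adds up the scores of every rule that fires and marks the mail as spam when the total reaches `required_score` (5.0 by default). As far as I know, BAYES_999 covers a probability range (0.999 to 1.0) inside BAYES_99's (0.99 to 1.0), so both fire on the most confident mails. A minimal Python sketch of that arithmetic, considering only the Bayes rules (real mail also hits other rules, which this ignores); `total_score` and `is_spam` are hypothetical names.

```python
# Scores from the custom rules above.
CUSTOM_SCORES = {"BAYES_99": 4.5, "BAYES_999": 2.0}
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(hits, scores):
    """Sum the scores of the rules that fired; unknown rules count as 0 here."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """SpamAssassin flags a message when its total reaches required_score."""
    return total_score(hits, scores) >= required


if __name__ == "__main__":
    # Bayes probability 0.99-0.999: only BAYES_99 fires.
    print(total_score(["BAYES_99"], CUSTOM_SCORES), is_spam(["BAYES_99"], CUSTOM_SCORES))
    # Bayes probability >= 0.999: BAYES_99 and BAYES_999 both fire.
    both = ["BAYES_99", "BAYES_999"]
    print(total_score(both, CUSTOM_SCORES), is_spam(both, CUSTOM_SCORES))
```

With these weights a 99% Bayes hit alone lands at 4.5, just under the default threshold, so it still needs a little help from other rules, while a 99.9% hit reaches 6.5 on Bayes alone. That's why pushing the weights much higher quickly lets Bayes decide on its own.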

Cloudron Forum