Spam filter doesn't always learn
imc67 translator last edited by imc67
For weeks I have noticed that some spam received in my Cloudron mailbox keeps coming into the Inbox, even after I have trained the filter on it (by moving it to Spam in Roundcube).
This same mailbox is connected to FreeScout, and there the spam detection (simply by sender's address) is working.
Is there something I can see/do in Haraka to optimize this?
d19dotca last edited by d19dotca
Unfortunately better spam control isn't here yet, but I sure hope it'll arrive in 6.0 or 6.1.
In the meantime though... the Cloudron docs claim it takes about 50+ messages before SpamAssassin has enough data to "start" to learn. So before it gets "good" I'd hazard a guess that it could take a couple hundred messages (depending on how much spam you get). I get a lot (~10-30/day) and thankfully, a few months after switching to Cloudron, most spam goes to my junk box now, but the first few months were bad. lol. This may have been improved in Cloudron updates since then too.
So I guess my first question to you is... how many messages are currently in your spam folder? If it's not many, this may be why: it'll need more to learn from.
Ideally, SA should be catching these. But the fact of the matter is that SA gives kinda low visibility into its learning or how it categorizes mail. I think maybe something like rspamd is better here, but it does require more resources to run and also has more packaging requirements than simple SA.
@imc67 if you look into the raw mail headers, it will tell you how it assigned the scoring. Is there anything obvious there?
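To make the header check above concrete, here is a minimal Python sketch that pulls the score, the threshold, and the list of matched rules out of an `X-Spam-Status` header. The exact header layout on a Cloudron/Haraka setup is an assumption here: the sketch accepts both the `score=` and the older `hits=` spelling, and the sample message is invented for illustration.

```python
import re
from email import message_from_string


def spam_report(raw_message: str) -> dict:
    """Extract the verdict, score, threshold and rule names from X-Spam-Status."""
    msg = message_from_string(raw_message)
    # Collapse folded header lines (newline + tab) into single spaces.
    status = " ".join(msg.get("X-Spam-Status", "").split())

    score = re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", status)
    required = re.search(r"required=(-?\d+(?:\.\d+)?)", status)
    # The rule list runs until the next "key=" field or the end of the header.
    tests = re.search(r"tests=(.*?)(?:\s+\w+=|$)", status)

    return {
        "is_spam": status.lower().startswith("yes"),
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
        "tests": [t.strip() for t in tests.group(1).split(",") if t.strip()]
        if tests
        else [],
    }


# Invented sample message; real headers may carry different rules and fields.
SAMPLE = (
    "From: a@example.com\n"
    "X-Spam-Status: No, score=2.1 required=5.0 tests=BAYES_50,HTML_MESSAGE,\n"
    "\tRDNS_NONE autolearn=no\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(spam_report(SAMPLE))
```

A `BAYES_*` rule in the list shows what the Bayes classifier thought of the message: a spam that still shows `BAYES_50` or lower has not been learned as spam yet.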
@girish said in Spam filter doesn't always learn:
rspamd is better here but it does require more resources
rspamd requires less training to work, and a less complicated configuration setup.
But without ClamAV it is pretty much equal to SpamAssassin with a little bit of training and tweaking.
Antispam systems in general are really heavy to run and complicated; the out-of-the-box antispam in Plesk or cPanel is similar to the one on Cloudron.
The only solution is to implement a more complete stack in general, with blacklists, signatures, and DNSBLs, in a collaborative way, so that every Cloudron can report spam and especially ham.
@imc67 if you want to improve things manually, you can add the Barracuda DNSBL (b.barracudacentral.org) to start. (this guide should work; back up your config file before editing it).
The issue for Cloudron is mostly resources, because an efficient antispam takes too much RAM and CPU, so we need to find an alternative solution, and my company is trying to help Cloudron improve on this.
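For readers unfamiliar with how a DNSBL such as the one mentioned above is queried, here is a minimal Python sketch of the lookup itself: the IPv4 octets are reversed, the list's zone is appended, and an A-record answer in 127.0.0.0/8 means the address is listed. This only illustrates the mechanism and is not how Haraka is configured; the function names are made up for the example, and treating every lookup failure as "not listed" is a simplifying assumption.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "b.barracudacentral.org") -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "b.barracudacentral.org") -> bool:
    """Return True if the DNSBL answers with a 127.x.x.x address for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such record (or the lookup failed): treated as "not listed" here.
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.1"))
```

Calling `is_listed(...)` performs a real DNS query, so results depend on the resolver in use and on the list operator's access policy.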
d19dotca last edited by
Out of curiosity, is Cloudron configuring SpamAssassin for BAYES training with sa-learn commands? I tried searching the Cloudron Git but couldn't find any references to it. Perhaps this may be helpful for improving spam filtering with regards to BAYES and such?
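One way to check whether any Bayes training is happening is to look at the counters that `sa-learn --dump magic` prints (`nspam`, `nham`, `ntokens`). Below is a minimal Python sketch that parses that output. The sample text and its numbers are invented, and where (or as which user) the command has to be run on a Cloudron mail setup is not something this thread states.

```python
def parse_dump_magic(output: str) -> dict:
    """Map each 'non-token data' label to its integer value (third column)."""
    counts = {}
    for line in output.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) >= 3 and columns[2].isdigit():
            counts[label.strip()] = int(columns[2])
    return counts


# Invented sample in the layout printed by `sa-learn --dump magic`.
SAMPLE_OUTPUT = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         42          0  non-token data: nspam
0.000          0        310          0  non-token data: nham
0.000          0      51234          0  non-token data: ntokens
"""

if __name__ == "__main__":
    counts = parse_dump_magic(SAMPLE_OUTPUT)
    print(f"spam learned: {counts['nspam']}, ham learned: {counts['nham']}")
```

On a real system the text could come from `subprocess.run(["sa-learn", "--dump", "magic"], capture_output=True, text=True).stdout`. Stock SpamAssassin only starts applying Bayes scores once both counters reach `bayes_min_spam_num` and `bayes_min_ham_num` (200 each by default); whether Cloudron changes those values is not stated here.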
From my understanding it does, but there's not enough data on a single server to actually make a difference.
d19dotca last edited by
@MooCloud_Matt I see the BAYES rules applied, but I find that I often have to move spam from inbox to spam or vice-versa, and it doesn’t seem to learn very well at all. Makes me question if this can be improved at all. For example, is the sa-learn command running as a cron job to force learning, treating the inbox and archive folders (rather than the inbox alone) as ham and the junk folder as spam? Just seems very ineffective, anecdotally, unfortunately.
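The kind of periodic training asked about above could look roughly like the following Python sketch, which builds one `sa-learn --ham` call per ham folder and one `sa-learn --spam` call per spam folder. It is not what Cloudron actually does: the Maildir path and the folder names are hypothetical, and the script defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

# Hypothetical Maildir layout; adjust to wherever the mailbox really lives.
HAM_FOLDERS = ["cur", ".Archive/cur"]
SPAM_FOLDERS = [".Spam/cur"]


def build_sa_learn_commands(maildir, ham_folders=HAM_FOLDERS, spam_folders=SPAM_FOLDERS):
    """Return the sa-learn command lines for the given ham and spam folders."""
    root = Path(maildir)
    commands = [["sa-learn", "--ham", str(root / folder)] for folder in ham_folders]
    commands += [["sa-learn", "--spam", str(root / folder)] for folder in spam_folders]
    return commands


def train(maildir, dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    for command in build_sa_learn_commands(maildir):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    train("/path/to/Maildir")  # placeholder path, dry run only
```

Run from cron (for example once a night), this would relearn the archive as ham and the junk folder as spam. sa-learn skips messages it has already learned, so repeated runs over the same folders do not count a message twice.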
We only saw improvement after feeding about 500 mails per day to our ML filter. BAYES normally needs less data, but if you don't have enough fresh data it will always be too late.