BAYES_00 rule set for some mail to mailing list (no mailbox)?

d19dotca

Hello! I have a quick question as I continue my quest to improve spam filtering, lol.

Why would BAYES_00 rule be triggered at all for email that is only going to a mailing list where no actual mailbox exists?

Correct me if I'm wrong, but I thought the BAYES_## rules came only from SpamAssassin's Bayesian learning system, which needs a mailbox to learn from. I also thought the Bayes data was per-user rather than system-wide. Is it really system-wide? That's the only thing I can think of that would explain this.

I came across this while viewing some of the message logs. You can see BAYES_00 (a Bayes spam probability of 0 to 1%, i.e. the classifier considers the message almost certainly ham) even though it's clearly spam (and thankfully it was still marked as spam in the end):

{
  "ts": 1618563601092,
  "type": "queued",
  "direction": "inbound",
  "uuid": "0FB4AFE9-AF91-4C27-B5DA-8AC7F5861DB4.1",
  "remote": {
    "ip": "204.12.197.146",
    "port": 60548,
    "host": "dns1.nsconsultingservice.live",
    "info": "dns1.nsconsultingservice.live",
    "closed": false,
    "is_private": false,
    "is_local": false
  },
  "authUser": null,
  "mailFrom": "<SRS0=5a24=JN=gmail.com=melissasinopoli121@{customerDomain}>",
  "rcptTo": [
    "<{mailinglistEmailAddress}>"
  ],
  "details": {
    "spamStatus": "Yes, score=17.0 required=5.0 tests=BAYES_00,  \tDKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_ENVFROM_END_DIGIT,  \tFREEMAIL_FORGED_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,MIME_HTML_ONLY,  \tNML_ADSP_CUSTOM_MED,RCVD_IN_GBUDB,RCVD_IN_SORBS_WEB,RCVD_IN_SPAMRATS,  \tSPF_HELO_NONE,SPF_SOFTFAIL autolearn=no autolearn_force=no  \tversion=3.4.4",
    "message": "Message Queued (0FB4AFE9-AF91-4C27-B5DA-8AC7F5861DB4.1) (0FB4AFE9-AF91-4C27-B5DA-8AC7F5861DB4.1)"
  }
}

As you can see, the sending server is listed on several DNS blocklists (GBUDB, SpamRATS, SORBS, etc.), and the message has plenty of other spam markers, so thankfully it was still identified as spam. But why is BAYES_00 triggered for a message that isn't going to a Cloudron-hosted mailbox from which user preferences could be learned?
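To check which messages hit a BAYES_* rule without eyeballing each log entry, the `spamStatus` string can be split into its parts. This is a minimal sketch, assuming the log entries have already been loaded as dicts with the `details.spamStatus` layout shown above (for example via `json.loads`); `parse_spam_status` and `bayes_rules` are hypothetical helper names, not part of Cloudron or SpamAssassin:

```python
import re

def parse_spam_status(status):
    """Parse a SpamAssassin X-Spam-Status-style value into verdict, score and rule names."""
    # Collapse header folding (tabs/extra spaces), then drop spaces that follow commas
    # so the comma-separated test list becomes one token.
    flat = re.sub(r",\s+", ",", " ".join(status.split()))
    verdict = flat.split(",", 1)[0]  # "Yes" or "No"
    score = re.search(r"score=(-?[\d.]+)", flat)
    tests = re.search(r"tests=(\S+)", flat)
    return {
        "is_spam": verdict == "Yes",
        "score": float(score.group(1)) if score else None,
        "tests": tests.group(1).split(",") if tests else [],
    }

def bayes_rules(log_entry):
    """Return only the BAYES_* rule names from one mail-log entry (layout as in the post)."""
    status = log_entry.get("details", {}).get("spamStatus", "")
    return [t for t in parse_spam_status(status)["tests"] if t.startswith("BAYES_")]
```

For the log entry above, `bayes_rules(entry)` would return `["BAYES_00"]`, so counting such hits per recipient address would show whether the pattern is limited to one mailing list.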

I feel like I'm going crazy, lol. I guess the only explanation is it's system-wide and not user-dependent?

d19dotca

I could still use a hand with this one; it's still happening. Every single message to this address (a mailing list) seems to hit BAYES_00, yet other mailing lists don't, as far as I can tell. It seems wrong that it's behaving this way.

d19dotca

I wonder if it's related to the defect I filed a long time ago (which hasn't been fixed yet): https://forum.cloudron.io/topic/3312/what-is-the-intention-of-home-yellowtent-boxdata-mail-spamd/24

d19dotca

Okay, I finally resolved this (I think). I guess time will tell for sure.

Yes, it appears it had to do with the defect I linked to, which was reported a very long time ago: https://forum.cloudron.io/topic/3312/what-is-the-intention-of-home-yellowtent-boxdata-mail-spamd/24

Last night I deleted the entries in the /spamd/ directory that should never have existed in the first place (the defect keeps creating them), and sure enough, BAYES_00 has disappeared from all emails since the change.
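Before deleting anything, it may help to list which entries under the spamd directory don't correspond to a real mailbox. This is a read-only sketch, assuming one subdirectory per address under `/home/yellowtent/boxdata/mail/spamd` (the path mentioned later in this thread) and that you supply the list of real mailboxes yourself; the directory naming scheme and the helper `stray_bayes_dirs` are assumptions, not documented Cloudron behavior:

```python
from pathlib import Path

# Path taken from this thread; the one-directory-per-address layout is an assumption.
SPAMD_ROOT = "/home/yellowtent/boxdata/mail/spamd"

def stray_bayes_dirs(spamd_root, mailboxes):
    """Return names of subdirectories of spamd_root that are not in `mailboxes`.

    Read-only: nothing is deleted, so the result can be reviewed first.
    """
    root = Path(spamd_root)
    if not root.is_dir():
        return []
    known = {m.lower() for m in mailboxes}  # compare case-insensitively
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name.lower() not in known
    )
```

For example, `stray_bayes_dirs(SPAMD_ROOT, ["user@example.com"])` (with a hypothetical mailbox list) would print candidates such as mailing-list addresses, which can then be checked by hand before removal.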

To be fair, deleting an entry in the /spamd/ directory would make BAYES_00 disappear whether or not that entry was legitimate. But in this particular case the entry was created for the mailing list, which should never have happened and only did because of the other defect I linked to. So it's possible there are two bugs (or perhaps both stem from a single bug):

First bug: Cloudron creates entries for mailing list addresses in the /spamd/ directory. Only proper mailboxes should be listed there, not mailing lists (and especially not the mailing list's recipients), since mailing lists don't store mail and so there is no mail for Bayes to "learn" from for that address.

Second bug: Even for a mailing list, where messages are never stored for processing or learning, SpamAssassin seems to eventually tag everything as BAYES_00 anyway. I presume this is because it never sees anything marked as spam in the folder, so it assumes everything delivered was safe? I'm not certain about that last part, though.
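One detail that may bear on the second point: in stock SpamAssassin the Bayes rules don't fire at all until the database has learned a minimum number of ham and spam messages (`bayes_min_ham_num` and `bayes_min_spam_num`, both 200 by default). If Cloudron keeps those defaults (an assumption), any BAYES_* hit, including BAYES_00, suggests that database had already learned spam as well as ham. A small sketch of that gate; `bayes_gate` is a hypothetical helper, and the counts would have to come from the database itself, e.g. the `nham`/`nspam` lines of `sa-learn --dump magic`:

```python
def bayes_gate(nham, nspam, min_ham=200, min_spam=200):
    """Return (active, reasons) for SpamAssassin's Bayes activation gate.

    Defaults mirror the stock bayes_min_ham_num / bayes_min_spam_num settings
    (200 each); a deployment may override them.
    """
    reasons = []
    if nham < min_ham:
        reasons.append(f"only {nham} ham learned (< {min_ham})")
    if nspam < min_spam:
        reasons.append(f"only {nspam} spam learned (< {min_spam})")
    # Bayes rules are evaluated only when both thresholds are met.
    return (not reasons, reasons)
```

With 500 ham and 0 spam learned, `bayes_gate(500, 0)` reports the gate as closed, meaning no BAYES_* rule (BAYES_00 included) would be expected to appear.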

For context again: I have a user who wants me to host a forwarding address that simply passes mail along without storing it. So I created a mailing list for the user, with their @me.com iCloud address as the member, since that's the client's preferred way to send and receive mail. Because there's no actual mailbox (i.e. nothing to store data in), Cloudron should never have created an entry in the /spamd/ directory, especially not for the mailing list's actual recipient address.

@girish, I don't like to tag you when I can avoid it, but this post has been up for quite a while without any feedback, and I believe I've now narrowed this down to the defect described in the other post I linked to back in October 2020, which doesn't appear to have been fixed yet. Any insight into a possible fix?

girish

@d19dotca No problem tagging me. I will take a look at this along with other mail-related stuff before the next release (i.e. so that any required fix is in the coming release).

d19dotca

@girish Slightly related to the other thread about spam filtering on mailing lists: I looked into one of the "spam" messages that seems to be a false positive, and I see the following results:

X-Spam-Report: 
	*  1.5 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
	*      [score: 1.0000]
	*  5.0 BAYES_99 BODY: Bayes spam probability is 99 to 100%
	*      [score: 1.0000]
	*  0.5 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
	*  0.5 RCVD_IN_DNSWL_NONE RBL: Sender listed at
	*      https://www.dnswl.org/, no trust
	*      [209.85.216.41 listed in list.dnswl.org]
	* -0.5 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
	*      [209.85.216.41 listed in wl.mailspike.net]

So this is somewhat the opposite: it's not BAYES_00 but BAYES_99 and BAYES_999. Yet this mailbox was JUST created (less than 24 hours ago) to work around the other issue of spam filtering on mailing lists for this user, so why would SpamAssassin already mark it BAYES_99/BAYES_999 when there's no other reason for it? It isn't as if the message had been moved from the inbox to the Spam folder, for example. If it weren't for the BAYES_99* lines, it would have passed on to the recipient properly. Curious what you think of this.
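The BAYES_* rules correspond to probability bands, and the top two overlap, which is why BAYES_99 and BAYES_999 show up together in the report above: a Bayes score of 1.0000 falls in both bands, and together they contribute 6.5 of the points. This sketch lists the bands as understood from the stock SpamAssassin 3.4 `23_bayes.cf` rules; the exact boundary handling is an assumption:

```python
# Probability bands of the stock SpamAssassin 3.4 BAYES_* rules (per 23_bayes.cf).
BAYES_BANDS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
    ("BAYES_999", 0.999, 1.00),
]

def bayes_rules_for(prob):
    """Return every BAYES_* rule a given Bayes spam probability would hit.

    Boundary handling (lower bound exclusive except at 0, upper bound inclusive)
    is an assumption; only BAYES_999 overlaps another band.
    """
    return [name for name, lo, hi in BAYES_BANDS
            if (lo == 0 or prob > lo) and prob <= hi]
```

So `bayes_rules_for(1.0)` gives `["BAYES_99", "BAYES_999"]`, matching the pair in the report, while a score such as 0.995 would hit only BAYES_99.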

girish

Not 100% sure, but are there any files under /home/yellowtent/boxdata/mail/spamd/<mailboxname>? Those are the per-mailbox SpamAssassin Bayes databases. Maybe it's picking up something from there?
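To answer that question, one could list what is actually stored in a given mailbox's spamd directory. This sketch assumes the per-mailbox layout described above; `describe_bayes_dir` is a hypothetical helper, and the file names it reports (for example `bayes_toks` or `bayes_seen` for SpamAssassin's DBM store) depend on how the Bayes store is configured:

```python
from datetime import datetime
from pathlib import Path

def describe_bayes_dir(spamd_root, mailbox):
    """List (file name, size in bytes, last-modified time) for <spamd_root>/<mailbox>.

    st_mtime is the last modification time, not the creation time, so a
    recent timestamp does not prove the file is new.
    """
    d = Path(spamd_root) / mailbox
    if not d.is_dir():
        return []  # no Bayes data stored for this mailbox
    rows = []
    for f in sorted(d.iterdir()):
        if f.is_file():
            st = f.stat()
            rows.append((f.name, st.st_size,
                         datetime.fromtimestamp(st.st_mtime).isoformat()))
    return rows
```

Calling `describe_bayes_dir("/home/yellowtent/boxdata/mail/spamd", "<mailboxname>")` for a mailbox created less than a day ago and finding a sizeable token database could hint that the learned data did not come from that mailbox alone.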

Cloudron Forum
