Wednesday, July 6, 2016

Making a Mailserver (Part 5) - Wonderful Spam

This is an instalment in my series on setting up a Linux-based mailserver. See these posts:

In this post we set up Spamassassin with exim4 to stomp on as much spam as possible. We won't give anything that looks like spam a chance to be delivered; we'll dump such messages before they even complete the delivery process.


And yet it spams

Why Spamassassin?

There are alternatives: a good friend of mine recently recommended dspam to me, so that's on my list to investigate. Spamassassin doesn't catch everything, mostly because some spammers are pretty damn good at their job. It does not do a great job of spotting messages that deliver malware, or the tediously regular emails from bearing manufacturers.

If there were such a thing as a perfect solution, it wouldn't consist of just one technology. Nothing's perfect. I chose Spamassassin because it is mature, well understood, backed by Apache and easy to set up.

This is Kinda Interesting

For a good number of years I didn't receive much spam. I never intentionally published my email address on the web, and many websites take care to obscure email addresses, for example when publishing mailing list archives.

I was intrigued by Steve Gibson's assertion in Security Now #557 that it takes multiple years before your email address really gets on spammers' radar. He changes his email address once a year:
And something as simple as changing your email address loses spam. That is, it's just gone. And you might think that, oh, it's going to find you again within a week or two. No. It takes, I can attest to this, years, multiple years.
I'd naively believed that spammers found your domain name and then worked through a list of common names to mail to (michael@, john@, chris@, etc.). Perhaps they do that too, but it seems that scraping websites and database dumps is the more common, and less time-wasting, way of building a list of recipients.

In 2015 the volume of spam quickly started to get out of control for me. Malware in particular was flooding in, and even though I am largely Linux- and Android-focussed, constantly deleting spam turned checking my email into a tedious, rather than fun, task.

Plus one for Thunderbird, though. I took the time to train Thunderbird's junk mail handling and it is really good. However, since I check email on my phone most of the time, Thunderbird's junk handling wasn't going to help unless I always had it running in the background on some workstation, somewhere.

The spam was pouring in, and since I was redirecting some mailboxes to Gmail accounts (aliasing via /etc/aliases), I ended up with a large queue of frozen messages because Gmail rejected the redirected spam. If you want a spam-free mailbox, there's arguably nothing better than Gmail. I was worried about the damage I might be doing to my mail server's reputation with Gmail by reflecting spam straight to it.

Enter Spamassassin.

Exim and Spamassassin Integration

I recommend you first review the Debian Exim wiki, although I found it to be wrong in places when I referred to it. Some Exim documentation about ACL actions also informed my config. I handle mail for multiple domains, but the following example will work for one mail domain or many.

I use exim4 split config files because ... that's what everyone else does.

In brief, the Debian Exim wiki tells you to do the following:
# apt-get install spamassassin
If you are using Debian Jessie or later (with systemd enabled by default), enable and start the service using systemctl:
# systemctl enable spamassassin.service
# systemctl start spamassassin.service
On earlier Debian releases, edit /etc/default/spamassassin ...
ENABLED=1
...and then start the daemon.
# /etc/init.d/spamassassin start
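Before wiring spamd into Exim, it can be worth confirming that the daemon is actually answering. The Python sketch below is not part of the wiki's instructions. It assumes spamd is on its default address, 127.0.0.1 port 783 (which is also Exim's default spamd_address), and that spamd answers a PING request with a line like "SPAMD/1.5 0 PONG"; that reply format is my understanding of the spamd protocol, so check it against the spamd documentation. The names ping_spamd and is_pong are made up for this example.

```python
import socket


def is_pong(response_line: str) -> bool:
    """True if a spamd reply line is a successful PING answer, e.g. "SPAMD/1.5 0 PONG"."""
    parts = response_line.split()
    return (
        len(parts) == 3
        and parts[0].startswith("SPAMD/")
        and parts[1] == "0"  # assumed: 0 is spamd's "OK" status code
        and parts[2] == "PONG"
    )


def ping_spamd(host: str = "127.0.0.1", port: int = 783, timeout: float = 2.0) -> bool:
    """Connect to spamd, send a PING and report whether it answered PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The trailing blank line ends the (empty) request headers,
            # in case spamd waits for them before replying.
            sock.sendall(b"PING SPAMC/1.5\r\n\r\n")
            reply = sock.recv(1024).decode("ascii", errors="replace")
    except OSError:
        # Refused, timed out or unreachable: nothing is answering on host:port.
        return False
    lines = reply.splitlines()
    return bool(lines) and is_pong(lines[0])
```

Run ping_spamd() on the mail server itself. False means nothing answered PONG on that address, so check the service status before going further.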
At this point I found divergences between what the documentation tells you to do and what works in reality: following the wiki instructions, the "add_header" setup did not work for me. Here's how I set it up instead:

/etc/exim4/conf.d/acl/40_exim4-config_check_data
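# warn
#   spam = Debian-exim:true
#   message = X-Spam_score: $spam_score\n\
#             X-Spam_score_int: $spam_score_int\n\
#             X-Spam_bar: $spam_bar\n\
#             X-Spam_report: $spam_report
#
# Needs exim4-daemon-heavy (the light daemon lacks content scanning).
# These statements must come before the final "accept" in acl_check_data.
#
# Put a score header in all messages (no matter if spam or not).
# The ":true" suffix makes the spam condition always succeed, so the
# $spam_* variables are set for every message.
  warn  spam = Debian-exim:true
      add_header = X-Spam-Score: $spam_score ($spam_bar)
# Drop ("rejected after DATA") any message whose score matches or
# exceeds required_score in /etc/spamassassin/local.cf.
  drop  spam = Debian-exim
# Disabled: the Debian wiki's second Subject: line with a *SPAM* marker.
# It is pointless here, because a dropped message is never delivered.
#      add_header = Subject: ***SPAM (score:$spam_score)*** $h_Subject: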
Important Points:
  • The Debian docs' "Subject" manipulation simply did not work for me. Refer to the "Rewriting Subject" section further below.
  • The Debian docs used "nobody" as the user; I changed this to Debian-exim. Using "nobody" gets you all kinds of painful log messages.
  • The Debian docs used "add_header = X-Spam-Report: $spam_report" on all messages. This put text in the header of every email saying that it had been detected as spam, regardless of the score.
  • I do still insert the X-Spam-Score header in every message.
  • I'm dropping anything that reaches the Spamassassin threshold (required_score). The incoming message will be "rejected after DATA".
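To make the two ACL statements concrete, here is a small Python model of what they do with a score. It is an approximation, not Exim's code. $spam_score is printed to one decimal place and $spam_bar holds one "+" per whole point, capped at 50 characters. The "-" and "/" cases for negative and near-zero scores are my assumption about Exim's behaviour. The drop rule follows the Exim documentation, which says the spam condition is true when the score matches or exceeds the threshold. spam_bar, spam_score_header and would_drop are hypothetical helper names.

```python
def spam_bar(score: float, limit: int = 50) -> str:
    """Approximate Exim's $spam_bar: one '+' per whole point of the score.

    Assumption: negative scores give '-' characters and a score whose whole
    part is zero gives '/'. The bar is capped at `limit` characters.
    """
    whole = int(score)  # int() truncates towards zero: 18.3 -> 18, -11.9 -> -11
    if whole == 0:
        return "/"
    char = "+" if whole > 0 else "-"
    return char * min(abs(whole), limit)


def spam_score_header(score: float) -> str:
    """The header the 'warn' statement adds: "$spam_score ($spam_bar)"."""
    # $spam_score is formatted with one decimal place.
    return f"X-Spam-Score: {score:.1f} ({spam_bar(score)})"


def would_drop(score: float, required_score: float) -> bool:
    """The 'drop' statement fires when the score matches or exceeds the threshold."""
    return score >= required_score
```

For example, spam_score_header(7.5) produces "X-Spam-Score: 7.5 (+++++++)", the same form as the headers in the reject log further below, and would_drop(5.0, 5.0) is True: a message scoring exactly required_score is rejected.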
You will want to tinker with the threshold for dropped messages. 8 is too high, but it's better to start high and then inspect the scores on the spam that makes it through. The bulk of spam appears to get very high scores, but between 4 and 5 there is a crossover between legitimate email and spam.

You will also receive spam that scores as low as 1, and it's impossible to filter at that level without losing a lot of legitimate email.

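The required score is set in /etc/spamassassin/local.cf. The default at the time of writing is 5, which I think is about right, but as suggested above it's better to start high. Restart the spamassassin service after editing local.cf so that spamd picks up the change: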
required_score 8.0
It's important to remember that the sending mail server gets a hard fail (a permanent 5xx rejection) when a message reaches the required_score, so it won't come back for a second try. A slighted mailing list server, for example, may mark your address as a hard fail and remove your subscription.

The rewrite_header setting in the Spamassassin config has no effect here, because Exim is handling the mail and just asking Spamassassin for its opinion on the spam score. Other settings in that file are still relevant to how the message is scored.

That's it! That's all you need to do.

Rewriting Subject

I didn't implement this, because I elected to drop the high-scoring messages and add the X-Spam-Score header to every message instead.

Reviewing the Efficacy

You really must spend days or weeks checking your Exim logs, in addition to reviewing the spam messages that slip through.
  • When looking at the spam that hits your inbox, check the X-Spam-Score header that Exim added from Spamassassin's score. View the message source to see the headers.
  • Don't be confused by fake headers added by the spammer, such as fake spam score information.
  • There is often false information in the headers about the message having been checked by this or that antivirus software.
  • Received: headers should be read from bottom to top. Each mail server prepends its own Received: header to the top of the message as the message passes from mail server to mail server.
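If reading Received: headers upside down gets confusing, Python's standard email module can list them in hop order for you. This is a minimal sketch: it assumes you have saved the raw message source (from your mail client's view-source option) as a string, and received_hops is a made-up name. It only reorders what the headers claim; the earlier hops can be forged by the sender, so only the hops added by servers you control are trustworthy.

```python
from __future__ import annotations

from email import message_from_string


def received_hops(raw_message: str) -> list[str]:
    """Return the Received: headers oldest-first, i.e. read from bottom to top.

    Each server prepends its Received: header, so the last one in the header
    block is the earliest hop and the first one was added by our own server.
    """
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Unfold continuation lines so each hop prints on a single line.
    unfolded = [" ".join(str(hop).split()) for hop in hops]
    return unfolded[::-1]
```

The last element of the returned list is the hop recorded by your own server, the one entry in the chain you can fully trust.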
Review the Exim reject log. Here's what you should see when things are working:
# tail -vf /var/log/exim4/rejectlog 
2016-07-06 18:41:41 1bKptM-0006PS-7x H=208-180-142-165.chstcmtk01.com.sta.suddenlink.net [208.180.142.165] F=<xxx@swisslens.com> rejected after DATA
Envelope-from: <xxx@swisslens.com>
Envelope-to: <xxx@moff.tech>
P Received: from 208-180-142-165.chstcmtk01.com.sta.suddenlink.net ([208.180.142.165])
        by moff.tech with smtp (Exim 4.84_2)
        (envelope-from <xxx@swisslens.com>)
        id 1bKptM-0006PS-7x
        for xxx@moff.tech; Wed, 06 Jul 2016 18:41:40 +0200
  Date: Wed, 06 Jul 2016 14:35:35 -0300
F From: "CamilleHot" <xxx@mndistaog.org>
R Reply-To: "CamilleHot" <xxx@mndistaog.org>
  X-Priority: 3 (Normal)
I Message-ID: <75461.14671320@mndistaog.org>
T To: xxx@moff.tech
  Subject: Come here! I want to make love to you
  MIME-Version: 1.0
  Content-Type: multipart/alternative;
        boundary="605311998876970"
  X-Spam-Score: 18.3 (++++++++++++++++++)

Notice the X-Spam-Score of 18.3: high, but not off the charts. Let's review the kind of scores we've seen recently:
# grep X-Spam-Score /var/log/exim4/rejectlog
  X-Spam-Score: 7.5 (+++++++)
  X-Spam-Score: 7.2 (+++++++)
  X-Spam-Score: 11.8 (+++++++++++)
  X-Spam-Score: 8.2 (++++++++)
  X-Spam-Score: 20.0 (++++++++++++++++++++)
  X-Spam-Score: 14.2 (++++++++++++++)
  X-Spam-Score: 18.4 (++++++++++++++++++)
  X-Spam-Score: 15.4 (+++++++++++++++)
  X-Spam-Score: 5.1 (+++++)
  X-Spam-Score: 16.8 (++++++++++++++++)
  X-Spam-Score: 13.6 (+++++++++++++)
  X-Spam-Score: 6.5 (++++++)
  X-Spam-Score: 14.2 (++++++++++++++)
  X-Spam-Score: 9.5 (+++++++++)
  X-Spam-Score: 18.5 (++++++++++++++++++)
  X-Spam-Score: 18.3 (++++++++++++++++++)
  X-Spam-Score: 6.1 (++++++)
While you're still finding the right score level, I recommend going back and looking at the headers of the 5.1 message: open the reject log with "less" and search (with /) for the score string.

In fact, I discovered as I wrote this that the 5.1 in the example above was a legitimate email about a delivery I was waiting on. Oh dear, perhaps I'll nudge the required_score up to just above 5.1. In my experience, 5.5 is too high.

It's easy to review the scores of the messages in your mailboxes. Scores can even be negative; the lowest I've noticed is -11.89.

Your current inbox:
# grep X-Spam-Score /var/mail/you
Other folders:
# grep X-Spam-Score /home/you/mail/Trash
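grep is fine for a quick look, but to see where your delivered mail sits relative to the threshold, a histogram helps. This Python sketch uses the standard mailbox module and assumes your mailboxes are in mbox format, as Debian's /var/mail spools normally are; a Maildir would need mailbox.Maildir instead. As with the grep, a spammer's fake X-Spam-Score could be present, so it takes the last copy of the header, assuming Exim's is appended after any supplied by the sender. mailbox_scores, score_histogram and borderline are hypothetical names, and the default 4 to 5 band simply mirrors the crossover described earlier.

```python
from __future__ import annotations

import mailbox
import math
from collections import Counter


def mailbox_scores(path: str) -> list[float]:
    """Read the X-Spam-Score header of every message in an mbox file."""
    scores: list[float] = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            values = msg.get_all("X-Spam-Score") or []
            if not values:
                continue
            # Use the last copy, assumed to be the one our Exim appended after
            # any (possibly fake) X-Spam-Score the sender supplied.
            tokens = str(values[-1]).split()
            try:
                scores.append(float(tokens[0]))
            except (IndexError, ValueError):
                continue  # empty or non-numeric header; skip it
    finally:
        box.close()
    return scores


def score_histogram(scores: list[float]) -> Counter:
    """Count messages per whole-point bucket: 4.7 lands in bucket 4, -0.3 in -1."""
    return Counter(math.floor(score) for score in scores)


def borderline(scores: list[float], low: float = 4.0, high: float = 5.0) -> list[float]:
    """Scores in [low, high): by default the band where ham and spam overlap."""
    return [score for score in scores if low <= score < high]
```

For example, for bucket, count in sorted(score_histogram(mailbox_scores("/var/mail/you")).items()): print(bucket, "#" * count) prints a rough text histogram of your inbox.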

Keep looking at the Exim logs. Be curious. Learn what the headers mean. Tinker. It's fascinating stuff.

References

http://michaelfranzl.com/2015/02/11/exim-spamassassin-rewriting-subject-lines-adding-spam-score
https://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html
http://www.exim.org/exim-html-current/doc/html/spec_html/ch-access_control_lists.html
http://www.exim.org/exim-html-current/doc/html/spec_html/ch-content_scanning_at_acl_time.html
https://www.maretmanu.org/homepage/inform/exim-spam.php
