Spam filtering

From OCFwiki


As you probably know, spam is a serious problem on the Internet. A popular way to combat spam is with a spam filter, which is typically a program that examines your email for characteristics that would indicate it is spam.

SpamAssassin

A mail filter called SpamAssassin is installed on our mail server. Whenever email arrives for you, it is automatically sent to SpamAssassin, which will guess if it is spam or not, based on the contents of the email. It will then "tag" the email with some additional mail headers you can use to decide how to respond. For example, if it looks like the message is not spam, you may see a header like the following:

     X-Spam-Status: No, hits=-2.6 required=5.0 tests=BAYES_00,
             HTML_FONTCOLOR_UNSAFE,HTML_FONT_INVISIBLE,HTML_MESSAGE,
             NORMAL_HTTP_TO_IP,WEIRD_PORT autolearn=no version=2.63

This means that the message received a score of -2.6, and that a score of 5.0 is required for the message to be considered spam. Thus, this message is not marked as spam. Below is an example of a message that would be considered spam:

     X-Spam-Status: Yes, hits=29.1 required=5.0 tests=BAYES_99,CLICK_BELOW_CAPS,
             DATE_IN_FUTURE_03_06,DATE_SPAMWARE_Y2K,FORGED_MUA_EUDORA,
             FORGED_YAHOO_RCVD,HTML_50_60,HTML_FONTCOLOR_RED,HTML_FONT_BIG,
             ...
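To make the header format concrete, here is a minimal Python sketch that splits an X-Spam-Status value into its verdict, score, threshold, and list of matched tests. The function name `parse_spam_status` is made up for illustration and is not part of SpamAssassin. The pattern also accepts `score=` in place of `hits=`, on the assumption that other SpamAssassin versions may label the field that way.

```python
import re

def parse_spam_status(header_value):
    """Split an X-Spam-Status header value into its parts (illustrative helper)."""
    # The header may be folded over several lines, so collapse whitespace
    # and drop the spaces that folding leaves behind after commas.
    flat = " ".join(header_value.split())
    flat = re.sub(r",\s+", ",", flat)

    match = re.match(
        r"(Yes|No),(?:hits|score)=(-?\d+(?:\.\d+)?) required=(-?\d+(?:\.\d+)?)",
        flat,
    )
    if match is None:
        raise ValueError("unrecognized X-Spam-Status value")

    # The tests field is a comma-separated list of rule names.
    tests_match = re.search(r"tests=(\S*)", flat)
    tests = [t for t in tests_match.group(1).split(",") if t] if tests_match else []

    return {
        "is_spam": match.group(1) == "Yes",
        "score": float(match.group(2)),
        "required": float(match.group(3)),
        "tests": tests,
    }

if __name__ == "__main__":
    sample = (
        "No, hits=-2.6 required=5.0 tests=BAYES_00,\n"
        "        HTML_FONTCOLOR_UNSAFE,HTML_FONT_INVISIBLE,HTML_MESSAGE,\n"
        "        NORMAL_HTTP_TO_IP,WEIRD_PORT autolearn=no version=2.63"
    )
    print(parse_spam_status(sample))
```

The sketch reads the verdict from the leading Yes or No rather than comparing the score to the threshold itself, so it reports exactly what SpamAssassin decided.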

You can set your email client to filter based on the X-Spam-Status header. For example, you can create a filter that moves a message to a mailbox designated for spam if its headers contain the partial line "X-Spam-Status: Yes". Some email clients also have built-in spam filtering, which you can use instead of, or in conjunction with, our spam filter.
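The rule such a client-side filter applies can be sketched in a few lines of Python using the standard `email` module. This only illustrates the logic and is not how any particular mail client implements its filters. The function names and the folder names "spam" and "inbox" are placeholders.

```python
from email import message_from_string

def is_tagged_spam(raw_message):
    """Return True if the message carries an X-Spam-Status header starting with Yes."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return status.strip().lower().startswith("yes")

def choose_folder(raw_message, spam_folder="spam", inbox="inbox"):
    """Pick a destination folder name based on the SpamAssassin tag."""
    return spam_folder if is_tagged_spam(raw_message) else inbox

if __name__ == "__main__":
    raw = (
        "From: someone@example.com\n"
        "X-Spam-Status: Yes, hits=29.1 required=5.0 tests=BAYES_99\n"
        "Subject: example\n"
        "\n"
        "Message body.\n"
    )
    print(choose_folder(raw))
```

A message without the header is treated as not spam, so mail that bypassed SpamAssassin still reaches the inbox.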

Enabling server-side mail filtering

If you don't want to filter email on your own computer, or you read email directly on our machines using a program like Pine or Mutt, you can have suspected spam delivered to a separate folder, for example one called "spam". You can then check the spam folder periodically to verify that SpamAssassin didn't accidentally put any valid email there. In practice, this seems to work quite well.

Here is how to redirect suspected spam into a mailbox named "spam". If you have neither a .forward file (described below) nor a .procmailrc file (if you don't know what this is, don't worry about it), we have a basic script that sets this up for you. To run it, log into an OCF machine using SSH and run the command below.

spam-setup

(As an alternative, you can run this script from a primitive web interface.) After you run the script, suspected spam will be delivered to a folder called "spam", and all other mail will go to your normal inbox. Note that the script does not currently let you forward mail elsewhere, but you can add forwarding manually. If you need help with this, feel free to email the staff.

Important note: If you opt for this approach, you really should check the spam folder periodically, because legitimate (and possibly important) mail can find its way in there. Also, spam accumulates faster than you would probably think. The spam folder counts against the disk space allocated to you (your disk quota) and can easily consume a large portion of it if you haven't checked and deleted spam for a couple of months, at which point you could start to encounter mysterious problems with your account.
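One way to keep an eye on this is to measure the spam folder from time to time. The Python sketch below is a minimal example that assumes the folder lives at ~/Mail/spam, matching the MAILDIR setting in the sample .procmailrc on this page. Adjust the path if you keep mail elsewhere. The function name is made up for illustration.

```python
import os

def mailbox_size_bytes(path):
    """Return the size in bytes of a mailbox stored as a file or a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    # For directory-style mailboxes, add up every file underneath.
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

if __name__ == "__main__":
    # Assumed location; change this if your spam folder is somewhere else.
    spam_path = os.path.expanduser("~/Mail/spam")
    if os.path.exists(spam_path):
        size_mb = mailbox_size_bytes(spam_path) / (1024 * 1024)
        print(f"{spam_path}: {size_mb:.1f} MB")
    else:
        print(f"{spam_path} does not exist")
```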

Sample Files

To check whether you have spam filtering set up, or to set it up manually, make sure your .forward and .procmailrc files look like the following.

.forward looks like:

"|procmail #username"

including the double quotes, where username is your OCF username.

And .procmailrc looks like:

MAILDIR=$HOME/Mail      # Change this if you keep mail somewhere else.

INCLUDERC=/opt/local/environment/.procmailrc.sa
INCLUDERC=/opt/local/environment/.procmailrc.ocf

# Forward mail here:
# :0 c
# !user@host.com

:0:             # These two lines just put the message
$DEFAULT        # in your inbox (the one in /var/mail/).

If you already use procmail, your .procmailrc may contain other recipes. In that case, you only need to add this line from above:

INCLUDERC=/opt/local/environment/.procmailrc.sa

Put this line toward the beginning of your procmail recipes so that spam gets filtered early in the chain. If you want to whitelist certain mailing lists or other email that is repeatedly and mistakenly marked as spam, put a delivering recipe that matches such mail in front of that line.
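The manual check described in this section can also be done with a short script. The Python sketch below tests whether the text of a .forward file pipes mail to procmail and whether the text of a .procmailrc file contains the SpamAssassin INCLUDERC line shown above, ignoring commented-out copies. The function names are made up for illustration. The paths ~/.forward and ~/.procmailrc are the usual locations and are assumed here.

```python
import os

# The include line given in the sample .procmailrc above.
SA_INCLUDE = "INCLUDERC=/opt/local/environment/.procmailrc.sa"

def forward_pipes_to_procmail(forward_text):
    """Return True if any line of .forward pipes mail to procmail."""
    for line in forward_text.splitlines():
        entry = line.strip().strip('"')
        if entry.startswith("|") and "procmail" in entry:
            return True
    return False

def procmailrc_includes_spamassassin(procmailrc_text):
    """Return True if .procmailrc has an active (uncommented) SA include line."""
    for line in procmailrc_text.splitlines():
        # Drop any trailing comment, then compare the remaining text.
        code = line.split("#", 1)[0].strip()
        if code == SA_INCLUDE:
            return True
    return False

def read_if_present(path):
    """Return the file's text, or an empty string if it does not exist."""
    if not os.path.isfile(path):
        return ""
    with open(path, "r", errors="replace") as handle:
        return handle.read()

if __name__ == "__main__":
    forward = read_if_present(os.path.expanduser("~/.forward"))
    procmailrc = read_if_present(os.path.expanduser("~/.procmailrc"))
    print(".forward pipes to procmail:", forward_pipes_to_procmail(forward))
    print(".procmailrc includes SpamAssassin rules:",
          procmailrc_includes_spamassassin(procmailrc))
```

The sketch only confirms that the include line is present. It does not check where the line sits relative to your other recipes.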
