Using Procmail

From OCF Help

Revision as of 13:58, 13 August 2007 by Sle (Talk | contribs)

procmail is the mail-filtering system provided on OCF. It is part of the default spam filtering setup, but it is much more flexible than the filtering options provided by webmail services or popular email clients like Outlook, Eudora, or Thunderbird. You can use procmail to prioritize your email, for example by filing bulk email into separate folders so that personal and important messages are more visible in your inbox.

Procmail isn't used as much as it should be because it's difficult to use at first, or, as some would say, it "has a steep learning curve". Here are the advantages of procmail that might make you want to learn it:

  1. It's flexible and powerful (more so than any email client)
  2. The filtering is done on the server side: this means that it doesn't matter how you check your email, or where you check it—your email is filtered when it is delivered to your inbox.

This guide has been written to help you get started with procmail without learning all the details of what it can do and how it works. At your convenience, you can look up the manual pages for procmail by running

man procmail

at your Unix prompt. Please note that the following directions should be carried out at your Unix prompt, either after SSHing to OCF or after logging in at the console of one of the Unix workstations in the lab.

.forward

First, you will need to configure your personal settings so that all your email is filtered by procmail when it is delivered. If you have spam filtering set up, this part should have been done for you already. If you aren't sure, check the .forward file by running:

cat .forward

The command and the output should look like (where conquest [1] is part of your prompt):

conquest [1] cat .forward
"|procmail #username"

username should be your OCF username. If you don't have any .forward file, then just type into your prompt:

echo '"|procmail #username"' > ~/.forward

Remember to change username to your actual OCF username.
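If you would rather not risk overwriting a .forward that already exists, the step above can be wrapped in a small check. This is only a sketch for a Bourne-style shell (save it as a script and run it with sh, since it will not work when pasted into tcsh); the function name write_forward is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the procmail line to a .forward file,
# but refuse to overwrite a file that already exists.
write_forward() {
    user=$1
    file=${2:-$HOME/.forward}
    if [ -e "$file" ]; then
        echo "$file already exists; leaving it alone" >&2
        return 1
    fi
    # The double quotes are part of the line that ends up in the file.
    printf '"|procmail #%s"\n' "$user" > "$file"
}

# Example (replace yourusername with your OCF username):
#   write_forward yourusername
```

If the function reports that the file already exists, look at its contents with cat before deciding what to do.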

If you already have a .forward file because you have mail forwarding set up, and you still want to filter your email on OCF, your setup will be slightly more complicated. That case is covered later in this guide, after the basics of procmail, so please read on.

.procmailrc

Procmail's behavior is controlled by a file called .procmailrc. The following is a good default .procmailrc (if you have spam filtering set up, you might have a few extra lines—that's O.K.):

# ~/.procmailrc

SHELL=/bin/sh
PATH=/bin:/usr/bin:/opt/local/bin
MAILDIR=$HOME/Mail
LOGFILE=.procmail.log

:0:
$DEFAULT

You can create this file using your favorite Unix editor. If you don't have a favorite one, nano is a pretty user-friendly editor, although if you are considering using Unix a lot, learning vi or emacs can be useful. If you just want to get this over with, you can copy and paste the commands below into your Unix prompt and the correct file will be created for you. Before you run them, make sure that you don't already have a .procmailrc. If you do (perhaps from when you set up spam filtering), the following commands will overwrite the existing file without warning.

cat > ~/.procmailrc << EOF
# ~/.procmailrc

SHELL=/bin/sh
PATH=/bin:/usr/bin:/opt/local/bin
MAILDIR=\$HOME/Mail
LOGFILE=.procmail.log

:0:
\$DEFAULT
EOF
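Because the commands above replace any existing file, it may be worth keeping a copy of the old one first. The following is a minimal sketch for a Bourne-style shell (run it with sh rather than tcsh); backup_rc and the .bak suffix are illustrative choices, not an OCF convention.

```shell
#!/bin/sh
# Copy an existing rc file to "<name>.bak" before it gets overwritten.
# Does nothing (and succeeds) when there is no file to back up.
backup_rc() {
    rc=${1:-$HOME/.procmailrc}
    if [ -f "$rc" ]; then
        cp -p "$rc" "$rc.bak" && echo "saved $rc.bak"
    fi
}

# Example: back up ~/.procmailrc before recreating it
#   backup_rc
```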

Here's a bit of explanation:

  • SHELL=/bin/sh specifies /bin/sh as the shell for procmail to use whenever it needs to run a shell. Since Bourne-style shells are by far the most common (as opposed to C-shells, like tcsh on OCF), this is recommended; most examples you might find on the Web are written for Bourne-style shells.
  • PATH sets up the path for your procmail scripts. This is where we specify where the programs might be found. The PATH given in the above example will work fine on OCF, for most applications. However, you might wish to not set PATH at all and simply specify the full path to any binaries you need to run, so that your .procmailrc doesn't depend on how the PATH is set.
  • MAILDIR tells procmail where you keep your mail folders. OCF's default is $HOME/Mail, as indicated above.
  • LOGFILE tells procmail to keep a log of everything that it does. This is useful for debugging problems with your mail filters. Since we've set MAILDIR, the actual log file is $HOME/Mail/.procmail.log. If you choose to keep the log, you will need to clean it out every now and then so that it doesn't fill your quota.
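One way to keep the log from growing without bound is to trim it to its most recent lines from time to time. The sketch below assumes a Bourne-style shell (run it with sh); the function name trim_log and the default of 500 lines are arbitrary choices for illustration.

```shell
#!/bin/sh
# Keep only the last $2 lines (default 500) of the log file $1.
# A missing log file is not an error.
trim_log() {
    log=$1
    keep=${2:-500}
    [ -f "$log" ] || return 0
    # Write the tail to a temporary file, then copy it back over the
    # original so the log keeps the same name and permissions.
    tail -n "$keep" "$log" > "$log.tmp" &&
        cat "$log.tmp" > "$log" &&
        rm -f "$log.tmp"
}

# Example: trim_log "$HOME/Mail/.procmail.log" 500
```

A message delivered at the exact moment the log is being rewritten could lose its log entry, so treat this as occasional housekeeping rather than something to run constantly.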

Basic procmail usage

The rest of the .procmailrc file consists of what are called "recipes", which you can think of as a laundry list of things for procmail to do.

Procmail reads the recipes and performs their actions in order. For any given message, when the first delivering recipe matches, the message is delivered and the rest of the recipes are ignored. Delivering recipes are recipes that have a mailbox or another delivery address as the action. For this reason, the order of your recipes matters. In particular, the default .procmailrc above ends with a catch-all recipe (a recipe with no conditions at all) that performs the default action, which, for most people, should be putting the message in their inbox. If you want to add more recipes, add them above the catch-all recipe; a recipe placed below it will never be reached!

Procmail recipes have the following format:

:0 [flags] [ : local lockfile ]
* <(optional) conditions (one per line)>
* <additional (optional) conditions>
<exactly one action line>

Each recipe has to start with ":0". This specifies the beginning of a recipe. The flags are single-letter codes that modify procmail's behavior for this recipe. You don't have to worry much about "flags" and "local lockfile" for now, as they matter mostly for advanced usage of procmail. The one lockfile detail worth knowing is that a trailing colon (":0:", as in the default recipe above) makes procmail lock the mailbox while it writes to it, which is advisable for any recipe that delivers to a mailbox file. Any flags that are used in this guide will be explained, but if you want to find out what the others do, try "man procmailrc" at your Unix prompt. The conditions start with "*" and are written as extended regular expressions. You can find out more about regular expressions by searching on Google for "regular expressions" (or "regex" for short), but for now, all you need to know is:

  • "^" is called an anchor, and it matches the beginning of a line. So, "^a" will match "abc", but not "bac".
  • "." is a wild card, and it matches any one single character.
  • "*" means that any number of occurrences (including zero) of the previous regular expression will be matched. Combined with "." above, ".*" will match anything.
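To get a feel for these three building blocks, you can try patterns against sample header lines with grep before putting them in a recipe. This is only an approximation: grep -E is a different regular expression engine from procmail's, and the -i option is used here because procmail ignores case by default. The helper name matches is made up for this sketch, which is meant for a Bourne-style shell (run it with sh).

```shell
#!/bin/sh
# Approximate a procmail condition with grep:
#   -E  extended regular expressions
#   -i  ignore case, as procmail does by default
#   -q  print nothing, just set the exit status
matches() {
    # $1 = pattern, $2 = a single line of text to test
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

matches '^a' 'abc' && echo '^a matches abc'
matches '^a' 'bac' || echo '^a does not match bac'
matches '^From:.*spammer@knownspammer.com' \
    'From: Some Spammer <spammer@knownspammer.com>' && echo 'sender matched'
```

Procmail-specific expressions such as "^TO_" are not understood by grep, so they cannot be tried this way.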

The action is performed only when the message matches all the conditions.

Finally, the action line immediately follows the last condition. Usually, it's just the name of a mailbox, for example,

misc

This delivers the message into the mailbox ~/Mail/misc, where you can read it with an email client of your choice, either directly with pine or mutt, or via IMAP with other popular email clients on various platforms. If you want to throw a message away, simply use the mailbox /dev/null, which is the Great Hole in Unix systems: nothing comes out of it, and nothing that goes in there is ever seen again. Lastly, if you want to forward the message to another email account (say, on Gmail or CalMail), use an action line that starts with "!" followed by the address (replace the example address with your own):

! test@example.com

Below are example filtering rules that filter based on the recipient's email address (this is useful if you want to filter messages from a mailing list, which has the list's address as the recipient rather than your personal email address) and on the sender's email address:

# The trailing ":" makes procmail lock the mailbox while writing to it
:0:
* ^TO_mailing-list@lists.berkeley.edu
mailing-list

:0
* ^From:.*spammer@knownspammer.com
/dev/null

In the above example, "^TO_" is a special expression in procmail that means, in English, "this message was sent to this address, whether the address appears in the To, Cc, Bcc, or any other header that might matter". The second recipe is typical of what you would write to filter (or discard) messages by sender.

A very good rule to have early (usually as the first rule) in your .procmailrc is a "safety net" rule. This rule, when placed in the correct position, will copy all incoming mail into a mailbox before you apply any filters or throw anything away. This keeps you from shooting yourself in the foot, especially when you are trying out a new recipe that uses /dev/null to throw messages away. Here's an example:

# Safety net! (the trailing ":" locks the mailbox during delivery)
:0 c:
mailbackup

Note that the "#" character starts a comment. As with any other script, program, or configuration file, commenting your entries is a good idea so you can tell at a glance what each rule is supposed to do. The rule above is also an example of the use of a flag. The 'c' flag means "copy". Without that flag, procmail would deliver all mail to mailbackup and stop there, since this rule comes before any other; with the 'c' flag, procmail copies the message to mailbackup and then processes the same message through the rest of the rules. If you use such a safety net recipe, make sure to check the backup folder every now and then, since it can fill up pretty quickly, especially if you apply the rule before filtering out the spam.
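A quick way to see how full the backup folder has become is to count the messages in it. The sketch below assumes the folder is a plain mbox file, which is what the recipes in this guide produce, and a Bourne-style shell (run it with sh); count_messages is a name invented for the example.

```shell
#!/bin/sh
# Count messages in an mbox-format folder. Each message starts with a
# separator line beginning "From " (with a space, not a colon).
count_messages() {
    grep -c '^From ' "$1"
}

# Example: count_messages "$HOME/Mail/mailbackup"
```

Once you are happy that your filters behave, you can empty the backup folder from your email client.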

Setting up procmail when you already have a .forward

If you have a .forward already, you can still use procmail by sending the email through OCF's mail system twice in the following way:

  1. Move your current .forward file to .forward+remote (you can choose any name you want, in the format: .forward+??? where you can replace ??? with any alphanumeric characters you would like).
  2. Set up your .forward as described in the .forward section by running the command given there.
  3. In your .procmailrc, modify the default recipe (i.e. the last recipe with no conditions) so it has the following as action:
    ! username+remote@ocf.berkeley.edu
    where username is your OCF user name and remote is whatever you chose as ??? above.

This is how this scheme works:

  1. Your email arrives at your normal OCF address.
  2. Then, according to your .forward, your email is sent to procmail.
  3. After the filtering is done, your email is sent to username+remote@ocf.berkeley.edu by procmail.
  4. When the email arrives at OCF's mail server again with +remote suffix, it gets delivered according to .forward+remote.

If you have a different scheme (for example, your account is a group account with virtual hosting and a special mail delivery option), you might have to edit different files to get this to work right. If you have any questions, email staff@ocf.berkeley.edu.

Advanced procmail usage

Procmail can do a lot more than simple filtering: it can filter out duplicate messages, automatically archive all of your email into monthly folders, run HTML email through a de-HTMLifier, run a virus scanner on all of your incoming attachments, or even help block spam. If you want to know more, you can read the man pages (procmail, procmailrc, and procmailex), and check out http://www.procmail.org. The examples below are only a sample (for your convenience) and are not meant to be comprehensive.

Changing the subject header

You can change the subject header of a message that matches a given condition so that it stands out when it arrives in your inbox (as an important message or otherwise). Here's an example:

:0 fhw
* ^From:.*mailing-list@lists.berkeley.edu
|sed -e 's/^Subject:[ ]*/Subject: [LIST] /'

This recipe uses special flags, and it's important that you include them; otherwise it will not work as expected. Here is what each part means:

  • 'f': the pipe will be considered as a filter. That means that the filtered message will be passed down through the rest of the recipes until it comes across a delivering recipe that matches.
  • 'h': only the header is fed to the filter, so a line in the message body that happens to begin with "Subject:" is left alone.
  • 'w': procmail will wait for the filter (in our case, sed) to finish and check its exit code. If the filter exits with an error (with possibly garbled output), procmail will continue with the original unfiltered input.
  • sed -e: this runs the program sed with the script that follows. The -e option is what tells it to take the next part of the line as the script to be executed.
  • 's/^Subject:[ ]*/Subject: [LIST] /': this takes any line that begins with "Subject:" and changes "Subject:" (together with any spaces that come right after it) to "Subject: [LIST] ". You should change "[LIST]" to whatever you want it to be.
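Before relying on the recipe, you can check what the sed script does by feeding it a couple of sample header lines by hand. The following sketch is for a Bourne-style shell (run it with sh); tag_subject is just an illustrative wrapper around the same sed command, and the sample subject is invented.

```shell
#!/bin/sh
# Apply the substitution from the recipe to text on standard input.
tag_subject() {
    sed -e 's/^Subject:[ ]*/Subject: [LIST] /'
}

# Only the line starting with "Subject:" is changed; the From line
# passes through untouched.
printf 'From: mailing-list@lists.berkeley.edu\nSubject:   Meeting notes\n' | tag_subject
```

The extra spaces after "Subject:" in the sample are deliberate: they show that the script collapses them into the single space that precedes the subject text.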

de-HTML'ing email

If you use a text-mode-only email client, such as pine or mutt, you might have trouble when other users send you email in HTML format: you must either read the message along with its HTML tags or launch a graphical email client (which may not even be set up for reading email on OCF). Some clients, like mutt, offer a handier way, such as piping the message to a console-mode Web browser like lynx or w3m, but you can also do the conversion automatically with a procmail recipe. Below is an example:

# Create backup for de-HTML'd email (the trailing ":" locks the mailbox)
:0 c:
* ^Content-Type:[ ]*text/html
de-html-backup

# de-HTML
:0 fbw
* ^Content-Type:[ ]*text/html
| lynx -dump -stdin -nolist \
&& echo "" \
&& echo "=================================================" \
&& echo "This HTML message has been made into text by lynx" \
&& echo "================================================="

There are more elegant and more complex ways to do this (such as identifying multi-part messages and dealing with them separately), but this is a simple way to deal with messages that are in HTML. A brief explanation of what it does: the first recipe makes a backup of the message before we do anything to it. The second recipe then uses lynx's -dump option to use the Web browser as an HTML parser, and the echo commands at the end insert a notice for the user, so that if anything looks funky, the user can go back to the backup (either using an HTML-enabled email client, or by other methods) to check the original message.

Troubleshooting

  • Q: .procmailrc looks correct but it doesn't work!
  • A: If you edited .procmailrc in Windows, the editor may have added extra (normally invisible) carriage-return characters at the end of each line. To fix it, you can use dos2unix. Run "man dos2unix" to find out how to use it. In the future, it is recommended that you edit all text files on OCF (with a few exceptions like HTML files) using text editors on Unix.
  • Q: I added the LOGFILE line, but I don't see any log file.
  • A: Please note that you need to create the directory for the log file. If you are using the example provided, you should have no trouble; otherwise, make sure that the directory the log should be in exists—it will not be automatically created.
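For the first problem above, you can confirm whether a file really contains Windows line endings before changing it. The sketch below uses tr rather than dos2unix, purely so that it depends on nothing beyond standard tools; the function names has_cr and strip_cr are invented, and it is meant for a Bourne-style shell (run it with sh).

```shell
#!/bin/sh
# Succeed if the file $1 contains at least one carriage return.
has_cr() {
    [ -n "$(tr -cd '\r' < "$1")" ]
}

# Write a copy of $1 to $2 with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example:
#   has_cr "$HOME/.procmailrc" && strip_cr "$HOME/.procmailrc" "$HOME/.procmailrc.unix"
```

The cleaned copy is written to a separate file so that you can inspect it before moving it into place over the original.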

See Also

  • To learn more about procmail itself and other various options, check:
    man procmail
    To learn more about the .procmailrc file, check:
    man procmailrc
    If you want to see a few more example recipes, check:
    man procmailex