MailCorral Documentation

2. Sendmail Filter

2.1 Description

The MailCorral sendmail filter is a robust virus/spam filter program that runs as part of sendmail (using the milter interface) to filter out viruses and spam from all mail delivered on the site running sendmail. The filter handles all currently known-malicious attachments plus attached and inline HTML. It also handles archiving of received messages, if desired, and can be used to transparently send and receive encrypted email.

This program can be installed as part of sendmail and left to run unattended. It will render harmless any viruses found in delivered email and notify the user inline, in the message itself, of its actions. A backup copy of the unfiltered message is kept for a fixed length of time, just in case filtering rendered it truly unusable.

Spam identification is carried out on two levels. The sendmail filter has a simple spam detection scheme, based on white and black lists, that is always enabled (unless the "-ss" option is set or "SpamFastPath" is set to "No"). You may run the filter with only this mode of spam detection enabled; despite its simplicity, it is remarkably effective if you configure it carefully. Otherwise, more elaborate spam identification techniques can be applied via a built-in connection to one of the popular spam recognition packages. The spam recognition package is only called if the fast path through the white/black list processing does not recognize a message as being spam, so spam recognition can be very economical.

Once spam is identified, there are three delivery options which may be selected. Spam can simply not be delivered. Or, it can be delivered with a marker in the subject to identify it as such. Finally, in lieu of being delivered, it can be redirected to a corral where it can be released at a later time. In this delivery mode, as in the other two, spam handling runs completely unattended by your system administrators. If you use the optional spam handling package SpamCorral, users can be automatically sent periodic summary messages which will allow them to release for delivery only those pieces of spam that they actually want to see. The rest of the spam is deleted after a short holding period.

Virus identification is also carried out on two levels. The sendmail filter has a simple virus detection scheme, based on lists of acceptable attachment types and MIME types, that is always enabled (unless the "-vv" option is set or "VirusChecker" is set to "No"). You may run the filter with only this mode of virus detection enabled; despite its simplicity, it also works quite well. Otherwise, more elaborate virus identification techniques can be applied via a built-in connection to one of the popular virus recognition packages.

2.2 Features

Here is a list of what we consider to be the most useful features of the product. What is left unsaid is that it is basically install and forget. Once you set it up and it is running to your satisfaction you should not normally ever have to touch the filter again.

2.3 How it Works

This program is invoked by the milter interface of sendmail, upon startup. It registers callback routines for the various message processing actions, as defined by the milter interface. Upon return to sendmail, the callbacks are called by sendmail whenever it has messages to deliver. The callbacks filter the messages for noxious entities (e.g. viruses) and warn the user of their presence by inserting warnings into the body text of the message. In some cases (e.g. attachments), the offending entities are altered or deleted to render them harmless.

Basic spam checking is undertaken by MailCorral using white and black lists to compare against the sender and recipient of a message. This basic technique can be effective against spam that is always sent from the same domain. If attention is paid to how the lists are set up, one can arrive at a simple yet functioning spam filter. Regardless of what other spam checking is done, this basic checking is always done (unless the "-ss" option is set or "SpamFastPath" is set to "No") whether to provide the only means of spam identification or to provide a fast path, in front of a spam arbitration daemon, that improves performance.
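The fast path described above can be pictured with a minimal sketch. It assumes shell-style wildcard patterns and assumes that a whitelist match takes precedence over a blacklist match; neither detail is confirmed here, and the function name is purely illustrative (the real list syntax is covered in the configuration file documentation).

```python
from fnmatch import fnmatchcase


def list_verdict(address, whitelist, blacklist):
    """Return 'ham', 'spam', or None when neither list matches.

    Assumption: patterns are shell-style globs and the whitelist wins.
    """
    addr = address.strip().lower()
    if any(fnmatchcase(addr, pattern.lower()) for pattern in whitelist):
        return "ham"
    if any(fnmatchcase(addr, pattern.lower()) for pattern in blacklist):
        return "spam"
    return None  # undecided: fall through to the slower checks
```

A result of None is what makes this a fast path: only undecided messages need to go on to the statistical tests or to an arbitron.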

A second built-in spam check is done by performing a statistical analysis of the text of the message, looking for certain features that spammers are known to employ (mainly to see if they can circumvent spam filters). This check is done at essentially no cost, while the message is otherwise being processed, and can provide a second fast path that avoids the need to make an expensive call to a spam arbitration daemon.

Further spam checking can be enabled by informing MailCorral about a spam arbitration daemon such as SpamAssassin. If this is done, a request for spam determination is sent to the arbitron, via a pipe, which includes the headers and all text portions of the message (binary file attachments are omitted).

If either the built-in white/black list processing or the arbitron determines that the message is spam, it is disposed of according to the disposition option chosen. Three choices are available: deliver the spam, suitably tagged as such; unceremoniously place the spam in the trash can; or divert the spam to a corral where it can be remailed to the recipient at a later date.

The first two disposition options are self-explanatory. The third option causes the spam not to be sent directly to the recipient but instead written to a corral directory. The corralled message is given all of the usual filtering, before it is corralled, hence it is ready for immediate remailing, should the recipient so decide (see SpamCorral).

Basic virus checking is carried out by MailCorral using a set of lists that identify file attachment types and MIME types that are known to be potentially harmful. Depending on the type found: nothing is done; a warning is given; the attachment is renamed to prevent it being opened; or, for known really nasty items, the attachment is deleted. This basic technique can work quite well if the users are careful and pay attention to the cautionary messages inserted. Regardless of what other virus checking is done, this basic checking is always done unless the "-vv" option is set or "VirusChecker" is set to "No".
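As a rough illustration of the four outcomes just described, here is a small sketch. The table entries, the default for unknown types and the ".disabled" suffix are all made up for the example; the real lists live in smfopts.h and will differ.

```python
import os

# Hypothetical table for illustration only; see smfopts.h for the real lists.
ACTIONS = {
    ".txt": "pass",    # nothing is done
    ".doc": "warn",    # a warning is inserted into the message
    ".vbs": "rename",  # the attachment is renamed so it cannot be opened
    ".pif": "delete",  # known really nasty items are removed
}


def attachment_action(filename, default="pass"):
    """Pick an action from the final extension of the attachment name."""
    extension = os.path.splitext(filename.lower())[1]
    return ACTIONS.get(extension, default)


def disarm_name(filename):
    """Rename an attachment so a double-click no longer launches it."""
    return filename + ".disabled"
```

Note that only the last extension is examined, so a name such as "Invoice.PDF.vbs" is treated as a script rather than as a PDF.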

Further virus checking can be enabled by informing MailCorral about a virus arbitration daemon such as ClamAV. If this is done, a request for virus determination is sent to the arbitron, via a pipe, which includes all attachments and inline components where a virus could lurk. If it determines that the attachment or component is infected, it is deleted by MailCorral and the recipient is notified of the fact. This ensures that the user cannot inadvertently open a malicious attachment, regardless of how disinclined they are to heed warnings.

Basic archiving of received messages is provided by the filter. It can create and manage an archive directory on disk that contains some or all of the messages received, depending on its configuration options. A single file is created for each message and a directory structure is built that allows thousands or even millions of messages to be stored and easily managed. An exact copy of each message, before filtering, is archived.

Support for third party archive programs is also available. If requested, the filter can email an exact copy of each received message to a third party archiver through normal email channels. This allows the archiver to reside on the local system or elsewhere. Further support for such archivers is provided through the filter's capability to delete unwanted bounce messages from third party archivers, should they become unavailable due to problems such as network failures.

A connection to the open source privacy guard library GnuPG allows the filter to transparently encrypt outgoing messages and decrypt incoming messages using the user's key ring. This allows encryption/decryption to be used even with mail readers that don't directly support it.

To do all of the above, this program must be used in conjunction with a version of sendmail that has milter support and must be set up in the sendmail config file to be called by milter. In addition, spam recognition requires the use of a spam arbitration daemon (unless you wish to insert your own code into the filter) to decide which messages are, indeed, spam.

2.4 Filtered Items

MailCorral tries to disable all harmful items found in the email messages it processes, whether they be attachments or objects embedded directly in the messages. The disabled items, such as viruses and other malicious bits of executable code, can often damage or destroy any system that receives the messages. Disabling these items prevents them from having their otherwise disastrous effect.

For a complete list of all items removed, see the filter tables in the program code (smfopts.h). The synopsis is:

Note that the messages that apply to each of these tests are found in the same file as the tests themselves (smfopts.h). The English text of each message is pretty self-explanatory. Furthermore, each of the messages may have its text overridden in the global and local configuration files, as well as by the foreign language options. The configuration file documentation has a description of each message and its default text.

Please bear in mind that, while every attempt was made to create code that would nullify all of the malicious items known at the time that MailCorral was written, virus writers and their ilk are very creative. As the (dare we say) art of virus writing progresses, new and improved viruses may find ways to get past MailCorral. However, just as sex without a condom is more dangerous than with, email without a filter is more dangerous than with. MailCorral is sure better than nothing (some testimonial, huh).

In an effort to ensure that as much nasty material as possible is caught, MailCorral is designed to pass the validation suite that BSM Development offers elsewhere on this Web site. If you think you've found a virus that isn't caught by MailCorral, please send it to us and we'll update the validation suite as well as the filter. This will not only help MailCorral to filter out all of the latest virus "technology" but also assist anyone else needing to verify that their mail filter is up to date, since the validation suite is made available to everyone.

If you are sending us a sample, you could try emailing the virus directly to BSM Development but this is unlikely to work if our virus filters are alert. Instead, an approach that obfuscates the true nature of the message should be employed. My preferred method is to edit the mail inbox with a text editor and copy the entire message containing the virus, from the first header to the last line of the MIME attachments. Save this entire message, headers and all, in a text file and then zip the text file. Attach the zipped file to the mail message that you send to us.

Many people may find the filter criteria employed by MailCorral to be fairly harsh (e.g. we often receive complaints asking why HTML messages are tagged and altered). Although we spend much time evaluating viral payloads and are constantly trying to eliminate noise filtration, we realize that some of the items the filter catches may be viewed by the man in the street as acceptable. In MailCorral's defence, most of the parameters chosen for the filter are there because actual viruses have been observed using the filtered items as delivery mechanisms. You are more than welcome to tune the filter parameters but don't do so lightly. Turning off any of them will certainly increase your risk of exposure and it may only be a matter of time before a virus slips through the sieve. Weigh the annoyance factor carefully against the cost of restoring a system damaged by a virus. We believe that being overly cautious is much better than being careless, in this case.

Finally, if you use a signature-based virus arbitron (such as ClamAV), you may be able to relax the built-in criteria that MailCorral uses and rely on the fact that the arbitron will reliably find viruses, but with fewer false positives. This should prove less intrusive to your users. On the other hand, there's something to be said for the belt-and-suspenders approach too, since we've found that the users will be even more upset when a virus gets through, abuses their email address book, steals their credit card numbers, signs them up for membership on several spam blacklists and then deletes all of their data.

2.5 Message Remailing

When a message is altered to remove a harmful item, an unaltered copy of it is kept in the mail corral for a short length of time. If a message was altered in such a way as to render it unusable, it can be retrieved in its pristine state. Optionally, a message handling robot can be set up that will allow the recipients of altered messages to retrieve them from the corral and remail them directly to themselves in unaltered form.

Normally, this option is not enabled, since it is dangerous to allow possible viruses to be delivered to the recipient without filtering. The usual method of operation is to have the recipient request a Tech Support person to deliver the message. This allows Tech Support to peruse the message and ensure that it isn't harmful before sending it on its way.

In certain high volume mail delivery situations, however, it may not be desirable to require such human intervention, given the quantity of mail that may be involved. In these situations, automated delivery of the recipient's unaltered mail may be preferable, providing all concerned are made aware of the danger of releasing live viruses from the corral.

The message remailer robot is invoked by the recipient of the message through a mail message sent to the robot. The altered message contains instructions on how to mail the robot as well as a link which many mailers will follow to instantly create the required mail message. Once the mail is sent to the robot, it responds by sending the unaltered message.

2.6 Spam Handling

Enhanced spam handling (other than the basic black/white list processing built in) is carried out in cooperation with a spam arbitration daemon (arbitron), such as SpamAssassin, or one or more of the user's programmable arbitrons. MailCorral prepares a representative message that contains all of the components of each received message that are important from the standpoint of spam arbitration. It then sends a request for spam determination, containing the representative message, via a pipe to the arbitron, and the arbitron renders a decision as to whether the message is spam or not. In the case of the user's programmable arbitrons, the message is passed in a file but the rest of the details are the same. The arbitron's decision is final.

The received message is processed to create the representative message by first removing any attachments and inline components that are not directly viewable by typical mail readers (e.g. not text/plain, text/html, etc.). This is done for performance reasons, since large attachments do not contribute measurably to the spam determination process but they would normally require transmission to the arbitron. Then, any MIME entities that are encoded are decoded so that the arbitron is dealing only with plain text.
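The reduction described above can be approximated with Python's standard email package. This is only a sketch of the idea, not the filter's own code: it keeps just the decoded text/plain and text/html bodies, leaves out the headers that the real request also carries, and the function name is an assumption.

```python
from email import message_from_string

# Content types that a typical mail reader displays directly.
VIEWABLE = {"text/plain", "text/html"}


def representative_text(raw_message):
    """Return the decoded, directly viewable text parts of a message."""
    texts = []
    for part in message_from_string(raw_message).walk():
        if part.is_multipart() or part.get_content_type() not in VIEWABLE:
            continue
        if part.get_filename():  # a named part is a file attachment
            continue
        payload = part.get_payload(decode=True)  # undo base64/quoted-printable
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset label
            texts.append(payload.decode("latin-1"))
    return "\n".join(texts)
```

Dropping the binary parts before anything is sent keeps the request small, which is the performance point made above.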

Prior to invoking the spam arbitron, the user's .sendmailfilter configuration file plus the local and global SpamAssassin configuration files (if the spam arbitron chosen is spamd) are read to extract white and black list information for the recipient of the message. This information is then acted upon to determine if a message is spam. Next, if the white and black lists don't result in a spam/non-spam determination, statistical information (see "Statistical Tests", below) about the message is examined to look for spam-like qualities that indicate the message is spam. If a determination can be made about a message without invoking any arbitrons, this is done to speed up spam processing, since calling spam arbitrons is generally slow.
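The ordering just described (lists first, then statistics, then the arbitron) amounts to a cheapest-first chain that stops at the first decisive answer. A minimal sketch follows; the three checks are passed in as plain functions, and all of the names are illustrative rather than taken from the filter.

```python
def classify(message, list_check, stat_check, arbitron_check, threshold=100):
    """Run the checks cheapest first and stop at the first that decides.

    list_check returns 'spam', 'ham' or None; stat_check returns a
    percentage; arbitron_check returns True for spam.
    """
    verdict = list_check(message)
    if verdict is not None:
        return verdict, "lists"
    if stat_check(message) > threshold:  # over 100% counts as spam
        return "spam", "statistics"
    # Only undecided messages pay for the slow arbitron call.
    return ("spam" if arbitron_check(message) else "ham"), "arbitron"
```

The second element of the result records which stage made the call, which is handy when tuning the lists and the statistical options.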

Upon detection of a message's sender on a black list, the detection of spam through statistical methods or the receipt of the arbitron's decision, the message is disposed of according to the options "-sc", "-sd" and "-st". If the "-st" (trash) option is chosen, the message is dumped and that's that. If one of the other two options is chosen, the results from black list processing or from the arbitron are inserted into the message, depending on which of the three options "-s1", "-s2" or "-s3" is chosen.

Message formatting for level one (command line "-s1" or config file "SpamLevel 1") consists of inserting a single header into the message, giving spam percentage values (any message with a value over 100% is considered to be spam) and inserting a paragraph into the message body, explaining that the message is spam (this is all that will be done for a black listed message, regardless of what message formatting level is chosen). Level two formatting (command line "-s2" or config file "SpamLevel 2") adds any pertinent headers generated by the spam arbitron to the message as well. Level three formatting (command line "-s3" or config file "SpamLevel 3") inserts a report, if any, from the spam arbitron into the message body, as a paragraph, following the explanatory paragraph inserted by level one.

If the immediate delivery option ("-sd") is chosen, the spam is tagged with a subject prefix of "[SPAM]:" and sent on its merry way, after any virus filtering, etc. is done. If the corral option ("-sc") is chosen, the spam is filtered and prepared for delivery and then sidelined in a corral directory where a program such as SpamCorral can find it and send out receipt notifications.

Meanwhile, sendmail is instructed to reply to the sender or not, depending on the value of the "-r" parameter. A value of zero causes no replies to be sent to the sender. A value greater than zero causes no more than that many replies to be sent to the sender in a predetermined period. After that threshold is reached, no more replies are sent until the period ends. If a reply is sent, it has a major and minor result code of "550" and "5.7.1" plus a suitable message text that says the message is spam. Perhaps it is a bit naive to expect that spammers will do anything about this kind of delivery notification, so this feature is off by default.
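The reply threshold behaves like a fixed-window counter. The sketch below assumes a single shared counter and a caller-supplied period length; whether the filter counts per sender, and how long its period is, are not stated here, so treat both as assumptions.

```python
import time


class ReplyLimiter:
    """Allow at most max_replies replies in each period (fixed window)."""

    def __init__(self, max_replies, period_seconds, clock=time.monotonic):
        self.max_replies = max_replies
        self.period = period_seconds
        self.clock = clock
        self.window_start = clock()
        self.sent = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.period:
            # The period has passed: start a fresh window.
            self.window_start, self.sent = now, 0
        if self.sent >= self.max_replies:
            return False  # threshold reached (always so when the limit is zero)
        self.sent += 1
        return True
```

With a limit of zero, allow() never returns True, which matches the documented meaning of a zero "-r" value.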

2.6.1 Statistical Tests

The statistical tests that are applied to messages to render a spam determination are fairly simple and, hence, quite rapid and yield a very high true positive score when applied to messages sent via email. They rely on the fact that normal email messages do not employ many of the tricks used by spammers to circumvent spam filtering. By employing these tricks to get around content-based filters, the spammers are essentially flagging their email messages as spam (how thoughtful of them). Here is a description of the tests applied (all tests are applicable to HTML messages only):

Content-based spam identifiers (e.g. Bayesian) look at the words in a message to identify whether it is spam or not. Spammers insert HTML comments into the middle of words (e.g. Via<!--junk-->gra) to break the words up so that content-based identifiers will not see the trigger words that indicate spam. Regular email messages never insert comments into the middle of words so a ratio of the number of embedded comments to words gives a good indication of spamishness. Even a very small ratio is an excellent indicator.
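A minimal sketch of this ratio, using regular expressions rather than a real HTML parser. The definitions of "embedded" (a comment with a word character on both sides) and of a word are assumptions made for the example, not the filter's exact rules.

```python
import re

# An HTML comment with a word character immediately before and after it.
EMBEDDED_COMMENT = re.compile(r"(?<=\w)<!--.*?-->(?=\w)", re.DOTALL)
ANY_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]*>")


def embedded_comment_ratio(html):
    """Ratio of comments embedded inside words to the number of words."""
    embedded = len(EMBEDDED_COMMENT.findall(html))
    # Remove comments first, then the remaining tags, and count what is left.
    visible = TAG.sub(" ", ANY_COMMENT.sub("", html))
    words = len(re.findall(r"\w+", visible))
    return embedded / words if words else 0.0
```

An ordinary comment that sits between words, such as "Hello <!-- note --> world", does not count, so legitimate HTML mail scores zero.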

Spammers use tables extensively, even going to the point of aligning single characters in table columns to create words that are not detected by content-based spam identifiers. Regular email messages do not use tables to the extent that spammers do, so a high ratio of table tags to regular words can indicate spam.

Image spam presents a message through an image that is loaded from a Web server, thereby resulting in a message with zero or almost no text content, hence nothing is available for filtering. To a lesser extent, text-based spam (which is basically advertising) employs images to present a more visual message, since visual messages are apparently more appealing. Regular email, although it often includes images, seldom has these images embedded inline in a message and does not include inline images in high proportions to text. Hence, a high ratio of inline images to regular text is a good predictor of spam.

The strength of a good spam campaign can be automatically judged by embedding links to feedback Web sites in such a manner that, when an email message is opened, the Web site is accessed and the recipient's identity is transmitted. In this way, a spammer can know who has received their spam and opened it. Links of this nature are embedded into a message as innocuous images. However, real images seldom include email addresses or parameters containing identifiers. Thus, any images that include such information should receive a high spam score.

Similarly, links to Web pages that include email addresses as parameters (not "mailto:" type links but links to a CGI script or Web page) probably indicate a spammer attempting to provide feedback to themselves with the recipient's email address. Consequently, these types of links are assumed to be indicative of spam.
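Both this test and the image test before it come down to spotting an email address among the parameters of a URL. Here is a small sketch using the standard urllib.parse module; the address pattern is deliberately loose and is an assumption, as is the function name.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# A loose pattern for something that looks like an email address.
EMAIL = re.compile(r"[^@\s=&]+@[^@\s=&]+\.[A-Za-z]{2,}")


def carries_email_address(url):
    """True if any query parameter value of the URL is an email address."""
    parts = urlsplit(url)
    if parts.scheme.lower() == "mailto":
        return False  # ordinary "mailto:" links are not counted
    # parse_qsl also decodes percent-escapes such as %40 for '@'.
    return any(EMAIL.fullmatch(value) for _, value in parse_qsl(parts.query))
```

For example, a link such as "http://ads.example/t.cgi?u=jdoe%40anon.com" is flagged, while "http://example.com/page?id=42" is not.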

Perhaps there are legitimate reasons to encode plain text and HTML as if it were binary data (e.g. alternate character sets) but the most common use of this technique has been to obscure spam and viruses from detection by scanners. This being the case, any text or HTML MIME entities that are encoded as Base64 are given a high statistical spam score. While this score is not high enough to win a message a spam label all on its own, it is high enough to ensure that any other indiscretions will push it over.
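Finding such entities is straightforward with the standard email package. The sketch below simply counts them; how the filter turns that count into a score is not shown, and the function name is an assumption.

```python
from email import message_from_string

TEXT_TYPES = {"text/plain", "text/html"}


def count_base64_text_parts(raw_message):
    """Count text or HTML MIME entities that are transferred as Base64."""
    count = 0
    for part in message_from_string(raw_message).walk():
        encoding = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if part.get_content_type() in TEXT_TYPES and encoding == "base64":
            count += 1
    return count
```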

Note that the final statistical score for a message is the sum of all of the statistical tests that are enabled. These tests are employed because they often provide a rapid determination of spam and can work very well, in many instances. Mind you, not everyone is bound to agree with these statistical definitions of spam so any and all of the tests can be disabled or adjusted to fit individual preferences. The statistical filtering criteria can be tuned by altering the values assigned to the "StatXxxx" options in the configuration file (see the "Spam Processing Options" section for more information on these options).
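To make the summing concrete, here is a tiny sketch. The test names and weights are invented for the example and do not correspond to the actual "StatXxxx" option names or defaults; a weight of zero (or a missing weight) stands in for a disabled test.

```python
def statistical_score(fired, weights):
    """Sum the weights of the tests that fired.

    fired maps a test name to True/False; weights maps a test name to
    its percentage contribution. Names here are hypothetical.
    """
    return sum(weights.get(name, 0) for name, hit in fired.items() if hit)


# Hypothetical weights: Base64 text alone stays under 100%, but any
# other indiscretion pushes the total over the line.
WEIGHTS = {"embedded_comments": 80, "base64_text": 60, "tables": 0}
```

With these example weights, a message that trips only the Base64 test scores 60%, while one that also contains embedded comments scores 140% and is treated as spam.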

2.6.2 Spam Report

If a message is found to be spam, information to this effect is added to the message before it is sent on to the user (or corralled in the corral). The text that is added to the message is governed by the values that you set in smfopts.h, as well as settings in the system and user configuration files. However, for the purposes of this section, we will assume all of the standard settings are kept.

If message formatting is set to level one, either through the command line parameter "-s1" or the config file setting "SpamLevel 1", the spam report consists of inserting a single header into the message, giving spam percentage values (any message with a value over 100% is considered to be spam) and inserting a paragraph into the message body, explaining that the message is spam (this is all that will be done for a black listed message, regardless of what message formatting level is chosen). Here is an example:

 
To: jdoe@anon.com
Subject: A hard man is good to find
Date: 2009 Jan 17
X-Spam-Stats: Local 0%, System 0%, Scanner 150%, Score 150%.

This message matched the criteria for spam, determined by your personal
spam filter parameters or the global or system spam filter parameters.

The regular body of the message follows....

Note that the percentage values in the X-Spam-Stats header represent the results of the various tests against the message, as follows:

Local The result of applying the user's white and blacklists. If the blacklist matches, the value is 100%. Otherwise, it is 0%. Generally, unless modified by a special "SpamRule", a hit on a blacklist results in all other checks being bypassed, as does a hit on a whitelist.
System The result of using MailCorral's built-in spam scanners (e.g. statistical scan).
Scanner The result of scanning the message with any third party (e.g. spamd) or programmable arbitrons. The value returned by the arbitrons is scaled so that a determination of "Spam" results in a score of 100%.
Score The maximum of the other three spam values or, in other words, the score that got the message flagged as spam or not.
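The relationship between the four values can be sketched as follows. The header layout is copied from the example above; the linear scaling of the arbitron's score is an assumption, although it does agree with the sample headers shown in this section (7.5 points against 5.0 required gives 150%).

```python
def scale_scanner(points, required):
    """Scale an arbitron score so that its spam threshold maps to 100%.

    Assumption: the scaling is linear in the arbitron's points.
    """
    return round(100 * points / required)


def spam_stats_header(local, system, scanner):
    """Format the X-Spam-Stats header; Score is the maximum of the three."""
    score = max(local, system, scanner)
    return ("X-Spam-Stats: Local %d%%, System %d%%, Scanner %d%%, Score %d%%."
            % (local, system, scanner, score))
```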

If message formatting is set to level two, either through the command line parameter "-s2" or the config file setting "SpamLevel 2", the spam report consists of inserting the same header as above into the message, along with any headers returned by the arbitrons, and inserting a paragraph into the message body, explaining that the message is spam. Here is an example:

 
From: Joe Blow <joe@blow.com>
To: jdoe@anon.com
Subject: A hard man is good to find
Date: 2009 Jan 17
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
    space-port.homeworld
X-Spam-Level: *******
X-Spam-Status: Yes, score=7.5 required=5.0 tests=AWL,DRUGS_ERECTILE,
    HTML_MESSAGE,IMPOTENCE,MIME_HTML_ONLY,NO_RELAYS autolearn=no version=3.2.5
X-Spam-Report:
    -0.0 NO_RELAYS              Informational: message was not relayed via SMTP
     2.6 IMPOTENCE              BODY: Impotence cure
     0.0 HTML_MESSAGE           BODY: HTML included in message
     2.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
     2.4 DRUGS_ERECTILE         Refers to an erectile drug
     0.2 AWL                    AWL: From: address is in the auto white-list
X-Spam-Stats: Local 0%, System 0%, Scanner 150%, Score 150%.

This message matched the criteria for spam, determined by your personal
spam filter parameters or the global or system spam filter parameters.

The regular body of the message follows....

If message formatting is set to level three, either through the command line parameter "-s3" or the config file setting "SpamLevel 3", the spam report consists of all of the above plus any reports returned by the arbitrons, inserted after the paragraph that is inserted into the message body. Here is an example:

 
From: Joe Blow <joe@blow.com>
To: jdoe@anon.com
Subject: A hard man is good to find
Date: 2009 Jan 17
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
    space-port.homeworld
X-Spam-Level: *******
X-Spam-Status: Yes, score=7.5 required=5.0 tests=AWL,DRUGS_ERECTILE,
    HTML_MESSAGE,IMPOTENCE,MIME_HTML_ONLY,NO_RELAYS autolearn=no version=3.2.5
X-Spam-Report:
    -0.0 NO_RELAYS              Informational: message was not relayed via SMTP
     2.6 IMPOTENCE              BODY: Impotence cure
     0.0 HTML_MESSAGE           BODY: HTML included in message
     2.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
     2.4 DRUGS_ERECTILE         Refers to an erectile drug
     0.2 AWL                    AWL: From: address is in the auto white-list
X-Spam-Stats: Local 0%, System 0%, Scanner 150%, Score 150%.

This message matched the criteria for spam, determined by your personal
spam filter parameters or the global or system spam filter parameters.

Spam detection software, running on the system "space-port.homeworld", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
postmaster for details.

Content preview:  Untitled Document Herbal Alternative for Erectile Dysfunction
Men of Iron has been featured on over 100 TV News and Top Radio stations
across America, and we know why... It REALLY works! Visit Our Web Site Click
Here: Learn about our special offer! [...]

Content analysis details:   (7.5 points, 5.0 required)

pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 2.6 IMPOTENCE              BODY: Impotence cure
 0.0 HTML_MESSAGE           BODY: HTML included in message
 2.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 2.4 DRUGS_ERECTILE         Refers to an erectile drug
 0.2 AWL                    AWL: From: address is in the auto white-list

The original message was not completely plain text, and may be unsafe to
open with some email clients; in particular, it may contain a virus,
or confirm that your address can receive spam.  If you wish to view
it, it may be safer to save it to a file and open it with an editor.

The regular body of the message follows....

2.7 Virus Handling

Enhanced virus handling (other than the basic checking of file type and MIME type lists that is built in) is carried out in cooperation with a virus arbitration daemon (arbitron), such as ClamAV, or one or more of the user's programmable arbitrons. MailCorral selects all of the attachments and inline components of each received message that are likely places for a virus to lurk. It then sends a request for virus determination, containing each of the attachments and/or components, via a pipe to the arbitron, and the arbitron renders a decision as to whether any of them contains a virus or not. In the case of the user's programmable arbitrons, the attachments and/or components are passed in files but the rest of the details are the same. The arbitron's decision is final.

After all virus checking (either internal or external arbitron or both) is done, any attachments or components marked as viruses are deleted, while the others may be marked with a warning or have their names permuted to prevent them accidentally being opened/launched. An unaltered version of the message is placed in the corral where it can be retrieved later on, if necessary.

2.8 Command Line Parameters

This filter is actually invoked by the system startup script for sendmail. It is a daemon which listens, on the port specified, for milter requests from sendmail. The command line parameters that are passed to the filter from the startup script (see Section 1.5) govern the filter's actions, plus define the communications connection with sendmail and the message type arbitrons. They are:

-A Supplies the name of a file, or a list of files separated by semicolons, containing the names of all of the local aliases. The filter uses these to determine whether local mail is actually deliverable (non-deliverable local mail is not filtered but left untouched, leaving sendmail to decide whether the mail can be delivered). The name supplied by this parameter overrides any name supplied by the "AliasList" option in the config file.

Usually, one would point this option at the standard aliases file (e.g. "/etc/aliases") employed by sendmail. The filter is capable of reading and processing this file, except for two small differences (which should have no real effect on the utility of the file): it does not support includes; lines must be continued by a '\' in the last position on any continued line. The filter also assumes that any name in this file is not bogus (i.e. that mail will actually be deliverable to any user named therein).

It is important, in order for filtering to work properly, that the filter know when a local user really exists and when they do not. Sendmail does some of the work, before it calls the filter, by determining whether the user is local but it does not determine if mail can actually be delivered to the user (until later on, that is). Thus, unless the filter decides this for itself, it could do work unnecessarily or, worse, create corral files for nonexistent users.

The filter looks up all local users in the password file to see if they are valid. If so, it assumes that mail can be delivered to them. However, this is insufficient. Aliases will not be found in the password file, yet delivery to them is valid. Hence the need for the filter to know what alias names are used.
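
The reading rules described above (no includes, a trailing '\' to continue a line) can be illustrated with a short Python sketch. This is not the filter's own code: the function name "alias_names" is hypothetical, and the handling of '#' comment lines is an assumption borrowed from the usual aliases file layout.

```python
def alias_names(text):
    """Return the set of alias names defined in aliases-file text."""
    names = set()
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # A '\' in the last position continues the entry on the next line.
            pending += line[:-1] + " "
            continue
        entry = (pending + line).strip()
        pending = ""
        if not entry or entry.startswith("#"):
            continue  # blank line or comment (assumed convention)
        name, sep, _targets = entry.partition(":")
        if sep and name.strip():
            # Every name found is assumed deliverable, as the text states.
            names.add(name.strip())
    return names


sample = "# comment\npostmaster: root\nstaff: alice, \\\n  bob\n"
print(sorted(alias_names(sample)))
```

A recipient that is missing from the password file but present in this set would still be treated as deliverable.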

-C Specify the name of the global options file to be used at startup. The default names, if not supplied, are: "/etc/mail/sendmailfilter.cf;/etc/sendmailfilter.cf".
-c Gives the name of the local options DBM database to be used to resolve lookups for individual user options. The options for all users are stored in this one file, under username keys.

The database lookup feature is not used, if this option isn't specified. If it is, the local filter and SpamAssassin config files are ignored and only the DBM database is used instead. This option overrides the "ConfigDB" config file option.

The database name given should not include the ".dir" and ".pag" extensions, since these are added automatically by DBM. The database itself can be built by the "ConfigEdit.cgi" program (see the chapter on "User Support") or directly by a program of your choice.

Note that the name used to look up the user options depends on the setting of the "-q" or "QualifyNames" flag.

-D This option supplies the list of domain names that are used to determine whether mail is being delivered locally or not. Any domain names in this list are considered local, for purposes of the "-e" or FilterExt and "-i" or FilterInt options. This option overrides the "DomainList" config file option.

The list may include one or more domain names, separated by commas. It may also include file names. Any name that begins with a '/', '~' or '.' is assumed to be a file name.

If a file name is given, the file should contain a list of local domain names, one per line. Blank lines and comments beginning with '#' are ignored. Typically, this feature would be used to point to the file "/etc/mail/local-host-names" or the same file where sendmail gets local domain name information from.

The domain name "localhost" is forced onto the list at the end, if it isn't specified. This name is always assumed to be a local domain name by sendmail.
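
As an illustration of the list syntax just described, the following Python sketch separates file names from plain names and forces "localhost" onto the end. The function name "split_domain_list" is hypothetical, and the sketch does not read the files it finds; it only shows how entries might be told apart.

```python
def split_domain_list(value):
    """Split a comma-separated domain list into (names, files)."""
    names, files = [], []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item[0] in "/~.":
            # Entries beginning with '/', '~' or '.' are taken to be file names.
            files.append(item)
        else:
            names.append(item)
    if "localhost" not in names:
        # "localhost" is forced onto the end of the list if not specified.
        names.append("localhost")
    return names, files


print(split_domain_list("example.com, /etc/mail/local-host-names"))
```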

Note that the filter attempts to automatically determine whether a recipient's name is local or not, regardless of the contents of this list, so using it is partly supplemental. While there is no harm in giving the filter a list of all local domain names (e.g. "/etc/mail/local-host-names"), it is not strictly necessary. However, if you have domains that are not local to this system but are still considered internal to your network, you might wish to include them in this list so that messages delivered to addresses in these domains will not get filtered, when sent from local users. Be aware, however, that some spammers spoof the same domain name for the sender as the recipient, thereby bypassing spam checking entirely, if the domain list includes the domains of local recipients.

It is much better, for the purposes of bypassing filtering from internal users, to include numeric IP addresses or address ranges within the domain list. If the domain list includes numeric IP addresses, the sender's IP address only (recipient IP addresses are not checked) will be compared against all of the IP addresses in the domain list. Any that match will be considered local senders.

Numeric IP addresses can be of the form "n.n.n.n" or "n.n.n.n/m". In the first case, the IP address of the sender must match the address given exactly. In the second case, the value "m" is a mask (range 0-32) that indicates how many high-order bits in the address are checked. For example, "192.168.1.0/24" will match all addresses in the range "192.168.1.0" to "192.168.1.255" while "192.168.0.0/16" will match all addresses in the range "192.168.0.0" to "192.168.255.255". The mask "/0" is equivalent to the mask "/32".
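
The mask comparison can be sketched in Python as follows. The function names are hypothetical and this is only one way to implement the rule; the treatment of "/0" follows the statement above that it is equivalent to "/32".

```python
def ip_to_int(addr):
    """Convert dotted-quad text such as '192.168.1.7' to a 32-bit integer."""
    parts = [int(p) for p in addr.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        raise ValueError("bad IPv4 address: " + addr)
    return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]


def sender_matches(sender_ip, entry):
    """True if sender_ip matches an 'n.n.n.n' or 'n.n.n.n/m' list entry."""
    base, _, bits = entry.partition("/")
    m = int(bits) if bits else 32
    if m == 0:
        m = 32  # per the text above, "/0" is equivalent to "/32"
    if not 0 < m <= 32:
        raise ValueError("mask out of range: " + entry)
    # Keep only the m high-order bits of both addresses, then compare.
    mask = (0xFFFFFFFF << (32 - m)) & 0xFFFFFFFF
    return (ip_to_int(sender_ip) & mask) == (ip_to_int(base) & mask)


print(sender_matches("192.168.1.200", "192.168.1.0/24"))  # True
print(sender_matches("192.168.2.1", "192.168.1.0/24"))    # False
```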

-Da Add to the list of users that are included in the local domain (i.e. internal users) any sender that validates to SMTP via an AUTH protocol (whichever one it is). This option may be useful to ISPs who have many external users who need to be treated as virtual internal users when they connect to SMTP. This command line option has the same effect as supplying the value "Auth" to the "DomainIncl" global configuration parameter.
-Dl Treat any sender that connects to SMTP via the local IP address (127.0.0.1) as if they were included in the local domain (i.e. an internal user). This command line option has the same effect as supplying the value "Local" to the "DomainIncl" global configuration parameter.
-d Turn on debugging (regardless of whether a debug file name was compiled into the filter). The name of the file where the debugging information is written must also be supplied as the value of this parameter. This will override any compiled-in file name as well as any value set by the "DebugFile" config file option.

Note that, if you expect error messages from options processing to be written to this debug file, this option must appear first on the command line. If this option isn't first, errors from option processing are always written to the sendmail syslog file (wherever that is) anyway.

-d0 thru -d4 Set the debug level to 0 thru 4, overriding any value specified by the "DebugLev" configuration file option. Increasing the debug level turns on tracing of progressively more and more detailed debugging information. Zero turns it off. One traces high level work. Two traces basic work. Three traces low level work. Four traces message transport. If you'd like your debug file to get filled up fast, pick level 3 or 4. The default is level 2.
-e Turn on filtering of messages sent externally (i.e. messages from someone inside the domain to someone outside the domain). Any setting by the "FilterExt" option in the config file is overridden.
-h Turn on HTML filtering for all HTML tags, regardless of where they occur in a message. Only really harmless tags are left intact (e.g. <p>, <br>). Only the setting of the "ReplaceHTML" option in the global config file is overridden by this flag.

If this flag is on, the filter removes all HTML that could be nasty. This means looking at tag names plus individual parameters within the tags. The tables found in smfopts.h are used to do these checks. If this flag is off, only the tags that are really nasty (plus embedded comments in tag names) are removed. The really nasty tags are: iframe, object, script, applet and embed, since they can launch code. These tags (and anything in between them and their terminators) are always removed, if you ask for virus checking.

-i Turn on filtering of messages sent internally (i.e. messages from someone inside the domain to someone else also inside the domain). Any setting by the "FilterInt" option in the config file is overridden.
-k Keep all messages filtered, regardless of whether they are altered or not, in the corral. This option can be useful for debugging or for anyone needing a complete audit trail of all messages processed. If the "KeepAll" option is set in the config file, it is overridden.

Please bear in mind, when using this option, that only messages that are filtered can be kept. If a message is not filtered because it is bypassed (e.g. internal -> external), it will not be kept. Sorry, that's just the way it has to be. However, if you use either the "ArchiveDirectory" or "ArchiveProgram" configuration file option to archive messages, they do not suffer from this restriction.

-L Set the directory used for language support to the name given. This parameter is used in conjunction with the "Language" local configuration option (see that option for details) as well as the "X-Accept-Language" header in received messages. It overrides any value set in the global configuration file by the "LanguageDirectory" option.

If a message is received that includes an "X-Accept-Language" header, the language value supplied in that header is used to set the name of the language configuration file used by the filter to generate message inserts. The language value found is appended to the language directory supplied by this option and then the suffix ".cf" is appended.

Typically, the language values used in "X-Accept-Language" headers are two character names, as specified by the mail/MIME RFCs. Note that, prior to composing the file name, the language value is lowercased to ensure consistency. Otherwise, the language value in the header is used exactly as given to compose the file name. For example, if this option is set to "/etc/mail" and the "X-Accept-Language" header contains "fr", the following file will be processed:

/etc/mail/fr.cf

The file is opened and processed just like a regular local configuration file, except that the only parameters which are valid are those starting with "Msg" (from the Message Formatting Options section of the Configuration chapter), that is to say all of the message insert texts. The idea is to allow a separate configuration file to be created for each language that is supported and for this file to allow the text of all of the message inserts in it to be written in that language. When a message is received in a particular language, the text of all of the message inserts used is loaded from that language's configuration file.
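
The file name composition can be shown with a few lines of Python. The function name "language_config_path" is hypothetical, and the sketch assumes the header carries a single language value; how the filter treats a header listing several languages is not covered here.

```python
import posixpath


def language_config_path(language_dir, header_value):
    """Compose the language config file name from an X-Accept-Language value."""
    lang = header_value.strip().lower()  # lowercased for consistency
    return posixpath.join(language_dir, lang + ".cf")


print(language_config_path("/etc/mail", "FR"))  # /etc/mail/fr.cf
```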

-os Sum all of the local user options from the configurations of all of the recipients of a message to arrive at one, single set of options to be used for processing the message. If supplied, this parameter overrides "SumOptions" in the config file.

Normally, options from the configuration of the first or only local recipient of a message are used, in conjunction with the global options, to process each message (messages are only processed once, regardless of how many local recipients a message is bound for and non-local recipients are irrelevant, since some other system will deliver the message to them). If this option is chosen, the configurations of all of the recipients of a message are read and then all of their options are summed up to create a compound set of options that represent all of their preferences. This compound set of options is then used to process the message (it is still only processed once).

Option summation takes place as follows: for switches ("AutoRemail", "ReplaceHTML", "SpamAlways" and "SpamFastPath"), the "No" setting overrides the "Yes" setting; for the "UnknownDispo" option, the "MIME" setting overrides the "Ignore" setting, while the "Warn" setting overrides both; for the "SpamDel" option, the "trash" mode is overridden by the "deliver" mode which is overridden by the "corral" mode; for the "SpamLevel" option, the highest level is chosen; for the "StatXxxxRatio" options, the highest threshold found is used and for the "StatXxxxBoost" options, the lowest boost value found is used; for the "AddXTags" option, the "Yes" setting overrides the "No" setting for each tag ("envelope", "version") that is set in any configuration processed; and only the first recipient's settings are used for the "Language", "Msg*" and "SpamProto" options.
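
A partial sketch of these summation rules in Python may make the precedence easier to follow. It is an illustration only: the function name "sum_options" and the use of plain dictionaries for recipient configurations are assumptions, and the "StatXxxx", "AddXTags" and "Msg*" rules are left out for brevity.

```python
SPAM_DEL_RANK = {"trash": 0, "deliver": 1, "corral": 2}
UNKNOWN_DISPO_RANK = {"Ignore": 0, "MIME": 1, "Warn": 2}
SWITCHES = ("AutoRemail", "ReplaceHTML", "SpamAlways", "SpamFastPath")


def sum_options(configs):
    """Combine per-recipient option dicts into one compound set."""
    result = {}
    for name in SWITCHES:
        values = [c[name] for c in configs if name in c]
        if values:
            # For switches, "No" overrides "Yes".
            result[name] = "No" if "No" in values else "Yes"
    dels = [c["SpamDel"] for c in configs if "SpamDel" in c]
    if dels:
        # "trash" is overridden by "deliver", which is overridden by "corral".
        result["SpamDel"] = max(dels, key=SPAM_DEL_RANK.__getitem__)
    dispos = [c["UnknownDispo"] for c in configs if "UnknownDispo" in c]
    if dispos:
        # "MIME" overrides "Ignore"; "Warn" overrides both.
        result["UnknownDispo"] = max(dispos, key=UNKNOWN_DISPO_RANK.__getitem__)
    levels = [c["SpamLevel"] for c in configs if "SpamLevel" in c]
    if levels:
        result["SpamLevel"] = max(levels)  # the highest level is chosen
    for name in ("Language", "SpamProto"):
        # Only the first recipient's settings are used for these options.
        if configs and name in configs[0]:
            result[name] = configs[0][name]
    return result


recipients = [
    {"SpamDel": "trash", "SpamLevel": 1, "ReplaceHTML": "Yes",
     "UnknownDispo": "Ignore", "Language": "fr"},
    {"SpamDel": "corral", "SpamLevel": 3, "ReplaceHTML": "No",
     "UnknownDispo": "MIME", "Language": "de"},
]
print(sum_options(recipients))
```

In this example the compound set corrals spam, reports at level 3, leaves "ReplaceHTML" off, uses the "MIME" disposition and takes its language from the first recipient.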

-p The communications protocol to use in talking to sendmail. This must match the value given in the sendmail configuration file via the mail filter option. For example, if sendmail's m4 config file says:

INPUT_MAIL_FILTER(`filter1',`S=inet:2526@localhost, F=R')dnl

the value passed via "-p" should be:

-p inet:2526@localhost

According to the sendmail documentation, the possibilities are:

{unix|local}:/path/to/file      A named pipe.
inet:port@{hostname|ip-address}      An IPV4 socket.
inet6:port@{hostname|ip-address}      An IPV6 socket.

One way or another, this option must be specified in order for the filter to operate. It must either be supplied via this command line switch or in the global config file via "SendmailProto". This parameter cannot be empty. The command line switch overrides the config file option.
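
Since the "-p" value must match the socket named in sendmail's configuration, a small check can catch mismatches before startup. The Python sketch below pulls the "S=" value out of an INPUT_MAIL_FILTER line like the one shown above; the function name "milter_socket" is hypothetical and the pattern only covers lines of that general shape.

```python
import re


def milter_socket(m4_line):
    """Extract the S= socket spec from an INPUT_MAIL_FILTER m4 line."""
    match = re.search(r"S=([^,']+)", m4_line)
    return match.group(1).strip() if match else None


line = "INPUT_MAIL_FILTER(`filter1',`S=inet:2526@localhost, F=R')dnl"
print(milter_socket(line))  # inet:2526@localhost

# The value passed via "-p" should equal the extracted socket spec.
print(milter_socket(line) == "inet:2526@localhost")  # True
```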

-p1 thru -p24 Select the interval to partition the corral into. The number given is the number of hours to use for the partition interval. Acceptable values are 1, 2, 3, 4, 6, 8, 12 and 24. A value must be specified; there is no default. This value overrides any global config file value set by the "Partition" option.

Partitioning allows the number of filtered messages, stored in the corral, to be dramatically increased. Normally, an unpartitioned corral will hold all of the messages filtered on a small to medium sized system. However, on some large systems (e.g. those used in production environments and by ISPs), the number of messages that must be stored in the corral can easily exceed reasonable limits for the file system and preclude notification and message handler programs from being able to examine and process corralled messages.

Partitioning circumvents this problem by splitting the corral into separate subdirectories, based on the time interval specified. If a value of 24 is chosen, for example, the corral will be split into chunks that each contain 24 hours worth of corralled messages. These smaller subdirectories can be more easily processed by notification and message handler programs.

All filtered messages are partitioned with this option. Spam may also be partitioned post facto (e.g. if the corral fills up and you need to partition immediately as a quick fix to a space problem) by the separate notification program specific to it. A reasonable partition interval is 24 hours. Lesser intervals may be chosen if you still experience problems with the volume of messages in the corral.
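
The acceptable intervals are exactly the divisors of 24, so every day splits into a whole number of partitions. The Python sketch below computes the hour at which a message's partition begins. It is an assumption-laden illustration: the function name "partition_start_hour" is hypothetical and the way the filter actually names its subdirectories is not shown here.

```python
VALID_INTERVALS = (1, 2, 3, 4, 6, 8, 12, 24)


def partition_start_hour(hour_of_day, interval):
    """Return the first hour (0-23) of the partition containing hour_of_day."""
    if interval not in VALID_INTERVALS:
        raise ValueError("interval must be one of %r" % (VALID_INTERVALS,))
    if not 0 <= hour_of_day <= 23:
        raise ValueError("hour_of_day must be in the range 0-23")
    return (hour_of_day // interval) * interval


print(partition_start_hour(17, 6))   # 12
print(partition_start_hour(17, 24))  # 0
```

With "-p6", for example, a day is split into four partitions starting at hours 0, 6, 12 and 18.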

-q Use or do not use fully qualified names (i.e. recipient name plus domain name) for any configuration database lookup of user options that is done in conjunction with the "-c" parameter. This parameter overrides the "QualifyNames" configuration file option.

Users of this option typically have virtual users who do not have any real username or home directory on the machine where mail is delivered, hence the need to store their configuration information in a common database. Since it is possible for the same username to exist in two domains (e.g. "custserv"), the domain information must be used to qualify the username and ensure that the name used to store configuration information is unique.

Note that spam saved in the spam corral is automatically saved using only the recipient's name (domain information is stripped) for local users and using the fully qualified name for all others. This is done so that all spam to a single local recipient, regardless of how it was addressed to them, will be corralled under a single name. The advantage of this behavior is that it results in the need to send only a single notification message to each spam recipient when telling them about the spam they've received. However, bear in mind that even using this scheme, alias names will have their spam stored as a separate user, regardless of who the alias resolves to.

-r If spam filtering is done, the number of replies that should be sent to a spammer (per day), telling them that what they are sending is spam. Once the threshold is reached for a particular spammer in a single day, all subsequent messages from them are just dumped with no reply. The default is 0 (i.e. always just dump all spam). The maximum is 25. Any value set by "SpamReplies" in the configuration file is overridden.

This option is necessary because some spammers use autoresponders to reply to any mail sent to them. This may well appear as spam too, in which case, a reply will be sent to it. Do you see the potential for ping pong? I do.
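
The per-day reply threshold behaves like a simple counter keyed on spammer and date. A minimal Python sketch, with a hypothetical class name and in-memory storage standing in for whatever the filter really uses:

```python
from collections import defaultdict


class ReplyLimiter:
    """Allow at most max_replies replies per sender per day."""

    def __init__(self, max_replies=0):
        if not 0 <= max_replies <= 25:
            raise ValueError("max_replies must be in the range 0-25")
        self.max_replies = max_replies
        self.counts = defaultdict(int)

    def should_reply(self, sender, day):
        key = (sender.lower(), day)
        if self.counts[key] >= self.max_replies:
            return False  # threshold reached: dump the spam with no reply
        self.counts[key] += 1
        return True


limiter = ReplyLimiter(max_replies=2)
print([limiter.should_reply("spammer@example.com", "2024-01-01") for _ in range(3)])
```

Because the count is capped, an autoresponder at the other end can trigger at most the configured number of replies per day, which is what breaks the ping pong loop.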

-rm Allow automatic remailing of filtered messages. Without this option, whenever a message is altered, a copy of it is kept and the user is supplied with the name of the file where it is stored so that it can be used to fetch the message. They are also given the name of a Tech Support person whom they must contact to retrieve the message. This is the safest way of dealing with viruses, since the Tech Support person can quiz the user to see that they know what they're doing before releasing a possibly infected file.

With this option, the user is given the user name of a mail handling robot that can release the message to them. Upon receipt of a message from the user, with the subject supplied, the mail handling robot will release the original content of the message to the system for remailing to the recipient. Note that this method of handling infected messages, while completely automatic, is dangerous, since the unsuspecting user can have possibly infected messages delivered directly to them without filtering.

Only the global config file option "AutoRemail" is overridden by this switch.

-s The communications protocol and port to use in talking to a spam arbitration daemon. The protocol must be one that this filter knows about and the port must match the one that the daemon will be listening on.

Currently, the supported protocols are:

internal - The internal, fast path spam checker.
spamd - The Spam Assassin daemon.

Except for the "internal" protocol, the port number and host name or IP address must follow the protocol name, separated by a colon and an at sign. So, the entire parameter should look like one of the following:

internal

or

proto:port@host

For example, if you are running spamd and it is listening on port 2527 on the local machine, the value passed via "-s" should be:

-s spamd:2527@localhost

By supplying a valid protocol and port, you will cause all messages received to be sent to the spam arbitration daemon. If it thinks the message is spam, it will be treated as such. If you don't supply this parameter, only the internal spam checks will be performed (blacklists/whitelists and statistical).

Note that when any spam arbitron is used, the internal, fast path spam checker is always run (unless the "-ss" option is set or "SpamFastPath" is set to "No"), prior to calling the arbitron selected, in an attempt to speed up spam checking by bypassing the overhead involved with calling the arbitron.

An optional timeout value may be supplied for any of the arbitrons chosen. If this parameter is supplied, it must follow the protocol, port number and host name immediately and be enclosed in parentheses. For example:

-s spamd:2527@localhost(20)

The value given is the time, in seconds, that MailCorral is prepared to wait for the entire transaction with the spam arbitron. Essentially, this is the total time for the arbitron to respond to the spam determination request -- it includes the elapsed time for all of the reads and writes to the arbitron. In the above example, the time allotted to SpamAssassin would be twenty seconds.
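
The parameter syntax can be summarized with a small parser, sketched here in Python. The function name "parse_arbitron" and the returned dictionary layout are hypothetical; the 30-second fallback is the default timeout documented for this option.

```python
import re

ARBITRON_RE = re.compile(
    r"^(?P<proto>[A-Za-z]+)"                 # protocol name, e.g. spamd
    r"(?::(?P<port>\d+)@(?P<host>[^()]+))?"  # optional :port@host
    r"(?:\((?P<timeout>\d+)\))?$"            # optional (seconds)
)


def parse_arbitron(spec, default_timeout=30):
    """Parse 'internal' or 'proto:port@host', with an optional '(seconds)'."""
    m = ARBITRON_RE.match(spec)
    if not m:
        raise ValueError("unrecognized arbitron spec: " + spec)
    proto = m.group("proto")
    if proto != "internal" and m.group("port") is None:
        raise ValueError("port and host are required for " + proto)
    return {
        "proto": proto,
        "port": int(m.group("port")) if m.group("port") else None,
        "host": m.group("host"),
        "timeout": int(m.group("timeout")) if m.group("timeout") else default_timeout,
    }


print(parse_arbitron("spamd:2527@localhost(20)"))
print(parse_arbitron("internal"))
```

The same shape applies to the value given to the "-v" option.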

The default timeout is set to 30 seconds, if no value is supplied. You may pick any positive value but consider this. Sendmail is only prepared to wait for an answer from the filter for so long. If the arbitron timeout is to be of any use, it should be set to some value smaller than the timeout given to sendmail in its configuration file (no value in the sendmail configuration file means a default timeout of 10 seconds).

A good choice would be to give the arbitron 60-80% of the timeout allotted to the filter in the sendmail configuration file. This will ensure that the filter can still complete the job of filtering a message in the time allowed, despite the fact that the arbitron times out while making its decision. And, if a virus arbitron is also being used (see the "-v" option or "VirusProto"), the time allocated to both the arbitrons should be split between them, as you see fit. In that case a good choice would be to give 30-40% to each of the two arbitrons.

Note that a time limit may be supplied to the internal arbitron but there isn't much reason for this, since it executes extremely quickly. When calculating how much time to allow the filter and its arbitrons, you can neglect the time consumed by the internal arbitron.
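
The budgeting advice above amounts to simple arithmetic, sketched here in Python. The function name "arbitron_timeouts" is hypothetical, and the 70% share is just a value picked from the suggested 60-80% range.

```python
def arbitron_timeouts(filter_timeout, total_percent=70, arbitrons=1):
    """Split a share of the filter's sendmail timeout among the arbitrons."""
    if not 0 < total_percent < 100:
        raise ValueError("total_percent must be between 1 and 99")
    if arbitrons < 1:
        raise ValueError("at least one arbitron is required")
    per_arbitron = (filter_timeout * total_percent) // (100 * arbitrons)
    return [per_arbitron] * arbitrons


print(arbitron_timeouts(60))                # [42]
print(arbitron_timeouts(60, arbitrons=2))   # [21, 21]
print(arbitron_timeouts(10))                # [7]
```

The last line shows why the default matters: if sendmail is left at its 10 second default, the arbitron's own 30 second default can never take effect, so a value of about 7 seconds would have to be given explicitly.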

The global config file option "SpamProto" can be overridden by this command line parameter.

-s1 thru -s3 Set the level of spam reporting to one thru three, overriding any global config file value set by the "SpamLevel" option. The default is "-s1".

Level one causes a single header line to be inserted in any message that is found to contain spam, giving the spam processing statistics as three percentage values.

Level two causes the same header line described under "-s1" to be inserted in any message that is found to contain spam. It also inserts any headers generated by the spam arbitron (selected by "-s") into the message.

Level three does everything that levels one and two do, plus it formats any report generated by the spam arbitron for insertion into the message body as a paragraph of text.

-sa Always add the spam report to mail, regardless of whether it is spam or not. The global option "SpamAlways" may be overridden by this switch.
-sc Set the spam delivery mode to "corral", the default. This mode will cause any spam received to be sidelined in the spam corral, where it can be processed by a spam notification program and/or released for delivery by the recipient at a later date. The choice made by the global option "SpamDel" may be overridden by this switch.
-sd Set the spam delivery mode to "deliver". This mode will cause any spam received to be marked as such and then delivered to the recipient without further delay. The default is "-sc". Any value set by the global option "SpamDel" is overridden by this switch.
-ss Turn off the spam fast path check, prior to calling the designated spam arbitron. The statistical spam fast check is turned off by this flag, too. If the designated arbitron is "internal" this flag has no effect, since it is possible to turn off the use of the internal check as the designated arbitron by not specifying it at all. This switch will override the "SpamFastPath" config file option.
-st Set the spam delivery mode to "trash". This mode will cause any spam received to be tossed directly in the trash can, straight away. I like it! Unfortunately, the default is "-sc". The choice made by the global option "SpamDel" may be overridden by this switch.
-t Turn on filtering of messages transiting the system (i.e. messages from someone outside the domain to someone else outside the domain). The transit setting is typically used by ISPs who are delivering mail for many other systems on a mail handling gateway. Any setting by the "FilterTrans" option in the config file is overridden.
-u A proxy userid to use in looking up mail disposition and spam processing options. Normally, the recipient's userid, stripped of domain information, is used to look up user-specific mail disposition options plus spam processing options in their home directory. However, if the recipient doesn't have a home directory and if a proxy userid is given, the options in the proxy userid's home directory will be used instead.

This option might be useful to ISPs, who process mail for many users but who do not have userids set up for all of them. Usually, mail delivery is governed by user-specific options found in a ".sendmailfilter" file in each recipient's home directory. However, if the recipient has no home directory, the ".sendmailfilter" file in the proxy userid home directory is used.

A similar situation arises with Spam Assassin, if spamd is used to arbitrate whether a message is spam. Spam Assassin looks in the recipient's home directory for options such as whitelists. If the user doesn't have a home directory, the home directory of the proxy userid is used instead.

This option overrides any value set by the "ProxyUser" config file option.

Note that the values set for the "IgnoreSpam" and "IgnoreVirus" options in a proxy user's configuration have no effect on users who have no configuration. Such users will have their viruses and spam filtered by default and there is no way to avoid this, short of setting up configuration information for them. It was felt that virus and spam filtration was too important to get bypassed by accident.

-ui Select a disposition of "Ignore" for unknown file types found as attachments to a message. Selecting a disposition of "Ignore" will cause attachments with unknown file types to be ignored and treated as if they were perfectly OK. No warning will be issued for any unknown file types. This setting is more than somewhat dangerous, since it is quite possible that a file type which the filter doesn't know about may be harmful. Not receiving any warning could allow a message recipient to inadvertently open a harmful attachment. This option overrides any disposition choice made by the "UnknownDispo" option in the global configuration file. The "-um" and "-uw" flags also affect the disposition choice.
-um Set the disposition for unknown file types, found as attachments to a message, to "MIME". The "MIME" disposition will cause the filter to take additional steps to determine the file type, before issuing a warning. It will look at the MIME type for the attachment and, if it is one of the acceptable types, no warning will be issued. Otherwise, a warning will be issued, as below, for the "-uw" disposition. This option overrides any disposition choice made by the "UnknownDispo" option in the global configuration file. The "-ui" and "-uw" flags also affect the disposition choice.
-uw Set the disposition for unknown file types, found as attachments to a message, to "Warn". A disposition of "Warn" will cause a warning to be inserted into the message whenever an attachment is found with an unknown file type (i.e. it has no extension or an extension that is not known). This is the default setting, since it is quite possible that a file type which the filter doesn't know about may be harmful and the assumption is made that it is best to warn about these things. This option overrides any disposition choice made by the "UnknownDispo" option in the global configuration file. The "-ui" and "-um" flags also affect the disposition choice.
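
The three dispositions can be compared side by side with a small decision function, sketched in Python. The function name "needs_warning" and its boolean inputs are hypothetical simplifications; the filter's own tables of known extensions and acceptable MIME types are not reproduced here.

```python
def needs_warning(dispo, extension_known, mime_type_acceptable):
    """Decide whether an attachment gets an 'unknown file type' warning."""
    if extension_known:
        return False  # known types are handled by the normal checks
    if dispo == "Ignore":
        return False  # -ui: unknown types are treated as perfectly OK
    if dispo == "MIME":
        return not mime_type_acceptable  # -um: fall back on the MIME type
    if dispo == "Warn":
        return True  # -uw: always warn (the default)
    raise ValueError("unknown disposition: " + dispo)


for dispo in ("Ignore", "MIME", "Warn"):
    print(dispo, needs_warning(dispo, False, True), needs_warning(dispo, False, False))
```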
-v The communications protocol and port to use in talking to a virus arbitration daemon. The protocol must be one that this filter knows about and the port must match the one that the daemon will be listening on.

Currently, the supported protocols are:

clamd - The ClamAV daemon.
internal - The internal virus checker.

Except for the "internal" protocol, the port number and host name or IP address must follow the protocol name, separated by a colon and an at sign. So, the entire parameter should look like one of the following:

internal

or

proto:port@host

For example, if you are running clamd and it is listening on port 2528 on the local machine, the value passed via "-v" should be:

-v clamd:2528@localhost

By supplying a valid protocol and port, you will cause all messages received to be sent to the virus arbitration daemon. If it thinks the message contains a virus, it will be treated in an appropriate manner. If you don't supply this parameter, only the internal virus checks will be performed (MIME type/attachment type matching).

Note that when any virus arbitron is used, the internal virus checker is always run (unless the "-vv" option is set or "VirusChecker" is set to "No"), prior to calling the arbitron selected, but its results will not necessarily cause the virus arbitron to be bypassed. Only in the case of MIME entities or attachments that the internal virus checker marks for deletion will the external arbitron be bypassed, and such instances are admittedly quite rare.

An optional timeout value may be supplied for any of the arbitrons chosen. If this parameter is supplied, it must follow the protocol, port number and host name immediately and be enclosed in parentheses. For example:

-v clamd:2528@localhost(20)

The value given is the time, in seconds, that MailCorral is prepared to wait for the entire transaction with the virus arbitron. Essentially, this is the total time for the arbitron to respond to the virus determination request -- it includes the elapsed time for all of the reads and writes to the arbitron. In the above example, the time allotted to ClamAV would be twenty seconds.

The default timeout is set to 30 seconds, if no value is supplied. You may pick any positive value but consider this. Sendmail is only prepared to wait for an answer from the filter for so long. If the arbitron timeout is to be of any use, it should be set to some value smaller than the timeout given to sendmail in its configuration file (no value in the sendmail configuration file means a default timeout of 10 seconds).

A good choice would be to give the arbitron 60-80% of the timeout allotted to the filter in the sendmail configuration file. This will ensure that the filter can still complete the job of filtering a message in the time allowed, despite the fact that the arbitron times out while making its decision. And, if a spam arbitron is also being used (see the "-s" option or "SpamProto"), the time allocated to both the arbitrons should be split between them, as you see fit. In that case a good choice would be to give 30-40% to each of the two arbitrons.

Note that a time limit may be supplied to the internal arbitron but there isn't much reason for this, since it executes extremely quickly. When calculating how much time to allow the filter and its arbitrons, you can neglect the time consumed by the internal arbitron.

The global config file option "VirusProto" can be overridden by this command line parameter.

-va Always add the virus report to mail, regardless of whether it contains any viruses or not. The global option "VirusAlways" may be overridden by this switch.
-vv Turn off the internal virus checker, prior to calling the designated virus arbitron. If the designated arbitron is "internal" this flag has no effect, since it is possible to turn off the use of the internal checker as the designated arbitron by not specifying it at all. This switch will override the "VirusChecker" config file option.
-X Indicate which additional headers should be added to any messages delivered. The string that follows tells which headers should be added. Multiple header names may be supplied in a comma separated list. The choices are:

env[elope] Add headers giving the envelope information, such as from and to address.
ver[sion] Add a version header for MailCorral.

For example: "-Xenv,ver".

This option overrides any value set by the global "AddXTags" config file option. However, the local config file may be used to turn off the value set by this parameter, on an individual user basis.

2.9 Performance Expectations

No filtering of mail messages can be expected to come for free, especially not the extensive level of filtering that is done by MailCorral. However, the filter was built with performance in mind and it has been shown to perform very well in production environments.

A typical user of MailCorral is an ISP, with thousands of users, who runs sendmail on a dedicated (or mostly dedicated) mail server. In real environments, we have observed sustained loads of 2-4 email messages per second (which translates approximately to 10,000 messages per hour or a quarter million messages per day). The performance numbers stated below are typical for server loads such as this and you can expect similar numbers on your server.

Adding filtering to the email server increases the load on the system such that the filter (plus the spam and virus arbitrons) consumes between 25-45% of the CPU. The filter itself usually accounts for 15-20% of the CPU and the typical spam arbitron (e.g. SpamAssassin) usually accounts for 5-10% of the CPU. The virus arbitron can account for another 10-15% of the CPU.

To put it another way, if your CPU is more than 55-60% busy running sendmail, you may experience some performance problems by adding filtering. If it is less than 50% busy, you should see no impact. When considering an upgrade to a machine running flat out, a 65% increase in horsepower should easily accommodate filtering.
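
The figures in this section can be checked with a little arithmetic, sketched here in Python. The function names are hypothetical, and the 45% figure is simply the top of the CPU range quoted above, used as a worst case.

```python
def throughput(messages_per_second):
    """Convert a sustained message rate to (per hour, per day)."""
    per_hour = messages_per_second * 3600
    per_day = per_hour * 24
    return per_hour, per_day


def filtering_fits(sendmail_cpu_percent, filter_cpu_percent=45):
    """True if sendmail plus worst-case filtering stays within 100% CPU."""
    return sendmail_cpu_percent + filter_cpu_percent <= 100


print(throughput(3))        # (10800, 259200)
print(filtering_fits(50))   # True
print(filtering_fits(60))   # False
```

Three messages per second, the middle of the observed range, gives roughly the 10,000 messages per hour and quarter million per day quoted above.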

The guidelines above should help you in planning your mail filtering setup. If you would like more specific performance numbers, please contact BSM Development and tell us what you'd like to see.