Mailcorral Documentation

3. Configuration

The operation of the sendmail filter can be altered by configuration parameters in two configuration files, one a global configuration file that applies to all of the messages processed and the other a local configuration file that only applies to the messages for an individual recipient (typically, there is one such file for each recipient, in their home directory). Usually the global configuration file is "/etc/mail/sendmailfilter.cf" or "/etc/sendmailfilter.cf" and the local configuration file is ".sendmailfilter", in the recipient's home directory. However, these file names may be changed when the filter is built, at compile time, by editing GLOBALOPTIONS and LOCALOPTIONS in "smfopts.h". See the Installation Section for more information.

The format of the configuration files is the same as that employed by many Unix-style configuration files. Comments are allowed and are indicated by '#'. They may appear on a line by themselves or at the end of any line with a parameter on it. Once the '#' is seen, everything that follows it is ignored until the end of the line. Blank lines are permissible and are ignored. Leading and trailing whitespace is ignored. Configuration parameters must appear one to a line. The parameter name must be separated from its value by at least one whitespace character. For a switch type parameter, the presence of the parameter name alone is sufficient to select its default setting. Multiple occurrences of a parameter are permissible but multiple values must be specified by repeating the parameter name. Multiple values may not appear on a single line.

Parameter values may be continued on multiple lines by ending each line, except for the last, with '\'. The continuation may appear anyplace in the parameter value. Whitespace preceding the '\' and beginning the continued line is thrown away.
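The comment, whitespace and continuation rules above can be sketched in a few lines of Python. This is only an illustration, not the filter's actual parser: the function name "parse_config" is hypothetical, and the sketch makes no attempt to handle quoted message strings (a '#' inside quotes would be treated as a comment here).

```python
def parse_config(text):
    """Return a list of (name, value) pairs from configuration text.

    Illustrative sketch only: quoted strings are not handled.
    """
    params = []
    pending = ""  # text accumulated from lines that ended with '\'
    for raw in text.splitlines():
        # Drop everything after '#', then leading/trailing whitespace.
        line = raw.split("#", 1)[0].strip()
        if pending:
            line = pending + line
            pending = ""
        if line.endswith("\\"):
            # Whitespace preceding the '\' is thrown away.
            pending = line[:-1].rstrip()
            continue
        if not line:
            continue  # blank lines are ignored
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        params.append((name, value))
    return params
```

A switch parameter given with no value comes back with an empty string as its value, leaving the caller to apply the default setting.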

The message string parameters accept quoted strings that are identical to C format. Each string must begin and end with a quote. If a second quote appears after the first, accumulation of the parameter is ended until another quote appears. The standard escape sequences "\"", "\\", "\r", "\n" and "\t" are recognized. The sample in the Sample Configuration File Section shows all this.

The configuration files can be edited at any time. The sendmail filter must be restarted for new global options to take effect. Changes to the local configuration file take effect with the receipt of the next message destined for the user in question.

3.1 Global Configuration

Global configuration information is kept in one of the two files "/etc/mail/sendmailfilter.cf" or "/etc/sendmailfilter.cf" (unless changes were made to GLOBALOPTIONS in "smfopts.h" prior to compiling the filter). The first file found, in the order shown, is read and processed.

All of the Filtering Options and Spam Processing Options are available in this file. With one exception (the "-C" option, for obvious reasons), all of the command line parameters have parallel options in the global configuration file. Global config options are overridden by any options supplied on the command line.

3.2 Local (User Specific) Configuration

Local configuration information is generally kept in ".sendmailfilter", in the recipient's home directory (unless a change was made to LOCALOPTIONS in "smfopts.h" prior to compiling the filter). If the recipient doesn't have a password file entry (found in "/etc/passwd") or home directory or there are no permissions on this directory, the proxy username may be used instead, to look up options that apply to this user. The proxy username works just like a regular user name, in that the options file for it will be processed, if found and applied to the recipient of a message.

Only some of the Filtering Options (the ones marked with an asterisk) and all of the Spam Processing Options are available in this file. All of the command line parameters that are applicable to individual message processing have parallel options in the local configuration file. Local config options override all other options, both from the global config file and any options supplied on the command line.
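The order of precedence described in sections 3.1 and 3.2 (global file, then command line, then local file) can be pictured as three dictionaries merged in turn. The following Python sketch is an illustration only; the function name and the use of dictionaries are assumptions, and it does not enforce the rule that only some options are valid in the local file.

```python
def effective_options(global_cfg, command_line, local_cfg):
    """Merge option dictionaries, lowest precedence first."""
    merged = dict(global_cfg)    # global configuration file
    merged.update(command_line)  # command line overrides the global file
    merged.update(local_cfg)     # local (user) file overrides both
    return merged
```

For instance, a "ReplaceHTML" setting in a user's local file would win over both the global file and the "-h" command line option.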

3.3 Filtering Options

The message filtering options apply to all messages that are seen by the filter. They control every aspect of filtering from how the filter talks to sendmail to what types of messages should be filtered to how the messages are disposed of.

Either of the configuration files can contain the message disposition options. All of the other options are only available in the global options file. Here is the list of message filtering options available in the configuration files:

AddXTags * none|env|ver   Indicate which additional headers should be added to any messages delivered. The string that follows tells which headers should be added. Multiple header names may be supplied in a comma separated list. The choices are:

env[elope] Add headers giving the envelope information, such as from and to address.
ver[sion] Add a version header for MailCorral.

For example: "AddXTags env,ver".

An empty parameter value may be specified in the local (user) options file to turn off any XTags that were turned on globally. In the global options file, this parameter cannot be empty. The global option may be overridden by the "-X" command line option.

AliasList Supplies the name of a file or list of files, separated by semi-colons, that contain the names of all of the local aliases, for the purposes of allowing the filter to determine whether local mail is actually deliverable or not (non-deliverable local mail will not be filtered but is left untouched, instead, thereby leaving sendmail to make the actual determination whether the mail can be delivered or not). The name supplied by this option is overridden by the "-A" command line parameter.

Usually, one would point this option at the standard aliases file (e.g. "/etc/aliases") employed by sendmail. The filter is capable of reading and processing this file, except for two small differences (which should have no real effect on the utility of the file): it does not support includes; lines must be continued by a '\' in the last position on any continued line. The filter also assumes that any name in this file is not bogus (i.e. that mail will actually be deliverable to any user named therein).

It is important, in order for filtering to work properly, that the filter know when a local user really exists and when they do not. Sendmail does some of the work, before it calls the filter, by determining whether the user is local but it does not determine if mail can actually be delivered to the user (until later on, that is). Thus, unless the filter decides this for itself, it could do work unnecessarily or, worse, create corral files for nonexistent users.

The filter looks up all local users in the password file to see if they are valid. If so, it assumes that mail can be delivered to them. However, this is insufficient. Aliases will not be found in the password file, yet delivery to them is valid. Hence the need for the filter to know what alias names are used.

AutoRemail[ing] * Yes|No   Allow or disallow automatic remailing of filtered messages. Without this option, whenever a message is altered, a copy of it is kept and the user is supplied with the name of the file where it is stored so that it can be used to fetch the message. They are also given the name of a Tech Support person whom they must contact to retrieve the message. This is the safest way of dealing with viruses, since the Tech Support person can quiz the user to see that they know what they're doing before releasing a possibly infected file.

With this option, the user is given the user name of a mail handling robot that can release the message to them. Upon receipt of a message from the user, with the subject supplied, the mail handling robot will release the original content of the message to the system for remailing to the recipient. Note that this method of handling infected messages, while completely automatic, is dangerous, since the unsuspecting user can have possibly infected messages delivered directly to them without filtering.

Only the global option may be overridden by the "-rm" command line switch.

ConfigDB Gives the name of the local options DBM database to be used to resolve lookups for individual user options. The options for all users are stored in this one file, under username keys.

The database lookup feature is not used, if this option isn't specified. If it is, the local filter and SpamAssassin config files are ignored and only the DBM database is used instead. This option may be overridden by the "-c" command line option.

The database name given should not include the ".dir" and ".pag" extensions, since these are added automatically by DBM. The database itself can be built by the "ConfigEdit.cgi" program (see the chapter on "User Support") or directly by a program of your choice.

Note that the name used to look up the user options depends on the setting of the "-q" or "QualifyNames" flag.

DebugFile Turn on debugging (regardless of whether a debug file name was compiled into the filter) and set the name of the file where the debugging information is written to the name supplied as the value of this parameter. This will override any compiled-in file name but will, in turn, be overridden by the "-d" command line option.

Note that, if you expect error messages from options processing to be written to a debug file, a debug file name must be compiled into the filter or the "-d" command line option must be used and it must appear first on the command line. If that isn't done, errors from option processing are always written to the sendmail syslog file (wherever that is) anyway.

DebugLev[el] 2|0|1|3|4   Set the debug level to 0 thru 4. Increasing the debug level turns on tracing of progressively more and more detailed debugging information. Zero turns it off. One traces high level work. Two traces basic work. Three traces low level work. Four traces message transport. If you'd like your debug file to get filled up fast, pick level 3 or 4. The default is level 2. This option may be overridden by the "-d0" thru "-d4" command line options.
DomainIncl[udes] Auth|Local   A comma separated list (no intervening whitespace is allowed) of one or more additional classes of users that are included in the local domain. All of the items in the list are ORed together to give a cumulative list of additional user classes. At present, there are two acceptable values, those being "Auth" and "Local". The "-Da" command line option has the same effect as supplying the value "Auth" to this parameter, while the "-Dl" option is the same as "Local".

If "Auth" is chosen, any sender that validates to SMTP via an AUTH protocol (whichever one it is), will be treated as an internal user. This option may be useful to ISPs who have many external users who need to be treated as virtual internal users when they connect to SMTP.

If "Local" is chosen, any sender that connects to SMTP via the local IP address (127.0.0.1), will be treated as an internal user.

DomainList This option supplies the list of domain names that are used to determine whether mail is being delivered locally or not. Any domain names in this list are considered local, for purposes of the "-e" or FilterExt and "-i" or FilterInt options. This option may be overridden by the "-D" command line option.

The list may include one or more domain names, separated by commas. It may also include file names. Any name that begins with a '/', '~' or '.' is assumed to be a file name.

If a file name is given, the file should contain a list of local domain names, one per line. Blank lines and comments beginning with '#' are ignored. Typically, this feature would be used to point to the file "/etc/mail/local-host-names" or the same file where sendmail gets local domain name information from.

The domain name "localhost" is forced onto the list at the end, if it isn't specified. This name is always assumed to be a local domain name by sendmail.
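As a rough illustration of how a DomainList value breaks down, the Python sketch below separates file names from other entries and forces "localhost" onto the end. The function name is hypothetical, the named files are not actually read, and the check for "localhost" looks only at the entries given directly in the list.

```python
def split_domain_list(value):
    """Split a DomainList value into (file_names, other_entries)."""
    files, entries = [], []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item[0] in "/~.":
            files.append(item)    # begins with '/', '~' or '.': a file name
        else:
            entries.append(item)  # a domain name (or numeric IP address)
    if "localhost" not in entries:
        entries.append("localhost")  # forced onto the end of the list
    return files, entries
```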

Note that the filter attempts to automatically determine whether a recipient's name is local or not, regardless of the contents of this list so using it is partly supplemental. While there is no harm in giving the filter a list of all local domain names (e.g. "/etc/mail/local-host-names"), it is not strictly necessary. However, if you have domains that are not local to this system but are still considered internal to your network, you might wish to include them in this list so that messages delivered to addresses in these domains will not get filtered, when sent from local users. Be aware, however, that some spammers spoof the same domain name for the sender as the recipient, thereby bypassing spam checking entirely, if the domain list includes the domains of local recipients.

It is much better, for the purposes of bypassing filtering from internal users, to include numeric IP addresses or address ranges within the domain list. If the domain list includes numeric IP addresses, the sender's IP address only (recipient IP addresses are not checked) will be compared against all of the IP addresses in the domain list. Any that match will be considered local senders.

Numeric IP addresses can be of the form "n.n.n.n" or "n.n.n.n/m". In the first case, the IP address of the sender must match the address given exactly. In the second case, the value "m" is a mask (range 0-32) that indicates how many high-order bits in the address are checked. For example, "192.168.1.0/24" will match all addresses in the range "192.168.1.0" to "192.168.1.255" while "192.168.0.0/16" will match all addresses in the range "192.168.0.0" to "192.168.255.255". The mask "/0" is equivalent to the mask "/32".
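The mask comparison can be sketched as follows, for IPv4 addresses only. This is an illustration of the arithmetic rather than the filter's own code; the function names are hypothetical. Note that it follows the rule stated above, treating "/0" the same as "/32".

```python
def ip_to_int(addr):
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    value = 0
    for part in addr.split("."):
        value = (value << 8) | int(part)
    return value


def sender_matches(entry, sender_ip):
    """Return True if sender_ip matches a 'n.n.n.n' or 'n.n.n.n/m' entry."""
    addr, _, bits = entry.partition("/")
    m = int(bits) if bits else 32
    if m == 0:
        m = 32  # per the text, "/0" is equivalent to "/32"
    # Keep only the m high-order bits of each address before comparing.
    mask = (0xFFFFFFFF << (32 - m)) & 0xFFFFFFFF
    return (ip_to_int(addr) & mask) == (ip_to_int(sender_ip) & mask)
```

With this sketch, the entry "192.168.1.0/24" matches a sender at "192.168.1.77" but not one at "192.168.2.1".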

FilterExt[ernal] Yes|No   Turn on or off filtering of messages sent externally (i.e. messages from someone inside the domain to someone outside the domain). This option may be overridden by the "-e" command line option.
FilterInt[ernal] Yes|No   Turn on or off filtering of messages sent internally (i.e. messages from someone inside the domain to someone else also inside the domain). This option may be overridden by the "-i" command line option.
FilterTrans[it] Yes|No   Turn on or off filtering of messages transiting the system (i.e. messages from someone outside the domain to someone else outside the domain). The transit setting is typically used by ISPs who are delivering mail for many other systems on a mail handling gateway. This option may be overridden by the "-t" command line option.
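The three message classes governed by FilterExternal, FilterInternal and FilterTransit can be summarized by the small Python sketch below. The function name and the "incoming" label are assumptions made for illustration; the text above names only the first three classes, so no option is associated with the fourth.

```python
# Option that governs each class of message, as described above.
OPTION_FOR_CLASS = {
    "external": "FilterExternal",  # inside -> outside
    "internal": "FilterInternal",  # inside -> inside
    "transit": "FilterTransit",    # outside -> outside
}


def message_class(sender_is_local, recipient_is_local):
    """Classify a message by where its sender and recipient are."""
    if sender_is_local and recipient_is_local:
        return "internal"
    if sender_is_local:
        return "external"
    if not recipient_is_local:
        return "transit"
    return "incoming"  # outside -> inside; label assumed, not from the text
```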
IgnoreVirus[es] Yes|No   Turn on or off the filtering of any viruses found in messages for this user. If this flag is set, viruses will be ignored even if they are found. You asked for enough rope to hang yourself. Here it is! Be very careful about using this option in the global configuration file as it will turn off all virus filtering. Also, be aware that the "SumOptions" or "-os" flag has no effect on this flag. Rather, the setting of this flag, whether on or off, is applied to each user individually.

Note that, for efficiency's sake, there is only one copy of any modified email message. This being the case, it is possible that a message containing a warning about a virus, which is sent to multiple users who wish to be warned about both viruses and spam, would also be sent to a recipient who has "IgnoreVirus" set to "Yes" and "IgnoreSpam" set to "No". This recipient will see the virus notification, despite the fact that they've requested not to. In this case, the wishes of the many outweigh the wishes of the few.

KeepAll Yes|No   When turned on (set to "Yes"), this option will keep all messages filtered, regardless of whether they are altered or not, in the corral. This option can be useful for debugging or for anyone needing a complete audit trail of all messages processed. The setting of this option may be overridden by the "-k" command line option.

Please bear in mind, when using this option, that only messages that are filtered can be kept. If a message is not filtered because it is bypassed (e.g. internal -> external), it will not be kept. Sorry, that's just the way it has to be. However, if you use either the ArchiveDirectory or ArchiveProgram option to archive messages, they do not suffer from this restriction.

Language * The name of the language configuration file used by the filter to generate message inserts is set by this option. The name given is appended to the language directory supplied by the "-L" or "LanguageDirectory" options and then the suffix ".cf" is appended. Note that case is important, since the name given is used exactly to compose the file name. For example, if the "-L" option is set to "/etc/mail" and the user selects "Polish" via this parameter, the following file will be processed:

/etc/mail/Polish.cf

The file is opened and processed just like a regular local configuration file, except that the only parameters which are valid are those starting with "Msg" (from the Message Formatting Options section, below), that is to say all of the message insert texts. The idea is to allow a separate configuration file to be created for each language that is supported and for this file to allow the text of all of the message inserts in it to be written in a particular language. When the user selects a language, in their local configuration, the text of all of the message inserts used for them is loaded from the language configuration file.

This option may only be specified in the local configuration file or the DBM configuration database.

LanguageDir[ectory] Set the directory used for language support to the name given. This parameter is used in conjunction with the "Language" local configuration option (see that option for details) as well as the "X-Accept-Language" header in received messages. It is overridden by the "-L" command line option.

If a message is received that includes an "X-Accept-Language" header, the language value supplied in that header is used to set the name of the language configuration file used by the filter to generate message inserts. The language value found is appended to the language directory supplied by this option and then the suffix ".cf" is appended.

Typically, the language values used in "X-Accept-Language" headers are two character names, as specified by the mail/MIME RFCs. Note that, prior to composing the file name, the language value is lowercased to ensure consistency. Apart from this, the language value in the header is used exactly as given to compose the file name. For example, if this parameter is set to "/etc/mail" and the "X-Accept-Language" header contains "fr", the following file will be processed:

/etc/mail/fr.cf
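The way the language file name is composed, both for the "Language" option (case preserved) and for the "X-Accept-Language" header (lowercased first), can be sketched in Python as shown here. The function name and its "from_header" argument are hypothetical, and the sketch assumes the header carries a single language value.

```python
import posixpath


def language_file(language_dir, language, from_header=False):
    """Compose the language configuration file name."""
    if from_header:
        # Values taken from "X-Accept-Language" are lowercased first.
        language = language.lower()
    # Directory + language name + ".cf" suffix.
    return posixpath.join(language_dir, language + ".cf")
```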

The file is opened and processed just like a regular local configuration file, except that the only parameters which are valid are those starting with "Msg" (from the Message Formatting Options section, below), that is to say all of the message insert texts. The idea is to allow a separate configuration file to be created for each language that is supported and for this file to allow the text of all of the message inserts in it to be written in that language. When a message is received in a particular language, the text of all of the message inserts used is loaded from that language's configuration file.

Paranoid Yes|No   Turn on or off filtering of messages from all senders. Normally, if this option isn't turned on, messages from trusted users such as root, mailer-daemon and other daemons, sent from the local machine are not filtered. The reason for this is that filtering such messages can be annoying and there is often no need for them to be filtered. However, if you are running in an environment where nobody can be trusted (e.g. an ISP), you probably don't want to let any messages, even from normally-trusted users, pass without filtering.
Partition 24|1|2|3|4|6|8|12   Select the interval to partition the corral into. The number given is the number of hours to use for the partition interval. Acceptable values are those shown. If a value is not specified, the default is 24 hours. This value is overridden by the command line option "-pn", where "n" is the partition interval.

Partitioning allows the number of filtered messages, stored in the corral, to be dramatically increased. Normally, an unpartitioned corral will hold all of the messages filtered on a small to medium sized system. However, on some large systems (e.g. those used in production environments and by ISPs), the number of messages that must be stored in the corral can easily exceed reasonable limits for the file system and preclude notification and message handler programs from being able to examine and process corralled messages.

Partitioning circumvents this problem by splitting the corral into separate subdirectories, based on the time interval specified. If a value of 24 is chosen, for example, the corral will be split into chunks that each contain 24 hours worth of corralled messages. These smaller subdirectories can be more easily processed by notification and message handler programs.

All filtered messages are partitioned with this option. Spam may also be partitioned post facto (e.g. if the corral fills up and you need to partition immediately as a quick fix to a space problem) by the separate notification program specific to it. A reasonable partition interval is 24 hours. Lesser intervals may be chosen if you still experience problems with the volume of messages in the corral.

ProxyUser[id] A proxy userid to use in looking up mail disposition and spam processing options. Normally, the recipient's userid, stripped of domain information, is used to look up user-specific mail disposition options plus spam processing options in their home directory. However, if the recipient doesn't have a home directory and if a proxy userid is given, the options in the proxy userid's home directory will be used instead.

This option might be useful to ISPs, who process mail for many users but who do not have userids set up for all of them. Usually, mail delivery is governed by user-specific options found in a ".sendmailfilter" file in each recipient's home directory. However, if the recipient has no home directory, the ".sendmailfilter" file in the proxy userid home directory is used.
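The fallback from the recipient's home directory to the proxy userid's home directory can be sketched as below. This is an illustration only: the function name is hypothetical, and a plain dictionary of home directories stands in for the password file lookup that the filter performs.

```python
import posixpath


def options_file(recipient, home_dirs, proxy_user=None,
                 filename=".sendmailfilter"):
    """Return the local options file to use for recipient, or None.

    home_dirs maps userid -> home directory (a stand-in for /etc/passwd).
    """
    user = recipient.split("@", 1)[0]  # strip domain information
    home = home_dirs.get(user)
    if home is None and proxy_user:
        # No home directory for the recipient: try the proxy userid.
        home = home_dirs.get(proxy_user)
    if home is None:
        return None
    return posixpath.join(home, filename)
```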

A similar situation arises with SpamAssassin, if spamd is used to arbitrate whether a message is spam. SpamAssassin looks in the recipient's home directory for options such as whitelists. If the user doesn't have a home directory, the home directory of the proxy userid is used instead.

This option may be overridden by the "-u" command line option.

Note that the values set for the "IgnoreSpam" and "IgnoreVirus" options in a proxy user's configuration have no effect on users who have no configuration. Such users will have their viruses and spam filtered by default and there is no way to avoid this, short of setting up configuration information for them. It was felt that virus and spam filtration was too important to get bypassed by accident.

QualifyNames Yes|No   Use or do not use fully qualified names (i.e. recipient name plus domain name) for any configuration database lookup of user options that is done in conjunction with the "ConfigDB" option. This option may be overridden by the "-q" command line parameter.

Users of this option typically have virtual users who do not have any real username or home directory on the machine where mail is delivered, hence the need to store their configuration information in a common database. Since it is possible for the same username to exist in two domains (e.g. "custserv"), the domain information must be used to qualify the username and ensure that the name used to store configuration information is unique.
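The effect of "QualifyNames" on the key used for the configuration database lookup can be pictured with this small Python sketch. The function name is hypothetical and the sketch deliberately leaves out the DBM lookup itself.

```python
def config_key(recipient, qualify_names):
    """Return the key used to look up a recipient's options."""
    if qualify_names:
        return recipient  # recipient name plus domain name
    return recipient.split("@", 1)[0]  # recipient name only
```

With qualification turned on, "custserv" in two different domains yields two distinct keys; with it turned off, both collapse to the same key.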

Note that spam saved in the spam corral is automatically saved using only the recipient's name (domain information is stripped) for local users and using the fully qualified name for all others. This is done so that all spam to a single local recipient, regardless of how it was addressed to them, will be corralled under a single name. The advantage of this behavior is that it results in the need to send only a single notification message to each spam recipient when telling them about the spam they've received. However, bear in mind that even using this scheme, alias names will have their spam stored as a separate user, regardless of who the alias resolves to.

ReplaceHTML * Yes|No   Turn on or off filtering for all HTML tags, regardless of where they occur in a message. Only really harmless tags are left intact (e.g. <p>, <br>). The global option may be overridden by the "-h" command line option.

If this flag is on, the filter removes all HTML that could be nasty. This means looking at tag names plus individual parameters within the tags. The tables found in smfopts.h are used to do these checks. If this flag is off, only the tags that are really nasty (plus embedded comments in tag names) are removed. The really nasty tags are: iframe, object, script, applet and embed, since they can launch code. These tags (and anything in between them and their terminators) are always removed, if you ask for virus checking.
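To give a feel for what removing the really nasty tags means, here is a deliberately naive Python sketch based on a regular expression. It is not how the filter itself works (the filter uses the tables in smfopts.h), it only handles tags that have a matching terminator, and the names used are hypothetical.

```python
import re

NASTY_TAGS = ("iframe", "object", "script", "applet", "embed")

# Matches an opening nasty tag, everything after it, and its terminator.
_NASTY_RE = re.compile(
    r"<(%s)\b[^>]*>.*?</\1\s*>" % "|".join(NASTY_TAGS),
    re.IGNORECASE | re.DOTALL,
)


def strip_nasty_tags(html):
    """Remove nasty tags together with everything between them."""
    return _NASTY_RE.sub("", html)
```

Harmless tags such as <p> and <br> pass through this sketch untouched.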

SendmailProto[col] The communications protocol to use in talking to sendmail. This must match the value given in the sendmail configuration file via the mail filter option. For example, if sendmail's m4 config file says:

INPUT_MAIL_FILTER(`filter1',`S=inet:2526@localhost, F=R')dnl

the value of SendmailProto should be:

SendmailProto inet:2526@localhost

According to the sendmail documentation, the possibilities are:

{unix|local}:/path/to/file      A named pipe.
inet:port@{hostname|ip-address}      An IPV4 socket.
inet6:port@{hostname|ip-address}      An IPV6 socket.
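The three forms above can be taken apart as in the following Python sketch. It is an illustration only; the function name is hypothetical and the port is assumed to be numeric.

```python
def parse_sendmail_proto(value):
    """Split a SendmailProto value into its component parts."""
    family, _, rest = value.partition(":")
    family = family.lower()
    if family in ("unix", "local"):
        return (family, rest)  # rest is the path to the file
    if family in ("inet", "inet6"):
        port, _, host = rest.partition("@")
        return (family, int(port), host)  # numeric port assumed
    raise ValueError("unknown protocol: %r" % value)
```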

One way or another, this option must be specified in order for the filter to operate. It must either be supplied in the global config file or via the "-p" command line switch. This parameter cannot be empty. The command line switch overrides the config file option.

SumOptions Yes|No   Turn on or off summation of local user options. When turned on, this causes all of the local user options from the configurations of all of the local recipients of a message to be summed to arrive at one, single set of options to be used for processing the message. If supplied, the "-os" command line parameter overrides this option.

Normally, options from the configuration of the first or only local recipient of a message are used, in conjunction with the global options, to process each message (messages are only processed once, regardless of how many local recipients a message is bound for and non-local recipients are irrelevant, since some other system will deliver the message to them). If this parameter is set to "Yes", the configurations of all of the recipients of a message are read and then all of their options are summed up to create a compound set of options that represent all of their preferences. This compound set of options is then used to process the message (it is still only processed once).

Option summation takes place as follows: for switches ("AutoRemail", "ReplaceHTML", "SpamAlways" and "SpamFastPath"), the "No" setting overrides the "Yes" setting; for the "UnknownDispo" option, the "MIME" setting overrides the "Ignore" setting, while the "Warn" setting overrides both; for the "SpamDel" option, the "trash" mode is overridden by the "deliver" mode which is overridden by the "corral" mode; for the "SpamLevel" option, the highest level is chosen; for the "StatXxxxRatio" options, the highest threshold found is used and for the "StatXxxxBoost" options, the lowest boost value found is used; for the "AddXTags" option, the "Yes" setting overrides the "No" setting for each tag ("envelope", "version") that is set in any configuration processed; and only the first recipient's settings are used for the "Language", "Msg*" and "SpamProto" options.
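Some of the summation rules can be sketched in Python as follows. This is an illustration under several assumptions: each recipient's configuration is represented as a dictionary, switches are stored as booleans, only options explicitly present are combined, and the "StatXxxx", "Msg*" and "SpamProto" options are left out for brevity. The function name is hypothetical.

```python
SWITCHES = ("AutoRemail", "ReplaceHTML", "SpamAlways", "SpamFastPath")
DISPO_RANK = {"Ignore": 0, "MIME": 1, "Warn": 2}
SPAMDEL_RANK = {"trash": 0, "deliver": 1, "corral": 2}


def sum_options(configs):
    """Combine the option dictionaries of all local recipients."""
    summed = {}
    for cfg in configs:
        for name in SWITCHES:
            if name in cfg:
                # "No" (False) overrides "Yes" (True).
                summed[name] = summed.get(name, True) and cfg[name]
        if "UnknownDispo" in cfg:
            best = summed.get("UnknownDispo", cfg["UnknownDispo"])
            summed["UnknownDispo"] = max(best, cfg["UnknownDispo"],
                                         key=DISPO_RANK.get)
        if "SpamDel" in cfg:
            best = summed.get("SpamDel", cfg["SpamDel"])
            summed["SpamDel"] = max(best, cfg["SpamDel"],
                                    key=SPAMDEL_RANK.get)
        if "SpamLevel" in cfg:
            best = summed.get("SpamLevel", cfg["SpamLevel"])
            summed["SpamLevel"] = max(best, cfg["SpamLevel"])
        if "AddXTags" in cfg:
            # A tag set in any configuration is set in the result.
            summed["AddXTags"] = summed.get("AddXTags", set()) | set(cfg["AddXTags"])
    # Only the first recipient's setting is used for "Language".
    if configs and "Language" in configs[0]:
        summed["Language"] = configs[0]["Language"]
    return summed
```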

UnknownDispos[ition] *      W[arn]|M[IME]|I[gnore]   Chooses the disposition for unknown file types, found as attachments to a message. Only the global option may be overridden by the "-uw", "-um" and "-ui" command line options.

A disposition of "Warn" will cause a warning to be inserted into the message whenever an attachment is found with an unknown file type (i.e. it has no extension or an extension that is not known). This is the default setting, since it is quite possible that a file type which the filter doesn't know about may be harmful and the assumption is made that it is best to warn about these things.

The "MIME" disposition will cause the filter to take additional steps to determine the file type, before issuing a warning. It will look at the MIME type for the attachment and, if it is one of the acceptable types, no warning will be issued. Otherwise, a warning will be issued, as above, for the "Warn" disposition.

A disposition of "Ignore" will cause attachments with unknown file types to be ignored and treated as if they were perfectly OK. No warning will be issued for any unknown file types. This setting is more than somewhat dangerous, since it is quite possible that a file type which the filter doesn't know about may be harmful. Not receiving any warning could allow a message recipient to inadvertently open a harmful attachment.
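The three dispositions boil down to a small decision, sketched here in Python. The function and argument names are hypothetical; whether an extension is known and whether a MIME type is acceptable are assumed to have been determined elsewhere.

```python
def needs_warning(disposition, known_extension, acceptable_mime):
    """Decide whether a warning is inserted for an attachment."""
    if known_extension:
        return False  # the file type is known, so nothing to decide
    choice = disposition[0].upper()  # W[arn], M[IME] or I[gnore]
    if choice == "I":
        return False  # unknown types are treated as perfectly OK
    if choice == "M":
        return not acceptable_mime  # consult the MIME type first
    return True  # "Warn", the default
```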

3.4 Archiver Support Options

The archiver support options are built into the filter to provide support for email archiving programs that are used to archive a copy of some or all messages that are passed through a system, usually in a database or file store. Messages may be directed to the archiver via the filter or they may be sent by an email system (such as Scalix) directly to the archiver. If the latter option is used, the archiver support options can be used to deal with unwanted bouncebacks in the event that the archiver is not available.

Only the global options configuration file can contain the archiver support options. Here is the list of archiver support options available in the global configuration file:

ArchiveDir[ectory] Using this parameter in the global configuration file will cause the filter to store a duplicate copy of all messages that it processes, in the directory named in this parameter. Each message is given a name that is based on the sender's name, plus an additional uniqueness key that renders collisions highly improbable. The archive directory is partitioned automatically, once a day, to allow it to be more easily managed and prevent the archive directory from growing too big for the file system to manage. Should you wish to split the archive more often, see the "ArchivePartition" parameter for more information about how to do it.

Messages in the archive directory are stored exactly the way they are delivered to sendmail. In all probability, the lawyers will want to be poring over each one with a fine-tooth comb so it is important that each message appear exactly as sent. Thus, each archive file is a flat text file that contains the entire email message: headers; straight text; and MIME encoding. Furthermore all messages can be archived, including those that bypass normal filtering when checks such as the internal/external/transit checks controlled by the "FilterExternal", "FilterInternal" and "FilterTransit" options indicate that a message should not be filtered.

Note that this parameter is mutually exclusive with the "ArchiveProgram" parameter. Whichever one comes last in the config file wins.

ArchivePartition 24|1|2|3|4|6|8|12   Select the interval to partition the archive directory into. The number given is the number of hours to use for the partition interval. Acceptable values are those shown. If a value is not specified, the default is 24 hours.

Partitioning allows the number of messages, stored in the archive directory, to be dramatically increased. Normally, an unpartitioned archive directory will hold all of the messages archived on a small to medium sized system. However, on some large systems (e.g. those used in production environments and by ISPs), the number of messages that must be stored in the archive directory can easily exceed reasonable limits for the file system and preclude additional messages from being archived.

Partitioning circumvents this problem by splitting the archive directory into separate subdirectories, based on the time interval specified. If a value of 24 is chosen, for example, the archive directory will be split into chunks that each contain 24 hours' worth of archived messages. These smaller subdirectories can be more easily processed by any of the programs in the archiver suite.

All archived messages are partitioned with this option. A reasonable partition interval is 24 hours. Lesser intervals may be chosen if you still experience problems with the volume of messages in the archive directory. If you are looking for a particular archived message, the partitioned subdirectories are named based on the partition interval time. The subdirectory name is simply the time of the beginning of each interval, in seconds from the epoch, expressed as an 8-digit hexadecimal number.
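If you need to work out which subdirectory a particular message landed in, the naming scheme described above can be sketched in a few lines of Python. This is only an illustration, not the filter's own code: the function name is made up, and it assumes that intervals are aligned to whole multiples of the partition interval counted from the epoch and that the hexadecimal digits are lower case. Check a real archive directory before relying on either assumption.

```python
def partition_dir_name(timestamp: int, interval_hours: int = 24) -> str:
    """Return the assumed subdirectory name for a message archived at
    `timestamp` (seconds from the epoch)."""
    # Only the values accepted by ArchivePartition are allowed.
    if interval_hours not in (1, 2, 3, 4, 6, 8, 12, 24):
        raise ValueError("unsupported partition interval: %r" % interval_hours)
    interval_seconds = interval_hours * 3600
    # Assumption: each interval starts on a multiple of its length.
    start = timestamp - (timestamp % interval_seconds)
    # The start time is written as an 8-digit hexadecimal number.
    return format(start, "08x")
```

For example, with the default 24 hour interval, partition_dir_name(172805) returns "0002a300", since that message falls in the interval that begins 172800 seconds after the epoch.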

ArchiveProg[ram] Using this parameter in the global configuration file will cause the filter to send a duplicate copy of all messages that it processes, via regular email, to the archive program whose address is given. Several MTAs (e.g. Scalix) employ this method to deliver messages for archiving to an archive program. Consequently, many of the email archive programs are capable of accepting messages to be archived as incoming mail. You will need to consult your archiver's documentation about how to set this up.

On the MailCorral side, one simply assigns to this parameter the email address of the archive program and, optionally, enumerates the recipient addresses of those whose mail is to be archived, using the "archive_to" parameter.

One word of caution, though. Recall that the duplicate copy of the message, bound for the archiver, is delivered via regular email channels. This means that it makes a second pass through the filter. In order that the duplicate message itself is not also duplicated and archived, and so as to avoid processing (and possibly altering) the duplicate message a second time, the filter looks for all messages bound for the archiver's email address and bypasses them.

There are two points to note, then. The first is that all messages sent to the archiver's email are bypassed, not just those that MailCorral duplicates and forwards to it. So, if someone externally sends a message to the archiver's email address, it goes right through, untouched by the filter. This could be construed as a feature because it allows outsiders to also send messages directly to the archiver for archiving. However, on the flip side of the coin, it could instead be construed as a less than ideal way of operating. But, it was felt that all messages bound for the archiver's email address were probably going to be sent, by the archive program, straight through to the archive repository anyway (i.e. it wouldn't be trying to actually read them) so there was no disadvantage to giving them all a free pass.

The second point is that the address that you supply to this parameter must match the one that sendmail will ultimately make up as the duplicate message's delivery address. In general, if the sendmail configuration file has domain name masquerading turned on (the "DM" option in the sendmail.cf file or the "MASQUERADE_AS" option in the sendmail.mc file) and you are archiving to a local recipient, the address given can either be the full name (i.e. username, at sign, domain name) or it can simply be the username, all by itself. If you have some other, clever way of getting stuff to local users, don't use it here. Otherwise, for recipients on other systems, you must use their correct username, an at sign and their full domain name (there's a good chance your mail won't get delivered properly, if you don't do this, so it shouldn't be a big deal).

Failure to use an address for this parameter that sendmail and the filter can identify will result in a message sending loop, whenever a message is sent to the archive. Fortunately, sendmail is smart enough to quit the loop before it loops forever and fills the spool with billions and billions of messages, but it will deliver about 24 copies of each one before it gives up for taking too many hops. It is advisable to test the filter carefully, after setting this parameter, in a non-production environment with a single test message. You should be able to send a message directly to the archiver's email address and have it not be filtered. Then, you should be able to send a message to any user (whose email is being archived) and have a duplicate show up in the archiver's mailbox. You can easily test this after you've set up the archiver's mailbox but before you tell the archive program to catch all mail delivered to its mailbox. This will allow you to use the regular mail command to see what is being delivered before everything becomes automagic.

Duplicated messages are sent to the archiver exactly the way they are delivered to sendmail. In all probability, the lawyers will want to be poring over each one with a fine-tooth comb so it is important that each message appear exactly as sent. Thus, each duplicated message contains the entire original email message: headers, straight text and MIME encoding. Furthermore, all messages can be archived, including those that bypass normal filtering when checks such as the internal/external/transit checks controlled by the "FilterExternal", "FilterInternal" and "FilterTransit" options indicate that a message should not be filtered.

Note that archiving messages using this option is somewhat of a resource pig. The entire message is buffered in memory before it is sent to the archiver. One reason for doing it this way is to allow us to send everything to sendmail at once, without having to worry about sendmail/milter errors leaving the pipe hanging. It also obviates the need to flush the pipe or try to retrieve half-sent messages if an error that we catch ourselves occurs. It was felt that this reliability was important and, in this day and age, memory is so cheap and so abundant that, even with 10MB email messages, the performance and reliability benefits were well worth it.

Also note that this parameter is mutually exclusive with the "ArchiveDirectory" parameter. Whichever one comes last in the config file wins.

IgnoreBounces If this parameter is present in the global configuration file, the filter will look for bounceback messages from sendmail and/or any other MTAs that send them through this system. Bounceback messages are sent by an MTA when it tries to deliver a message but, for some reason, the recipient is unavailable. The MTA sends a new email message back to the original sender informing them of the problems encountered while trying to deliver their message.

This option is included mainly to support certain email archiving schemes which send messages to be archived to an archive program through regular email delivery channels (e.g. Scalix uses this scheme). The MTA that is archiving its messages makes a copy of each message, alters the to address to point to the archive program and injects the message into the regular email delivery stream. All well and good, providing the archive program stays alive and makes itself available. However, should it become unavailable (e.g. through a network failure), the regular email delivery stream necessitates that a bounceback message be generated. Since the from address of the duplicated message was never altered, the bounceback goes back to the original sender of the message. This probably comes as a huge surprise to them.

When a bounceback message is identified (bouncebacks, by convention, all come from a common source, which makes it possible to identify them), the list comprised of all the addresses enumerated by IgnoreBounces options will be consulted. If the recipient of the bounced message, as determined by the address mentioned in the delivery report of the bounceback message, matches this option's value, the bounceback will be cancelled and never delivered. The option value may contain file name globbing style patterns. For example: "*@archiver.{com|org|net}". It may be repeated as many times as necessary to list all of the addresses whose bouncebacks are to be ignored.
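To get a feel for how a pattern such as "*@archiver.{com|org|net}" selects addresses, here is a rough Python sketch that turns such a pattern into a regular expression. It is not the filter's matcher. The function names are invented, and the sketch assumes that "*" matches any run of characters, "?" matches a single character, "{a|b|c}" matches any one of the listed alternatives, the whole address must match, and case is ignored. The filter's real rules may differ on any of these points.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a globbing style address pattern into a compiled regex."""
    parts = []
    in_braces = False
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters
        elif ch == "?":
            parts.append(".")            # any single character (assumed)
        elif ch == "{" and not in_braces:
            parts.append("(?:")          # start of an alternation group
            in_braces = True
        elif ch == "}" and in_braces:
            parts.append(")")            # end of the alternation group
            in_braces = False
        elif ch == "|" and in_braces:
            parts.append("|")            # separator between alternatives
        else:
            parts.append(re.escape(ch))  # everything else is literal
    if in_braces:
        raise ValueError("unterminated '{' in pattern: %r" % pattern)
    return re.compile("".join(parts), re.IGNORECASE)


def address_matches(address: str, patterns) -> bool:
    """True if the address matches any one of the repeated option values."""
    return any(glob_to_regex(p).fullmatch(address) for p in patterns)
```

With the example pattern, "backup@archiver.org" matches but "backup@archiver.edu" does not.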

archive_to Normally, by default, all messages passing through the filter are archived, if archiving is turned on. Using this option, it is possible to fine tune whose messages are actually archived. If this option is present, only those addresses explicitly mentioned in one or more "archive_to" parameters will be archived. All others will not.

If the recipient of a message matches this option's value, their messages will be added to the archive. Otherwise, their messages will be delivered in the normal manner but not archived. The option value may contain file name globbing style patterns. For example: "archiveme@*.*".

3.5 Spam Processing Options

Spam processing options are employed by the internal spam identifier that is invoked when the spam arbitron is chosen to be "internal" (see "-s" or "SpamProto") or an external spam arbitron (such as spamd) is chosen and the fast path is not turned off (see the "-ss" option or "SpamFastPath"). The "SpamRule" option specifies in a logical expression how the results of the various arbitrons are interpreted, if the default rule is not sufficient. The spam delivery options are applicable to all messages recognized as spam, regardless of how recognition is accomplished.

Either of the configuration files can contain white lists, black lists and/or spam delivery options. By properly tuning the global options file with black list information about the domains from which you constantly receive spam, you can build an effective spam blocking filter at very little runtime cost. By having your users update their local options files with black or white list information, they can further tune the behavior of the built-in spam identifier. Here are the spam processing options available in the configuration files:

IgnoreSpam * Yes|No   Turn on or off the filtering of any spam found in messages for this user. If this flag is set, spam will be ignored even if it is found. Be aware that the "SumOptions" or "-os" flag has no effect on this flag. Rather, the setting of this flag, whether on or off, is applied to each user individually.

Note that, for efficiency's sake, there is only one copy of any modified email message. This being the case, it is possible that a message containing a warning about spam, which is sent as well to multiple users who wish to be warned about both viruses and spam, would be sent to a recipient who has "IgnoreSpam" set to "Yes" and "IgnoreVirus" set to "No". This recipient will see the spam notification, despite the fact that they've requested not to. In this case, the wishes of the many outweigh the wishes of the few.

SpamAlways * Yes|No   Always add the spam report to mail, regardless of whether it is spam or not. The global option may be overridden by the "-sa" command line option.
SpamDel[ivery] * C[orral]|D[eliver]|T[rash]   Set the spam delivery mode to one of the three choices shown. The global option only may be overridden by the "-sc", "-sd" and "-st" command line options.

The "Corral" mode will cause any spam received to be sidelined in the spam corral, where it can be processed by a spam notification program and/or released for delivery by the recipient at a later date.

The "Deliver" mode will cause any spam received to be marked as such and then delivered to the recipient without further delay.

The "Trash" mode will cause any spam received to be tossed directly in the trash can, straight away. I like it! Unfortunately, the default is "Corral".

SpamFast[Path] * No|Yes   Turn off or on the spam fast path check, prior to calling the designated spam arbitron. The statistical spam fast check is turned on or off by this flag, too. If the designated arbitron is "internal" this flag has no effect, since it is possible to turn off the use of the internal check as the designated arbitron by not specifying it at all. The global option only may be overridden by the "-ss" command line option.
SpamLevel * 1-3   Set the level of spam reporting to the number chosen (a value of 1 through 3). The global option only may be overridden by the "-s1" thru "-s3" command line options.

Level one (the default) causes a single header line to be inserted in any message that is found to contain spam, giving the spam processing statistics as three percentage values.

Level two causes the same header line described under level one to be inserted in any message that is found to contain spam. It also inserts any headers generated by the spam arbitron (selected by "SpamProto") into the message.

Level three does everything that levels one and two do, plus it formats any report generated by the spam arbitron for insertion into the message body as a paragraph of text.

SpamProto[col] * none|internal|spamd   The communications protocol and port to use in talking to a spam arbitration daemon. The protocol must be one that this filter knows about and the port must match the one that the daemon will be listening on.

Currently, the supported protocols are:

internal - The internal, fast path spam checker.
spamd - The SpamAssassin daemon.

Except for the "internal" protocol, the port number and host name or IP address must follow the protocol name, separated by a colon and an at sign. So, the entire parameter should look like one of the following:

internal

or

proto:port@host

For example, if you are running spamd and it is listening on port 2527 on the local machine, the value for SpamProto should be:

SpamProto spamd:2527@localhost

By supplying a valid protocol and port, you will cause all messages received to be sent to the spam arbitration daemon. If it thinks the message is spam, it will be treated as such. If you don't supply this parameter, only the internal spam checks will be performed (blacklists/whitelists and statistical).

Note that when any spam arbitron is used, the internal, fast path spam checker is always run (unless the "-ss" option is set or "SpamFastPath" is set to "No"), prior to calling the arbitron selected, in an attempt to speed up spam checking by bypassing the overhead involved with calling the arbitron.

An optional timeout value may be supplied for any of the arbitrons chosen. If this parameter is supplied, it must follow the protocol, port number and host name immediately and be enclosed in parentheses. For example:

SpamProto spamd:2527@localhost(20)

The value given is the time, in seconds, that MailCorral is prepared to wait for the entire transaction with the spam arbitron. Essentially, this is the total time for the arbitron to respond to the spam determination request -- it includes the elapsed time for all of the reads and writes to the arbitron. In the above example, the time allotted to SpamAssassin would be twenty seconds.

The default timeout is set to 30 seconds, if no value is supplied. You may pick any positive value but consider this. Sendmail is only prepared to wait for an answer from the filter for so long. If the arbitron timeout is to be of any use, it should be set to some value smaller than the timeout given to sendmail in its configuration file (no value in the sendmail configuration file means a default timeout of 10 seconds).
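If you generate or check configuration files with a script, the "proto:port@host(timeout)" layout described above is easy to take apart. The following Python sketch shows one way to do it. It is an illustration only and not how the filter parses the value: the names ArbitronSpec and parse_arbitron are invented, and the sketch assumes that "none" and "internal" are the only protocols allowed without a port and host.

```python
import re
from typing import NamedTuple, Optional

DEFAULT_TIMEOUT = 30  # seconds, the default given in the text


class ArbitronSpec(NamedTuple):
    proto: str
    port: Optional[int]
    host: Optional[str]
    timeout: int


_SPEC = re.compile(
    r"^(?P<proto>[A-Za-z0-9_]+)"              # protocol name
    r"(?::(?P<port>\d+)@(?P<host>[^()\s]+))?"  # optional :port@host
    r"(?:\((?P<timeout>\d+)\))?$"              # optional (timeout)
)


def parse_arbitron(value: str) -> ArbitronSpec:
    """Split a SpamProto or VirusProto value into its parts."""
    m = _SPEC.match(value.strip())
    if m is None:
        raise ValueError("malformed arbitron value: %r" % value)
    proto = m.group("proto").lower()
    # Assumption: only "none" and "internal" may omit the port and host.
    if proto not in ("none", "internal") and m.group("port") is None:
        raise ValueError("%r needs a port and host" % proto)
    port = int(m.group("port")) if m.group("port") else None
    timeout = int(m.group("timeout")) if m.group("timeout") else DEFAULT_TIMEOUT
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ArbitronSpec(proto, port, m.group("host"), timeout)
```

Given "spamd:2527@localhost(20)", the sketch yields the protocol "spamd", port 2527, host "localhost" and a timeout of 20. Leaving off "(20)" gives the default of 30.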

A good choice would be to give the arbitron 60-80% of the timeout allotted to the filter in the sendmail configuration file. This will ensure that the filter can still complete the job of filtering a message in the time allowed, despite the fact that the arbitron times out while making its decision. And, if a virus arbitron is also being used (see the "-v" option or "VirusProto"), the time allocated to both the arbitrons should be split between them, as you see fit. In that case a good choice would be to give 30-40% to each of the two arbitrons.

Note that a time limit may be supplied to the internal arbitron but there isn't much reason for this, since it executes extremely quickly. When calculating how much time to allow the filter and its arbitrons, you can neglect the time consumed by the internal arbitron.

An empty parameter value may be specified in the local (user) options file to turn off a spam arbitron that was turned on globally. In the global options file, this parameter cannot be empty. The global option only can be overridden by the "-s" command line parameter.

SpamReplies 0-25   If spam filtering is done, the number of replies that should be sent to a spammer (per day), telling them that what they are sending is spam. Once the threshold is reached for a particular spammer in a single day, all subsequent messages from them are just dumped with no reply. The default is 0 (i.e. always just dump all spam). The maximum is 25. May be overridden by the "-r" command line option.

This option is necessary because some spammers use autoresponders to reply to any mail sent to them. This may well appear as spam too, in which case, a reply will be sent to it. Do you see the potential for ping pong? I do.
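The per-day threshold can be pictured as a simple counter keyed by spammer and date. The Python sketch below is only a model of the behavior described above, under the assumption that senders are compared without regard to case. The class name and its bookkeeping are invented for illustration and say nothing about how the filter actually stores its counts.

```python
import datetime
from collections import defaultdict


class ReplyLimiter:
    """Model of the SpamReplies threshold: at most N replies per spammer per day."""

    def __init__(self, max_replies: int = 0):
        if not 0 <= max_replies <= 25:
            raise ValueError("SpamReplies must be between 0 and 25")
        self.max_replies = max_replies
        self._counts = defaultdict(int)  # (sender, date) -> replies sent so far

    def should_reply(self, sender: str, today: datetime.date) -> bool:
        key = (sender.lower(), today)
        if self._counts[key] >= self.max_replies:
            return False                 # threshold reached: just dump the spam
        self._counts[key] += 1
        return True
```

With a limit of 2, the first two messages from a given spammer on a given day get a reply and the rest are dumped silently until the date changes. With the default of 0, no reply is ever sent, which is what stops the ping pong.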

SpamRule * expression   An expression that specifies the rule used to decide whether a message is considered spam or not. If the expression evaluates to true, the message is spam. If it evaluates to false, it is not spam.

Normally, this expression is not specified and the system makes up a rule of its own, depending on the values chosen for "SpamFastPath" and "SpamProtocol". For example, the system typically uses:

(SpamFast != 1) && ((SpamFast > 1) || (SpamStats >= 100) || (Spamd >= 100))

You may specify your own rule, if you wish to change the order in which the spam checkers are called, alter the weighting of the various spam checkers or invoke a programmable spam arbitron of your own choosing (see ProgArbCode). And, in the user configuration file, you may specify this option without any rule to turn off any global spam rule and revert back to the system's default rule.

The expression supplied should be a logical expression that evaluates to either true (message is spam) or false (message isn't spam). If it contains spaces for readability, it should be enclosed in double quotes.

There are a number of predefined variables and constants that you can use to build the rule. These are:

Spamd The result from calling spamd (typically SpamAssassin). The results are normalized so that any number greater than or equal to 100 is usually considered spam. Using this variable causes spamd to be called.
 
SpamFast The result from calling the spam fast check. The results are: 0 for OK; 1 for deliver the message (i.e. it is whitelisted); 2 for local options have determined the message is spam (i.e. the sender is in the user's blacklist); 3 for global options have determined the message is spam (i.e. the sender is in the global blacklist). Using this variable causes the spam fast check to be called.
 
SpamStats The result from calling the statistical spam check. The results are normalized so that any number greater than or equal to 100 is usually considered spam. Using this variable causes the statistical spam check to be called.
 
SPAM_OK The OK return code from any of the programmable spam arbitrons (depends on the variable used). May be used like this:
 
      MyArb == SPAM_OK
 
SPAM_FOUND The spam found return code from any of the programmable spam arbitrons (depends on the variable used). May be used like this:
 
      MyArb == SPAM_FOUND

Any variable that is not in the list above causes the programmable arbitron code supplied by you (see ProgArbCode) to be invoked and passed one or more components of the message (see ProgArbRequires). The result from the programmable arbitron is set into the variable, once the arbitron returns. Programmable arbitron results can be one of: SPAM_OK or SPAM_FOUND. You might include your own programmable arbitrons in the spam rule like this:

(SpamFast != 1) && ((SpamFast > 1) || (SpamStats >= 100)
|| (Spamd >= 100) || (MySpam1 == SPAM_FOUND)
|| (MySpam2 == SPAM_FOUND))

Note that the spam rule is evaluated in the usual manner with the first term that renders the expression unequivocally true or false causing it to exit without calling any of the other arbitrons. Thus, you can use the order of evaluation to optimize your spam checking (e.g. in the above expression, the programmable arbitrons are only invoked if the two internal spam checks and spamd decide the message isn't spam).
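The effect of evaluation order is easiest to see with a small model. The Python sketch below evaluates the typical default rule shown earlier, calling each checker only when its variable is first needed. It is a toy and not the filter's rule evaluator: the class and function names are invented, and it assumes that a checker is run at most once per message even when its variable appears twice in the rule.

```python
class LazyResults:
    """Runs a checker the first time its variable is used, then remembers it."""

    def __init__(self, checkers):
        self._checkers = checkers  # variable name -> zero-argument callable
        self._cache = {}
        self.called = []           # order in which checkers were actually run

    def __getitem__(self, name):
        if name not in self._cache:
            self.called.append(name)
            self._cache[name] = self._checkers[name]()
        return self._cache[name]


def default_rule(r):
    # Python's "and"/"or" short-circuit the same way "&&"/"||" do.
    return (r["SpamFast"] != 1) and (
        (r["SpamFast"] > 1) or (r["SpamStats"] >= 100) or (r["Spamd"] >= 100)
    )


def evaluate(checkers):
    """Return the verdict and the list of checkers that had to be run."""
    results = LazyResults(checkers)
    verdict = default_rule(results)
    return verdict, results.called
```

If the fast check returns 1 (whitelisted) or a value above 1 (blacklisted), the verdict is known at once and neither the statistical check nor spamd is run. Only when the fast check returns 0 do the later, more expensive checkers get their turn, in the order written.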

StatEmbedRatio * 50, 0-1000   If statistical spam filtering is carried out by the internal fast path spam filter, this parameter sets the weighting and threshold for the ratio of embedded comments or escape sequences to words in the message, for the message to be considered spam. The value is in parts per thousand. The message's embedded ratio is multiplied by 100 and divided by the value given. This will cause any value above the threshold value to yield a spam value of 100%. A value less than the threshold may still lead to a determination of spam, since all of the spam values are summed. A value of zero disables this spam check.

Since any occurrence of embedded comments or escape sequences in words is an excellent predictor of spam, the default threshold is set fairly low at 50 embeds per 1000 words.

StatHREFRatio * 200, 0-1000   This parameter sets the weighting and threshold for the ratio of HREFs to words in the message, for the message to be considered spam, if statistical spam filtering is carried out by the internal fast path spam filter. The value is in parts per thousand. The message's HREF ratio is multiplied by 100 and divided by the value given. This will cause any value above the threshold value to yield a spam value of 100%. A value less than the threshold may still lead to a determination of spam, since all of the spam values are summed. A value of zero disables this spam check.

A large number of HREFs (links to Web pages) in a message is a good indication that a message may be spam, since spammers often link their message to a Web site with the actual content or to tracking pages that update their database when spam is read. Keep in mind, these are actual references to Web pages in HTML tags, not references to them in the text of a message. Hence, someone sending a message that says, "Here's the URL you wanted ...", should not trigger this test. The default setting is a moderate 200 HREFs per 1000 words to allow a generous number of links for legitimate reasons but still contribute to the spam score if a message includes links.

StatImageRatio * 100, 0-1000   If statistical spam filtering is carried out by the internal fast path spam filter, this parameter sets the weighting and threshold for the ratio of images to words in the message, for the message to be considered spam. The value is in parts per thousand. The message's image ratio is multiplied by 100 and divided by the value given. This will cause any value above the threshold value to yield a spam value of 100%. A value less than the threshold may still lead to a determination of spam, since all of the spam values are summed. A value of zero disables this spam check.

Since occurrences of images in messages are often a good predictor of spam, the default threshold is set fairly low at 100 images per 1000 words. A special case is recognized where a message contains no words but just images. This is assumed to be image spam and is given a score of 100%. However, recipients of certain kinds of newsletters that are composed entirely of images or a high proportion of images may wish to disable this test or set a higher threshold.

StatTableRatio * 250, 0-1000   This parameter sets the weighting and threshold for the ratio of tables to words in the message, for the message to be considered spam, if statistical spam filtering is carried out by the internal fast path spam filter. The message's table ratio is multiplied by 100 and divided by the value given, which is in parts per thousand. This will cause any value above the threshold value to yield a spam value of 100%. A value less than the threshold may still lead to a determination of spam, since all of the spam values are summed. A value of zero disables this spam check.

Spammers often use large numbers of table tags in a message to format their message for visual effect (it is, after all, advertising) or even to hide the text of the message from a content-based identifier. Tables do have legitimate reasons to be in a message, however, so a moderate threshold of 250 tables per 1000 words is the default.
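The arithmetic behind the four ratio parameters can be sketched as follows. This Python fragment is a reading of the description above, not the filter's code. The function and dictionary names are invented, the sketch does not cap an individual score at 100 (the text does not say whether the filter does), and it leaves out the special case for messages that contain images but no words.

```python
# Default thresholds from the text, in parts per thousand words.
DEFAULT_THRESHOLDS = {"embed": 50, "href": 200, "image": 100, "table": 250}


def ratio_score(count: int, words: int, threshold: int) -> float:
    """Score one ratio test: 100 means the threshold was exactly reached."""
    if threshold == 0 or words == 0:
        return 0.0  # a zero threshold disables the test; no words is not modeled
    per_thousand = count * 1000 / words
    return per_thousand * 100 / threshold


def stat_score(counts: dict, words: int, thresholds: dict = DEFAULT_THRESHOLDS) -> float:
    """Sum the individual scores, as the text says the spam values are summed."""
    return sum(ratio_score(counts.get(name, 0), words, limit)
               for name, limit in thresholds.items())
```

As a worked example, take a 100 word message with 10 HREFs and 15 tables. The HREF ratio is 100 per thousand, which scores 100 * 100 / 200 = 50. The table ratio is 150 per thousand, which scores 150 * 100 / 250 = 60. Neither test reaches its threshold by itself, but the sum of 110 is enough to satisfy a rule such as "SpamStats >= 100".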

StatImageParmBoost * 10, 0-100   If statistical spam filtering is carried out by the internal fast path spam filter, this parameter gives the spam score boost value (in percent) for any images that have parameters. For each image with parameters, the boost value is added directly to the final spam score. A value of zero disables this spam check.

Although there are legitimate reasons to use parameters in an image, most images are usually fixed URLs. Spammers frequently include an empty or innocuous image in a message that includes parameters telling them who the recipient of the message is. If the message is read and the image fetched from their Web server, the spammer can update their database to keep track of all of the actual recipients of their spam. The default boost is a moderate 10% for each image with parameters.

StatLinkEmailBoost * 50, 0-100   This parameter gives the spam score boost value (in percent) for any link (in images or HREFs) that contains an email address, if statistical spam filtering is carried out by the internal fast path spam filter. For each link containing an email address, the boost value is added directly to the final spam score. A value of zero disables this spam check. It is important to note that the boost is not given to HREFs containing email addresses as the result of a "mailto:" parameter.

Email addresses in links or especially in images are used by spammers as a feedback mechanism to identify recipients of spam. If the message is read and the image fetched from their Web server, the spammer can update their database to keep track of all of the actual recipients of their spam. Since there is very little real reason for including an email address as a parameter in an image or link, the default boost is a high 50% for each image or link with an email address.

StatTextBase64Boost * 80, 0-100   If statistical spam filtering is carried out by the internal fast path spam filter, this parameter gives the spam score boost value (in percent) for any text or HTML MIME entities that are encoded Base64. For each text/HTML MIME entity that is so encoded, the boost value is added directly to the final spam score. A value of zero disables this spam check.

Perhaps there are legitimate reasons to encode plain text and HTML as if it were binary data but the most common use of this technique is to obscure spam and viruses from detection by scanners. This being the case, the default boost is an extremely high 80% for each text/HTML MIME entity that is so encoded.
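Unlike the ratio parameters, the three boost parameters are simple per-occurrence additions. The short Python sketch below shows the sum under the default settings. The names are made up for the illustration, and how the filter counts occurrences is not modeled here.

```python
# Default boosts from the text, in percent per occurrence.
DEFAULT_BOOSTS = {"image_parm": 10, "link_email": 50, "text_base64": 80}


def boosted_score(base_score: float, occurrences: dict, boosts: dict = DEFAULT_BOOSTS) -> float:
    """Add each boost once per occurrence; a boost of zero disables that test."""
    return base_score + sum(boosts[name] * occurrences.get(name, 0) for name in boosts)
```

So a message with a ratio-based score of 20 that carries three images with parameters and one link containing an email address ends up at 20 + 3 * 10 + 50 = 100.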

all_spam_to * If the recipient of a message matches this option's value, they will always receive every message sent, regardless of whether it is spam or not. The option value may contain file name globbing style patterns. For example: "wantspam@*.*". It may be repeated as many times as necessary to list all of the addresses that should always have mail delivered to them, although there is no point in using it more than once in the local options file, since it only applies therein to the local user.
blacklist_from * If the sender of a message matches this option's value, it will be treated as spam. The option value may contain file name globbing style patterns. For example: "*@spammers.{com|org|net}". It may be repeated as many times as necessary to list all of the addresses to be blacklisted.
whitelist_from * If the sender of a message matches this option's value, it will never be treated as spam but will always be delivered instead. The option value may contain file name globbing style patterns. For example: "*@goodguys.{com|org|net}". It may be repeated as many times as necessary to list all of the addresses to be whitelisted.

Probably the best use for this option is in a user's local options file where it can override one of the global black list entries to allow mail from a user's favorite spammer to be delivered, despite the spammer's domain being on the global blacklist.

In addition to the sendmail filter's own configuration files, the filter will also process configuration files for certain spam arbitration daemons to support fast path detection of spam using the black and white lists of the daemon.

If the spam arbitron chosen is spamd, the local and global SpamAssassin configuration files are read to extract white and black list information. This information is then acted upon to determine if a message is spam, hopefully without invoking the arbitron. The SpamAssassin configuration files processed (and the order in which they are read) are:


~/.spamassassin/user_prefs Recipient local preferences.


/usr/local/etc/spamassassin/local.cf
/usr/pkg/etc/spamassassin/local.cf
/usr/etc/spamassassin/local.cf
/etc/mail/spamassassin/local.cf
/etc/spamassassin/local.cf
Installation overrides, set up by sysadmin.


/usr/local/share/spamassassin/60_whitelist.cf
/usr/share/spamassassin/60_whitelist.cf
System whitelist info, shipped with the product.
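To see what sort of information can be pulled out of these files, here is a small Python sketch that collects the "whitelist_from" and "blacklist_from" entries from the text of a SpamAssassin configuration file. Those two directive names are SpamAssassin's own, and it allows several addresses on one line. Everything else is an assumption made for the illustration: the function name is invented, and which directives the filter really honors is not spelled out here.

```python
def extract_lists(config_text: str) -> dict:
    """Collect white and black list addresses from SpamAssassin style text."""
    lists = {"whitelist_from": [], "blacklist_from": []}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] in lists:
            # SpamAssassin permits several addresses on a single line.
            lists[fields[0]].extend(fields[1:])
    return lists
```

Reading the files in the order listed above and merging the results would give a combined white list and black list that can be checked before the arbitron is ever called.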

3.6 Virus Processing Options

The virus processing options determine whether the internal virus checker is to be invoked (when the virus arbitron is chosen to be "internal" via the "-v" option or "VirusProto") or an external virus arbitron (such as clamd) is to be invoked. They also determine whether the internal virus checker is turned off when an external virus arbitron is used or run in addition to it (see the "-vv" option or "VirusChecker"). The "VirusRule" option specifies in a logical expression how the results of the various arbitrons are interpreted, if the default rule is not sufficient. Here are the virus processing options available in the configuration files:

VirusAlways * Yes|No   Always add the virus report to mail, regardless of whether it contains a virus or not. The global option may be overridden by the "-va" command line option.
VirusCheck[er] * No|Yes   Turn off or on the internal virus checker, prior to calling the designated virus arbitron. If the designated arbitron is "internal" this flag has no effect, since it is possible to turn off the use of the internal checker as the designated arbitron by not specifying it at all. The global option only may be overridden by the "-vv" command line option.
VirusProto[col] * none|clamd|internal   The communications protocol and port to use in talking to a virus arbitration daemon. The protocol must be one that this filter knows about and the port must match the one that the daemon will be listening on.

Currently, the supported protocols are:

clamd - The ClamAV daemon.
internal - The internal virus checker.

Except for the "internal" protocol, the port number and host name or IP address must follow the protocol name, separated by a colon and an at sign. So, the entire parameter should look like one of the following:

internal

or

proto:port@host

For example, if you are running clamd and it is listening on port 2528 on the local machine, the value for VirusProto should be:

VirusProto clamd:2528@localhost

By supplying a valid protocol and port, you will cause all messages received to be sent to the virus arbitration daemon. If it thinks the message contains a virus, it will be treated as such. If you don't supply this parameter, only the internal virus checks will be performed (MIME type/attachment type matching).

Note that when any virus arbitron is used, the internal virus checker is always run (unless the "-vv" option is set or "VirusChecker" is set to "No"), prior to calling the arbitron selected, but its results will not necessarily cause the virus arbitron to be bypassed. Only in the case of MIME entities or attachments that the internal virus checker marks for deletion will the external arbitron be bypassed, and such instances are admittedly quite rare.

An optional timeout value may be supplied for any of the arbitrons chosen. If this parameter is supplied, it must follow the protocol, port number and host name immediately and be enclosed in parentheses. For example:

VirusProto clamd:2528@localhost(20)

The value given is the time, in seconds, that MailCorral is prepared to wait for the entire transaction with the virus arbitron. Essentially, this is the total time for the arbitron to respond to the virus determination request -- it includes the elapsed time for all of the reads and writes to the arbitron. In the above example, the time allotted to ClamAV would be twenty seconds.
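To make the value format described above concrete, here is a minimal Python sketch that splits a VirusProto value into its parts. It is not the filter's own parser: the function name parse_virus_proto and the regular expression are assumptions made purely for illustration, and the filter may well accept or reject values that this sketch does not.

```python
import re

# Hypothetical pattern, for illustration only: "internal" on its own, or
# "proto:port@host", either one optionally followed by "(seconds)".
_VALUE = re.compile(
    r"^(?:(?P<internal>internal)"
    r"|(?P<proto>[A-Za-z]\w*):(?P<port>\d+)@(?P<host>[^\s()]+))"
    r"(?:\((?P<timeout>\d+)\))?$"
)


def parse_virus_proto(value):
    """Split a VirusProto value into (proto, port, host, timeout)."""
    match = _VALUE.match(value.strip())
    if match is None:
        raise ValueError("unrecognized VirusProto value: %r" % value)
    # The timeout is optional; None means "not given in the value".
    timeout = int(match.group("timeout")) if match.group("timeout") else None
    if match.group("internal"):
        return ("internal", None, None, timeout)
    return (match.group("proto"), int(match.group("port")),
            match.group("host"), timeout)


print(parse_virus_proto("clamd:2528@localhost(20)"))
print(parse_virus_proto("internal"))
```

A value such as "clamd:localhost", which lacks the port, raises ValueError in this sketch.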

The default timeout is set to 30 seconds, if no value is supplied. You may pick any positive value but consider this. Sendmail is only prepared to wait for an answer from the filter for so long. If the arbitron timeout is to be of any use, it should be set to some value smaller than the timeout given to sendmail in its configuration file (no value in the sendmail configuration file means a default timeout of 10 seconds).

A good choice would be to give the arbitron 60-80% of the timeout allotted to the filter in the sendmail configuration file. This will ensure that the filter can still complete the job of filtering a message in the time allowed, despite the fact that the arbitron times out while making its decision. And, if a spam arbitron is also being used (see the "-s" option or "SpamProto"), the time allocated to both the arbitrons should be split between them, as you see fit. In that case a good choice would be to give 30-40% to each of the two arbitrons.
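The arithmetic behind these percentages can be sketched in a few lines of Python. The function name arbitron_budgets is hypothetical, and rounding down to whole seconds is an assumption of this sketch, not something the filter requires.

```python
def arbitron_budgets(filter_timeout, spam_arbitron_in_use=False):
    """Return the (low, high) range, in whole seconds, suggested per arbitron."""
    if filter_timeout <= 0:
        raise ValueError("filter timeout must be positive")
    # 60-80% for a lone virus arbitron; 30-40% each if a spam arbitron
    # shares the time allotted to the filter.
    low_pct, high_pct = (30, 40) if spam_arbitron_in_use else (60, 80)
    return (filter_timeout * low_pct // 100, filter_timeout * high_pct // 100)


print(arbitron_budgets(10))
print(arbitron_budgets(10, spam_arbitron_in_use=True))
```

For the 10 second sendmail default mentioned above, this suggests 6 to 8 seconds for a single arbitron, or 3 to 4 seconds each when two are in use. Either way, the 30 second default arbitron timeout would have to be lowered explicitly to be of any use.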

Note that a time limit may be supplied to the internal arbitron but there isn't much reason for this, since it executes extremely quickly. When calculating how much time to allow the filter and its arbitrons, you can neglect the time consumed by the internal arbitron.

An empty parameter value may be specified in the local (user) options file to turn off a virus arbitron that was turned on globally. In the global options file, this parameter cannot be empty. The global option only can be overridden by the "-v" command line parameter.

VirusRule * expression   An expression that specifies the rule used to decide whether any of the components of a message contain a virus or not. The expression is evaluated once for each message component that is tested for viruses. If the expression evaluates to true, that component is deemed to contain a virus. If it evaluates to false, the component is deemed not to contain a virus. A virus free message only results when all tested components do not contain any viruses.

Normally, this expression is not specified and the system makes up a rule of its own, depending on the values chosen for "VirusChecker" and "VirusProtocol". For example, the system typically uses:

(VirusCheck == DEL_DELETE) || Clamd

You may specify your own rule, if you wish to change the order in which the virus checkers are called, or invoke a programmable virus arbitron of your own choosing (see ProgArbCode). And, in the user configuration file, you may specify this option without any rule to turn off any global virus rule and revert back to the system's default rule.

The expression supplied should be a logical expression that evaluates to either true (message component contains a virus) or false (message component doesn't contain a virus). If it contains spaces for readability, it should be enclosed in double quotes.

There are a number of predefined variables and constants that you can use to build the rule. These are:

Clamd The result from calling clamd (typically ClamAV). For each message component analyzed, clamd will return a value of true, if a virus was found within the component. Using this variable causes clamd to be called.
 
VirusCheck The result from calling the internal virus checker. The result is one of the "DEL_" codes described below. Normally, one would only want to mark the message component as containing a virus if the "DEL_DELETE" code were returned by the internal virus checker. The internal virus checker results are always generated in the normal course of filtering the message so there is no performance penalty associated with using this variable.
 
DEL_OK The OK return code from the internal virus checker or any of the programmable virus arbitrons (depends on the variable used). May be used like this:
 
      MyArb == DEL_OK
 
DEL_WARN The warning return code from the internal virus checker or any of the programmable virus arbitrons (depends on the variable used). Warn means that the message component should be kept as part of the message but a warning should be issued to tell the user to be careful when opening it. May be used like this:
 
      MyArb == DEL_WARN
 
DEL_REJECT The reject return code from the internal virus checker or any of the programmable virus arbitrons (depends on the variable used). Reject means that the message component should be kept as part of the message but its determinant (e.g. file name extension) should be mangled so that it can't be accidentally opened by some well-meaning program. May be used like this:
 
      MyArb == DEL_REJECT
 
DEL_DELETE The virus found return code from the internal virus checker or any of the programmable virus arbitrons (depends on the variable used). May be used like this:
 
      MyArb == DEL_DELETE

Any variable that is not in the list above causes the programmable arbitron code supplied by you (see ProgArbCode) to be invoked and passed one or more components of the message (see ProgArbRequires). The result from the programmable arbitron is set into the variable, once the arbitron returns. Programmable arbitron results can be one of: DEL_OK or DEL_DELETE. You might include your own programmable arbitrons in the virus rule like this:

(VirusCheck == DEL_DELETE) || Clamd
|| (MySpam1 == DEL_DELETE)
|| (MySpam2 == DEL_DELETE)

Note that the virus rule is evaluated in the usual manner with the first term that renders the expression unequivocally true or false causing it to exit without calling any of the other arbitrons. Thus, you can use the order of evaluation to optimize your virus checking (e.g. in the above expression, the programmable arbitrons are only invoked if the internal virus checker and clamd decide that the message component doesn't contain a virus).
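The short-circuit behavior described above can be imitated in Python, whose "or" operator stops at the first true term just as "||" does in the rule. Everything in this sketch is a stand-in: the numeric DEL_ values, the run helper and the MyArb variable are assumptions for illustration and say nothing about how the filter represents or calls its arbitrons.

```python
# Placeholder values only; the filter's real codes are not documented here.
DEL_OK, DEL_WARN, DEL_REJECT, DEL_DELETE = range(4)

calls = []  # records which (simulated) arbitrons were actually invoked


def run(name, result):
    """Stand-in for invoking an arbitron: note the call, return a canned result."""
    calls.append(name)
    return result


def virus_rule(internal_result, clamd_result, my_arb_result):
    """Mimic: (VirusCheck == DEL_DELETE) || Clamd || (MyArb == DEL_DELETE)."""
    # The internal checker's result is always available, so it costs nothing.
    # Each later term is only evaluated if every earlier term was false.
    return bool(internal_result == DEL_DELETE
                or run("Clamd", clamd_result)
                or run("MyArb", my_arb_result) == DEL_DELETE)


print(virus_rule(DEL_OK, True, DEL_OK), calls)
```

In this run clamd reports a virus, so the rule is already true and the programmable arbitron is never called. Putting the cheapest or most decisive checks first therefore saves the most time.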

3.7 Programmable Arbitron Options

The filter contains a number of built-in message arbitration routines. It also knows how to call several well-known arbitration programs. These routines and programs (called arbitrons herein) are used to determine the status of a message (i.e. whether it is spam and/or whether it contains a virus).

To provide the flexibility of using other arbitrons (even including homemade ones) the filter supports the concept of programmable arbitrons. Using the SpamRule or VirusRule expression, the order of arbitron invocation and the handling of return values can be defined programmatically in the configuration files (either global or user). The variables used in these rules indicate which arbitrons are to be invoked. If the standard, system-defined variable names are used in a rule, the standard, system-defined arbitrons are invoked. If any non-standard variable names are used, the programmable arbitrons defined by these options are used.

The programmable arbitron options (except for ProgArbLog) may only be set in the global options configuration file. This is because these options supply code that is actually executed by the filter so, aside from the obvious security repercussions, do you really want every Tom, Dick and Harry being able to supply code to the filter that it can execute (I thought not)? However, once a programmable arbitron and its corresponding variable are made available via these global options, any user can include it in their local rules to cause that arbitron to process their messages. Here is the list of programmable arbitron options available in the configuration files:

ProgArbCode filename[(timeout)]   The filename given must be the name of a file that contains a chunk of Perl code which will be executed whenever a programmable arbitron is to be invoked (see the Programmable Arbitrons Chapter). This code can be passed one or more sections of the message being filtered (depending on the options set by ProgArbRequires) and must make a go or no-go decision on whether the message is spam and/or contains a virus.

Note that the filename can only be set in the global configuration file. However, this option can be specified without any filename in the user configuration file to turn off programmable arbitron processing for that user only.

An optional timeout value may be supplied in parentheses to give the amount of time, in seconds, that MailCorral should be prepared to wait for each complete transaction with any one of the programmable arbitrons. Essentially, this is the total time for the arbitron to respond to the arbitration request -- it includes the elapsed time for all of the reads and writes to the arbitron.

The default timeout is set to 30 seconds, if no value is supplied. You may pick any positive value but consider this. Sendmail is only prepared to wait for an answer from the filter for so long. If the arbitron timeout is to be of any use, it should be set to some value smaller than the timeout given to sendmail in its configuration file (no value in the sendmail configuration file means a default timeout of 10 seconds).

A good choice would be to give the programmable arbitron 10-20% of the timeout allotted to the filter in the sendmail configuration file. This will ensure that the built-in and well-known arbitrons that the filter typically calls will have time to do their jobs and still allow the filter to complete the job of filtering a message in the time allowed.

ProgArbLog * [filename]   Since programmable arbitron code will indubitably require debugging, this option provides a method whereby the user can specify a log file which will be managed by MailCorral on behalf of the programmable arbitrons. The file will be opened whenever a programmable arbitron is called and a handle to it made available. The user can write trace information to this file in order to debug their programmable arbitrons.

Note that this option, without a filename supplied, in the user's configuration file, will turn off logging of programmable arbitrons for that user only.

ProgArbReq[uires] flag[, flag[, ...]]   This option specifies one or more flags that indicate which sections of the email message the programmable arbitrons require to be passed to them. The sections indicated are passed to all of the programmable arbitrons and the mechanism used for passing them is to write each section to a file so think carefully about which ones you really need. Here are the acceptable flags:

SpamHeaders The message headers, preened to make spam processing easier.
 
SpamBody All text portions (including text attachments) of the message, concatenated together and preened to make spam processing easier (none of those bogus rules about how the attachments have funny separators or other such nonsense that Justin seems to like need apply -- just look at the text and see if it says Viagra).
 
MessageAll Both the message headers and the message body concatenated together, separated by a single blank line. Yep, it looks like a regular text email message. Scan away without giving it another thought.
 
AttachText All of the text attachments, one to a file. May be individually accepted or rejected. Excellent for virus scanning.
 
AttachAll All of the attachments, one to a file. May be individually accepted or rejected. Really excellent for virus scanning, if you can hack MIME decoding.

3.8 Message Formatting Options

Message filtering inserts text into filtered mail messages whenever it detects a virus or other suspicious entity or when the message is found to contain spam. The text of these inserts is normally compiled into the filter via the "smfopts.h" file. However, the text of any of the inserts can be set from either of the configuration files. This allows the message inserts to be tailored at startup time via the global options file or at message delivery time, on an individual user basis, via the local options file.

The text inserts often have variable information substituted into them (the substitution type is indicated, where required). If this is the case, the rules for doing the substitution follow the conventions used by the C function sprintf. Otherwise, the text inserts themselves must be strings that begin and end with double quotes. If a second quote appears after the first, accumulation of the text insert is ended until another quote appears. The standard escape sequences "\"", "\\", "\r", "\n" and "\t" are recognized. The sample in the Sample Configuration File Section shows all this.

Each text insert is usually inserted as a paragraph in the mail message. The insert should be formatted through the use of "\r\n" pairs to read properly if it is inserted into a plain text message (i.e. no lines longer than approximately 76 characters). Do not use any HTML tags (they will be supplied for you by the filter).
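If you are drafting your own insert text, a small script can do the line breaking for you. The Python sketch below wraps a paragraph into lines of at most 76 characters, each ended by a "\r\n" pair, and then performs a trial substitution with the "%" operator, which follows the same conventions as sprintf for "%s". The function name format_insert and the attachment type "Visual Basic script" are made up for this example; the filter does its own substitution at delivery time.

```python
import textwrap


def format_insert(text, width=76):
    """Wrap plain text into CRLF-terminated lines no longer than width."""
    lines = textwrap.wrap(text, width=width)
    return "".join(line + "\r\n" for line in lines)


template = format_insert(
    "A %s file, which contains code that is executed as soon as you open it. "
    "Verify with the sender of the item that its content is safe before "
    "opening it."
)

# Trial substitution, to see how the insert reads once the type is filled in.
message = template % "Visual Basic script"
print(message)
```

Note that the wrapping is done before the substitution, so a long substituted value can still push a line past the limit. Since the limit is only approximate, leaving a little slack in the width is usually enough.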

Typically, options are entered into the config file on a single line. However, for these message options, the inserted message text can be longer than a single line so you may wish to split it over multiple lines. To do so, simply end each line with '\'. The actual text of the message should be enclosed in double quotes. Concatenation of subsequent lines is done if the preceding line is terminated with a quote and the next line begins at some point with another quote (this permits indented lines without the indents being included in the message text). Note that no text insert can be longer than 1000 characters and, if you use continuations, no continued line can be longer than 255 characters. The sample in the Sample Configuration File Section shows how to do it.
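The continuation and quoting rules above can be approximated by the following Python sketch, which may be handy for checking an insert against the length limits before putting it in a configuration file. It is only a rough model written from the description in this section, not the filter's parser: the function name join_insert_lines is hypothetical, and edge cases (such as a quoted string that itself ends in a backslash) may be treated differently by the filter.

```python
ESCAPES = {'"': '"', "\\": "\\", "r": "\r", "n": "\n", "t": "\t"}


def join_insert_lines(lines):
    """Join continued lines of a message option value and decode the quotes."""
    pieces = []
    for line in lines:
        if len(line) > 255:
            raise ValueError("continued line longer than 255 characters")
        line = line.strip()                # leading/trailing whitespace is dropped
        if line.endswith("\\"):
            line = line[:-1].rstrip()      # remove the continuation marker
        pieces.append(line)
    joined = "".join(pieces)

    text = []
    in_quotes = False
    i = 0
    while i < len(joined):
        ch = joined[i]
        if in_quotes and ch == "\\" and i + 1 < len(joined):
            nxt = joined[i + 1]
            text.append(ESCAPES.get(nxt, nxt))   # decode \" \\ \r \n \t
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes      # a quote starts or stops accumulation
        elif in_quotes:
            text.append(ch)                # anything outside quotes is ignored
        i += 1

    result = "".join(text)
    if len(result) > 1000:
        raise ValueError("text insert longer than 1000 characters")
    return result


sample = r'''
  "The original content of this message will be available for a short\r\n"\
  "period of time in %s.\r\n"
'''
print(repr(join_insert_lines(sample.strip().splitlines())))
```

The sample is the value of the MsgSaveLoc option from the Sample Configuration File Section. The indentation of the second line does not end up in the message text because it lies outside the quotes.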

Obvious uses for the message formatting options are to customize the text shown to users to give a company look and feel or to translate the text into different languages. Here is the list of message formatting options available in the configuration files:

MsgConfig * Describes attachments that are rejected because they can contain configuration information which may allow a virus to modify a recipient's system. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s file, which supplies directives to system\r\nconfiguration or management programs. This might allow your system to be\r\ncompromised by a malicious outsider. There should normally be no reason\r\nfor anyone to send files of this sort to you. Verify with the sender of\r\nthe item what their intent was in sending you this file and that its\r\ncontent is safe before opening it.\r\n

MsgDelete * Indicates that the filter found what it knows to be a virus in the message. Requires no substitutions. The default message is:

SENDMAIL FILTER VIRUS ALERT: This mail message has been scanned by a\r\nsendmail filter and was found to include attachments that are known to\r\ncontain or that actually contain viruses. These items have been deleted\r\nfrom the message. Less malicious items have been rendered relatively\r\nharmless by renaming them. The summary below describes all the attachments\r\nand explains how they are harmful. In the case of the renamed items, if\r\nyou are sure you know who they are from and that they are indeed harmless,\r\nyou can rename them back to their original name and open them. The\r\ndeleted files may only be retrieved by following the instructions for\r\nretrieving the original message. Please proceed with extreme caution.\r\nIf you have any questions or would like assistance, please contact\r\nTECHSUPPORT.\r\n

MsgExec * Describes attachments that are rejected because they are executable and therefore will allow a virus to modify a recipient's system. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s file, which contains code that is executed\r\nas soon as you open it. This code can do whatever it likes, including\r\nwiping your hard drive, sending sensitive information back to its\r\noriginator and installing viruses on your machine. Verify with the sender\r\nof the item that its content is safe before opening it.\r\n

MsgExploit * Describes attachments that are rejected because they may be able to exploit a security hole in the application which usually handles them. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s file, which may be able to exploit a\r\nvulnerability or security hole in the application which normally\r\nhandles it, thereby inducing the application to unintentionally execute\r\nvirus code in the file. This code can do whatever it likes, including\r\nwiping your hard drive, sending sensitive information back to its\r\noriginator and installing viruses on your machine. Verify with the sender\r\nof the item that its content is safe before opening it.\r\n

MsgFoundItem * Names an attachment found that matches a filter criterion. Requires that two substitutions (%s, %s) be present, the first for the attachment's name and the second for the attachment's type. The default message is:

Found item %s, matching %s.\r\n

MsgFoundType * Names an inline MIME type found that matches a filter criterion. Requires that one substitution (%s) be present for the inline MIME type's type. The default message is:

Found inline MIME type %s.\r\n

MsgHTML * Indicates that embedded HTML was found in and removed from the message. Requires no substitutions. The default message is:

Embedded HTML was included in the message. Since most mail readers open\r\nand interpret HTML immediately, in a rather indiscriminate fashion,\r\neverything but the really innocuous tags have been removed. Hopefully,\r\nwhat remains should still be viewable yet be rendered harmless.\r\n

MsgHTMLScript * Indicates that probably harmful embedded HTML was found in and removed from the message. Requires no substitutions. The default message is:

Potentially harmful embedded HTML was included in the message. The harmful\r\ntags have been removed. What remains will not execute in the way that the\r\nsender intended but that is the price paid for making it harmless.\r\n

MsgInlineMIME * Describes inline MIME types that are rejected because they may be opened and executed immediately by many mail readers. Requires no substitutions. The default message is:

Inline MIME types are probably not harmful but it is possible they may\r\nbe so. Since many mail readers will open and interpret inline MIME\r\ntyped objects immediately, you have no chance to verify that the typed\r\nitem is harmless before it is opened. Innocuous MIME types are allowed\r\nby this filter but the ones listed above are either unknown or known\r\nto be potentially harmful. Consequently, the inline item has been\r\nrendered harmless (i.e. unopenable).\r\n

MsgMacro * Describes attachments that are rejected because they may contain macros that can be executed by the application which is normally associated with them. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s, which may can contain harmful macros\r\nthat are executed as soon as you open it. Verify with the sender of the\r\nitem that its content is safe before opening it.\r\n

MsgNest * Indicates that the filter encountered an error while processing the mail message. Requires no substitutions. The default message is:

SENDMAIL FILTER PROCESSING ERROR: This mail message was being scanned by a\r\nsendmail filter when a processing error occurred. It is entirely possible\r\nthat it might include attachments that are known to contain or are highly\r\nlikely to contain viruses. Had these items have been found, they would\r\nhave been rendered relatively harmless by renaming them. Unfortunately,\r\n due to the processing error, the renaming process might not have been\r\n entirely completed. The summary below describes which files were renamed\r\nand suggests how they might be harmful. You should proceed with extreme\r\ncaution when opening any of the attachments included with this message.\r\nMeanwhile, to report this filter failure or if you have any questions or\r\nwould like assistance, please contact TECHSUPPORT.\r\n

MsgNonSpam * Indicates that the message didn't contain any spam but that a spam report was requested anyway. Requires no substitutions. The default message is:

This message did not match the criteria for spam, determined by your\r\npersonal spam filter parameters or the global or system spam filter\r\nparameters but you asked for a spam report always, so here it is.\r\n

MsgReject * Indicates that the filter found what it thinks is a virus in the message. Requires no substitutions. The default message is:

SENDMAIL FILTER VIRUS ALERT: This mail message has been scanned by a\r\nsendmail filter and was found to include attachments that are known to\r\ncontain or are highly likely to contain viruses. These items have been\r\nrendered relatively harmless by renaming them. The summary below describes\r\nthem and suggests how they might be harmful. If you are sure you know who\r\nthey are from and that they are indeed harmless, you can rename them back\r\n to their original name and open them. Please proceed with extreme caution.\r\nIf you have any questions or would like assistance, please contact\r\nTECHSUPPORT.\r\n

MsgRemailInst * Indicates that the filter modified the message, for whatever reason and that a copy of the message is available for automatic remailing. Gives instructions about how to have the message remailed. Although there are two substitutions (%s, %s) present in the default message, both are for the name of the saved copy. Should you wish, your message may contain only one substitution. The default message is:

The original content of this message will be available for a short period\r\nof time by sending a message to REMAILROBOT and including the file name\r\n%s\r\nin the subject line. If your mail reader supports it, you can click on\r\nthe link: mailto:%s\r\nOtherwise, you will have to make up the message with the address and\r\nsubject yourself. Please bear in mind that the message may quite possibly\r\ncontain a virus but no warning will be given.\r\n

Further processing of the inserted text is done for HTML messages. If the inserted text contains the string "mailto:...\r\n", as does the default message, this string will be duplicated inside a set of link tags so that the recipient may click on the link and the message will be sent to the remail processing robot. If you wish to take advantage of this feature, the text of your message must include a string that begins with "mailto:" and ends with "\r\n". You could for example use the following message:

The original content of this message will be available for a short period\r\nof time by clicking the link:\r\nmailto:%s\r\n
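To picture what the link duplication amounts to, here is a Python sketch that finds each string beginning with "mailto:" and ending at "\r\n" and repeats it inside an anchor tag. The exact HTML that the filter emits is not specified in this section, so the "<a href=...>" form, the function name linkify_mailto and the address robot@example.org are all assumptions.

```python
import html
import re

# The documented trigger: text that starts with "mailto:" and runs up to "\r\n".
MAILTO = re.compile(r"mailto:[^\r\n]*(?=\r\n)")


def linkify_mailto(insert_text):
    """Repeat each mailto: string inside a link tag (assumed HTML form)."""
    def wrap(match):
        target = match.group(0)
        # The string appears twice: once as the link target, once as its text.
        return '<a href="%s">%s</a>' % (html.escape(target, quote=True),
                                        html.escape(target))
    return MAILTO.sub(wrap, insert_text)


print(linkify_mailto(
    "Click the link:\r\nmailto:robot@example.org?subject=saved-file\r\n"))
```

A "mailto:" string that is not followed by "\r\n" is left alone by this sketch, which matches the requirement that the string end with that pair.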

MsgSaveLoc * Indicates that the filter modified the message, for whatever reason and that a copy of the message is available from Tech Support. Indicates that the recipient should contact Tech Support and names the saved copy. Requires one substitution (%s) for the name of the saved copy. The default message is:

The original content of this message will be available for a short period\r\nof time from TECHSUPPORT. Contact them and give them this file name:\r\n%s\r\n

MsgSpam * Indicates that the message contains spam. Requires no substitutions. The default message is:

This message matched the criteria for spam, determined by your personal\r\nspam filter parameters or the global or system spam filter parameters.\r\n

MsgSpecWarn * Describes attachments that might be harmful. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s, which are not normally harmful but it is\r\nremotely possible that it may be so. Verify with the sender of the item\r\nthat its content is safe before opening it.\r\n

MsgSuspect * Describes attachments that are rejected because they are strongly suspected of containing a virus, due to the way that the sender has tried to obscure their presence. There is one insert (%s) available in the message for the name of the attachment found. The default message is:

Found %s, a file which appears suspicious.\r\nVerify with the sender and TECHSUPPORT before opening this item.\r\nYou will need to rename the attachment to its proper name before opening\r\nit. You may also want to run additional virus scans against the\r\nattachment, in the mail reader's attachment directory, before opening it.\r\n

MsgTagged * Describes attachments that are rejected because they can contain tags that cause code to be executed as soon as the attachment is opened. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s file, which may contain tags that cause code to be\r\nexecuted as soon as you open it. This code can do whatever it likes,\r\nincluding wiping your hard drive, sending sensitive information back to\r\nits originator and installing viruses on your machine. Verify with the\r\nsender of the item that its content is safe before opening it and bear in\r\nmind that tagged data of this nature is frequently used to deliver viruses\r\nso use extreme caution.\r\n

MsgUnknownItem * Names an attachment found that doesn't match any filter criteria. Requires that one substitution (%s) be present for the attachment's name. The default message is:

Found unknown item %s, no filter criteria.\r\n

MsgUnknownWarn * Describes attachments that are of an unknown type which is probably not harmful but may be so. Requires no substitutions. The default message is:

A file of unknown type which is probably not harmful but it is possible\r\nthat it may be so. Verify with the sender of the item what the file is\r\nand that its content is safe before opening it.\r\n

MsgUnnamedItem * Indicates that an attachment was found that is unnamed and doesn't match any filter criteria. Requires no substitutions. The default message is:

Found unnamed item, no filter criteria.\r\n

MsgVirus * Describes attachments that are rejected because they have a history of being virus delivery vehicles. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s, which has frequently been chosen as a\r\nvirus delivery vehicle in the past and is almost certain to contain a virus.\r\nVerify with the sender and TECHSUPPORT before opening anything. You\r\nwill need to retrieve the original message in order to obtain any\r\nattachments that you wish to proceed with opening.\r\n

MsgVirusFound * Indicates that a MIME entity or attached file was found to actually contain a virus, by the virus scanner. One substitution (%s) must be present, which will be replaced by the entity type, followed by the word "entity", or the attachment's name preceded by the word "file". The default message is:

Virus found in %s.\r\n

MsgVirusScanned * Describes attachments and/or MIME entities that are rejected because they were found to contain a virus. There are no substitutions. The default message is:

The MIME entities and/or files listed above were scanned by a virus\r\nscanner and found to contain actual viruses. Do not, under any\r\ncircumstances, unless you are absolutely certain you know what you are\r\ndoing, open any of them. Verify with the sender and TECHSUPPORT\r\nbefore opening anything. You will need to retrieve the original message\r\nin order to obtain any attachments that you wish to proceed with opening.\r\n

MsgWarning * Indicates that the filter found HTML tags, which it thinks might be harmful, in the message. Requires no substitutions. The default message is:

SENDMAIL FILTER WARNING: This mail message has been scanned by a sendmail\r\nfilter and was found to include objects and/or HTML tags that may be of\r\na malicious nature. The summary below describes them and suggests the\r\naction you should take with respect to them. If you have any questions or\r\nwould like assistance, please contact TECHSUPPORT.\r\n

MsgWordHandled * Describes attachments that are rejected because they are erroneously interpreted by Microsoft Word on some systems. Since Word will execute macros in the attachments, the attachments are rejected. Requires that one substitution (%s) be present for the attachment's type. The default message is:

A %s, which on some systems may be handled\r\nincorrectly by Microsoft Word. This would allow a malicious person to\r\nsend an attachment, with this file type, that was actually a Word\r\ndocument. Since Word would handle this document instead of the intended\r\napplication, this would present a backdoor through which harmful macros\r\nthat are executed as soon as you open the attachment could be sent.\r\nVerify with the sender of the item that its content is safe before\r\nopening it. You may also need to rename the item to its proper name,\r\nif your email program renames it to \".doc\".\r\n

3.9 Sample Configuration File

What follows is an example of a fairly typical local (user) configuration file. It turns on or off some filtering and spam options that aren't usually set that way by default. In the interests of brevity, it changes the text of most of the message inserts to be less verbose. Finally, it whitelists two trusted senders and blacklists all messages from two particularly nasty spammer domains.

 
#
# Filtering options.
#
AutoRemailing
ReplaceHTML No
SpamAlways
SpamDelivery Deliver
SpamLevel 3
#
# Filtering messages.
#
MsgWarning \
  "SENDMAIL FILTER WARNING: This mail message has been scanned by a\r\n"\
  "sendmail filter and was found to include objects and/or HTML tags that\r\n"\
  "may be of a malicious nature.  The summary below describes them and\r\n"\
  "suggests the action you should take with respect to them.\r\n"
MsgReject \
  "SENDMAIL FILTER VIRUS ALERT: This mail message has been scanned by a\r\n"\
  "sendmail filter and was found to include attachments that are known to\r\n"\
  "contain or are highly likely to contain viruses.\r\n"
MsgSaveLoc \
  "The original content of this message will be available for a short\r\n"\
  "period of time in %s.\r\n"
MsgRemailInst \
  "The original content of this message will be available for a short\r\n"\
  "period of time by sending a message to spamrobot mentioning:\r\n"\
  "%s\r\n"\
  "in the subject line.  Or you can click on the link:\r\n"\
  "mailto: spamrobot?%s\r\n"
MsgInlineMIME \
  "Inline MIME types are probably not harmful but it is possible they may\r\n"\
  "be so.  Consequently, the inline item has been rendered harmless.\r\n"
MsgVirus \
  "A %s, which has frequently been chosen as a virus\r\n"\
  "delivery vehicle in the past and is almost certain to contain a virus!\r\n"
MsgWordHandled \
  "A %s, which on some systems may be handled\r\n"\
  "incorrectly by Microsoft Word.\r\n"
MsgMacro \
  "A %s, which may can contain harmful macros\r\n"\
  "that are executed as soon as you open it.\r\n"
MsgSpecWarn \
  "A %s, which are not normally harmful but it is\r\n"\
  "remotely possible that it may be so.\r\n"
MsgExec \
  "A %s file, which contains code that is executed\r\n"\
  "as soon as you open it.\r\n"
MsgConfig \
  "A %s file, which supplies directives to system\r\n"\
  "configuration or management programs.\r\n"
MsgExploit \
  "A %s file, which may be able to exploit a\r\n"\
  "vulnerability or security hole in the application which normally\r\n"\
  "handles it.\r\n"
MsgTagged \
  "A %s file, which may contain tags that cause code to be\r\n"\
  "executed as soon as you open it.\r\n"
MsgHTML "Embedded HTML was included in the message.\r\n"
MsgHTMLScript \
  "Potentially harmful embedded HTML was included in the message.\r\n"
#
# Spam fast path stuff.
#
whitelist_from myfriend@hometown.org
whitelist_from theboss@workplace.org
blacklist_from *@spammers.com
blacklist_from *@badguys.com