SpamCorral Documentation
If you use a sendmail virus/spam filter, such as MailCorral (described elsewhere on this site), the filter should redirect all received spam to the spam corral, instead of delivering it to the spammer's intended victim. Never delivering a single piece of spam is a good way of dealing with it, but it can go astray when the filter stops a piece of spam that is actually valuable (stranger things have happened). It is even possible that a non-spam message might be misidentified as spam and stopped by mistake.
How do we resolve this dilemma? Consider that, in the end, the decision about whether a piece of spam is valuable is best left to the intended recipient. After all, they are really the only ones who know whether they actually want to see the spammer's message. But if the recipient is shown every piece of spam and asked to decide whether they want to see it, the solution is no better than the problem. The compromise is to ask them only once or twice a day, in a single message, and to provide a summary that contains enough information for them to decide, quickly and easily, whether they want to see the spam.
The SpamCorral notification program (SpamNotify.pl) can be run periodically by a cron job, whereupon it will send email notification messages to all of the recipients of spam. It summarizes each of the messages received since the last time it was run, giving the sender's address, the subject, the delivery date/time and the associated spam statistics (as a percentage, with 100% being the threshold for classification of a message as spam). This provides, on a timely basis, yet without being annoying, enough information for a person to decide whether they would like to see any of the spam they have received. Normally, the answer is no, in which case, they need do nothing.
SpamNotify.pl scans through the email spam corral, searching for messages to send notifications for. Normally, it will find all of the messages that have arrived since the last time it was run (i.e. those later than the timestamp of the last run). However, you may alter this behaviour by using one or both of its optional patterns to look only for specific messages.
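The "since the last run" scan can be sketched roughly as follows. This is a minimal sketch under assumptions not stated on this page: one message per file in the corral directory, and each file's modification time used as its arrival time. The helper name new_messages is hypothetical, and the real script may record the last-run timestamp differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: list the corral files that arrived after the last run.
# Assumes one message per file and uses each file's modification time as its
# arrival time; SpamNotify.pl may record the last-run timestamp differently.
sub new_messages {
    my ($corral, $last_run) = @_;
    opendir(my $dh, $corral) or die "Can't open corral $corral: $!";
    my @new;
    for my $name (sort { $a cmp $b } readdir($dh)) {
        my $path = "$corral/$name";
        next unless -f $path;                     # skip ".", ".." and subdirectories
        my $mtime = (stat _)[9];                  # reuse the stat done by -f
        push @new, $path if $mtime > $last_run;   # strictly later than the last run
    }
    closedir($dh);
    return @new;
}
```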
If you do elect to use a pattern, the full flexibility of regular expression searching is available. The patterns given are evaluated using the Perl eval function. Consequently, if you can describe what you are looking for with a regular expression, you can find it in the spam corral.
If this is not sufficient, the even fuller flexibility of a block of code is available for the more difficult problems. If you can code your search criteria into a block of Perl code, you can apply them to messages in the spam corral.
For each spam message that matches the search criteria (either later than the last timestamp or matching one or both of the patterns), a notification message is sent to the recipient, indicating that there is spam waiting for them in the corral. A brief description of each spam message is given in the notification, so as to give the recipient a chance to decide whether they want to see it or not. All of the matched spam messages for a single user are sent in one notification.
The input to this script is the spam corral, which is expected to be populated by the MailCorral sendmail milter "sendmailfilter.c". Essentially, each message in the corral is stored exactly as it was received, except for a single line of statistical information in a special header at the very front of the message.
If you wish to use this program on mail messages corraled by some other mail filter, you should easily be able to alter the portion of the code that looks for the statistical information in the header.
The output will be one notification mail message sent to each
individual recipient of spam with messages that match the search criteria.
In the notification message, there will be an explanatory paragraph or two
plus one paragraph for each spam message. In these paragraphs, the sender,
recipient, subject, spam statistics and any annotation from a search code
block ($Remarks) are given. The recipient of the notification will be able
to use the information therein to release any spam that they particularly
care about for delivery to themselves.
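As a rough sketch of the grouping described above (one notification per recipient, an explanatory paragraph, then one paragraph per message), the following Perl groups matched messages by recipient. The record fields (to, from, subject, date, score, remarks), the helper name build_notifications and the wording are hypothetical, not SpamNotify.pl's actual format.

```perl
use strict;
use warnings;

# Hypothetical sketch: group matched spam by recipient and build one
# notification body per recipient: an explanatory paragraph, then one
# paragraph per message. Field names and wording are illustrative only.
sub build_notifications {
    my (@spam) = @_;
    my %by_user;
    push @{ $by_user{ $_->{to} } }, $_ for @spam;

    my %body;
    for my $user (sort keys %by_user) {
        my @paras = ("The following messages addressed to you were held as spam. "
                   . "Nothing needs to be done unless you want to see one of them.");
        for my $m (@{ $by_user{$user} }) {
            # Score is a percentage; 100% is the spam threshold.
            my $p = sprintf("From: %s\nTo: %s\nSubject: %s\nDate: %s\nSpam score: %d%%",
                            @{$m}{qw(from to subject date score)});
            # Any annotation (e.g. $Remarks from a search code block) goes last.
            $p .= "\n$m->{remarks}" if defined $m->{remarks} && length $m->{remarks};
            push @paras, $p;
        }
        $body{$user} = join("\n\n", @paras) . "\n";
    }
    return \%body;
}
```

Grouping into a hash of arrays keyed by recipient is what guarantees a single notification per user, however many of their messages matched.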
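SpamNotify.pl is invoked by the following command line:
SpamNotify.pl [--Config=configfile] [--Corral=/corral/dir] [--Debug] [--Partition[=interval]] [--SearchSpan=days] <uid_pattern> <search_pattern>
Notes on the configuration file, corral partitioning and the two search patterns follow. The meaning of each parameter is summarized under "Command Line Parameters".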
Configuration file (--Config): To use this file, set the variables
as you would in Perl code. For example:
$SPAMROBOT = "myrobot";
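A minimal sketch of how such a file might be loaded, assuming it is read whole and passed to eval as described under --Config. The helper names (load_config, default_config_path) and the default values are hypothetical placeholders; only the variable names $SPAMROBOT and $NOTIFYMSG come from this page.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Internal defaults. $SPAMROBOT and $NOTIFYMSG are variable names used on this
# page; the default values shown here are placeholders, not the real ones.
our $SPAMROBOT = "spamrobot";
our $NOTIFYMSG = "";

# The default configuration file: SpamNotify.cf beside the program itself.
sub default_config_path {
    return dirname($0) . "/SpamNotify.cf";
}

# Hypothetical loader: evaluate the file as Perl code so that assignments such
# as $SPAMROBOT = "myrobot"; override the defaults. Returns 0 (keeping the
# internal defaults) if the file does not exist.
sub load_config {
    my ($file) = @_;
    return 0 unless -e $file;
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    my $code = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    eval $code;
    die "Error in configuration file $file: $@" if $@;
    return 1;
}
```

A caller would typically run load_config on the --Config value if one was given, or on default_config_path() otherwise.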
Partitioning (--Partition): On many systems, the amount of spam
kept in the corral can approach epic proportions. For example, on a
system with 5000 users that keeps its viruses/spam for 10 days, where the
users receive an average of 10 virus/spam messages per day, the corral
would contain 500,000 messages. This number of messages can tax any file
system and be burdensome to process.
By partitioning the corral into
intervals, the total number of messages in a single corral directory can
be reduced significantly (e.g. in the above example, with a partition
interval of 24 hours, each directory would contain only 50,000 messages on average).
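The sizing example works out as follows. This is a small worked sketch that assumes messages are spread evenly over the retention period; the function names are illustrative only.

```perl
use strict;
use warnings;

# Worked version of the sizing example above.
# Total corral size = users x days kept x messages per user per day.
sub corral_size {
    my ($users, $days_kept, $per_user_per_day) = @_;
    return $users * $days_kept * $per_user_per_day;
}

# Average messages per partition directory: the retention period is split
# into days_kept * (24 / interval) partitions of equal length.
sub partition_size {
    my ($total, $days_kept, $interval_hours) = @_;
    my $partitions = $days_kept * (24 / $interval_hours);
    return $total / $partitions;
}

my $total = corral_size(5000, 10, 10);
printf "corral: %d messages; per 24-hour partition: %d; per 6-hour partition: %d\n",
    $total, partition_size($total, 10, 24), partition_size($total, 10, 6);
```

This prints 500000, 50000 and 12500 respectively: halving the interval doubles the number of directories and halves their average size.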
Processing of corralled messages is
also sped up, since the messages in partitions that don't apply to the
time period being scanned are bypassed altogether.
Note that partitioning can be
turned on or off at any time, but once messages have been partitioned it
will take some time for them to work their way out of the system.
If you turn on partitioning, be sure that you have the version of
filterclean from MailCorral 1.0.16 or later; otherwise, the
partitions won't get cleaned up.
Userid pattern (<uid_pattern>): A regular expression is used, which
is matched against the userid that the spam was sent to. It is evaluated
by the Perl eval function. Do not supply the leading and trailing forward
slashes; this program adds them when it builds the expression to be
evaluated. Also, you need not worry about case sensitivity, because
case-insensitive matching is forced.
Note that special characters must
be escaped using the backslash character. Unfortunately, this is also the
shell escape character, so the backslash must be doubled on the command
line if the pattern is unquoted. For example, to search for myhost.com, the
pattern is myhost\.com, which is typed on the command line as:
myhost\\.com
Also note that, if the corral has
been partitioned by the "--Partition" option, the search will only extend
backwards from the current time for 30 days by default (see --SearchSpan).
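A minimal sketch of how a userid pattern could be applied, assuming the program simply wraps it in slashes, adds the /i flag and hands the result to eval, as the text describes. The helper uid_matches is hypothetical, not a function from SpamNotify.pl.

```perl
use strict;
use warnings;

# Hypothetical sketch: the program, not the user, wraps the pattern in
# slashes and forces case-insensitive matching, then evaluates it with eval.
# A "/" inside the pattern would end the expression early, and an unescaped
# "@" or "$" would be interpolated by Perl; this sketch guards against neither.
sub uid_matches {
    my ($uid_pattern, $userid) = @_;
    local $_ = $userid;
    my $result = eval "/$uid_pattern/i";
    die "Bad userid pattern '$uid_pattern': $@" if $@;
    return $result ? 1 : 0;
}

print uid_matches('myhost\.com', 'Joe@MyHost.com'), "\n";      # 1
print uid_matches('myhost\.com', 'joe@myhostXcom.net'), "\n";  # 0
```

The second call shows why the dot is escaped: without the backslash, "." would match any character, including the "X".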
Search pattern (<search_pattern>): Note that, if you use this pattern,
the …
Also note that, if the corral has
been partitioned by the "--Partition" option, the search will only extend
backwards from the current time for 30 days by default (see --SearchSpan).
The pattern can be either a regular
expression or a block of Perl code that returns true/false. Either way, it
is evaluated by the Perl eval function. If a regular expression is used,
the same rules that apply to the userid pattern apply here. If you wish to
supply a block of code to be evaluated, it must begin and end with "{" and
"}". If it does not, it will be treated as a regular expression and
evaluated as such.
The last statement in the block of
code must evaluate to a boolean value: true if you wish the search to
match the message in question, and false if you do not.
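The regular-expression-or-code rule can be sketched as a single dispatch, assuming a pattern is run as code exactly when it starts with "{" and ends with "}", and otherwise matched case-insensitively against the whole message. The helper search_matches is hypothetical, and only two of the pre-defined variables ($Message, $Subject) are modelled here.

```perl
use strict;
use warnings;

# Two of the variables the documentation says are available to a search
# pattern. How SpamNotify.pl fills them in is not shown; here they are simply
# package variables that the evaluated code can see.
our ($Message, $Subject);

# Hypothetical sketch: a pattern that begins with "{" and ends with "}" is run
# as a block of code whose last statement gives the true/false result;
# anything else is treated as a case-insensitive regular expression matched
# against the whole message.
sub search_matches {
    my ($pattern) = @_;
    my $result;
    if ($pattern =~ /\A\{.*\}\z/s) {
        $result = eval "do $pattern";
    } else {
        local $_ = $Message;
        $result = eval "/$pattern/i";
    }
    die "Bad search pattern: $@" if $@;
    return $result ? 1 : 0;
}

$Message = "Subject: Cheap pills\n\nBuy now!";
$Subject = "Cheap pills";
print search_matches('cheap pills'), "\n";                  # 1
print search_matches('{ $Subject =~ /invoice/i }'), "\n";   # 0
```

Wrapping the block in "do" is what makes its last statement the value returned by eval.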
The block of code has the following
pre-defined variables available to it:
$Message - the buffer
holding the entire message, including the headers, as it was sent. Note
that, if you use a regular expression instead of a block of code, the
pattern is matched against this variable.
$Headers - only the headers
portion of the message.
$Body - the body portion of
the message, minus any headers.
$From - the sender's from
address.
$To - the recipient's to
address.
$Subject - the subject
header line.
$Date - the message's date
header line. According to the mail RFCs, the format is:
Day, dd Mnt yyyy hh:mm:ss +/-gggg
Where "Day" is the name of the day
of the week and "dd Mnt yyyy" is the day of the month, month name and year.
"hh:mm:ss" is the time of day in hours, minutes and seconds, and +/-gggg is
the offset in hours and minutes from GMT.
$ReplyTo - the reply-to header
line. Probably pretty similar to $From.
$ContentType - the
content-type header line. This follows the MIME rules in the RFCs.
$Importance - the importance
header line.
$MatchCount - a running
count of the number of messages matched so far (does not include the
current message).
$TotalCount - a running
count of the number of messages processed so far (includes the current
message).
$Remarks - may
be set to any value by the block of code. This variable will be appended to
the notification generated for any matching message. It can be used to annotate
matched messages or to send information to the users.
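As an illustration of these variables, here is a search pattern written as a block of code, with hypothetical criteria: match at most 20 messages from example.com whose subject mentions an invoice, and annotate each match via $Remarks. The sample values, domain and wording are made up; the script sets the variables by hand and evaluates the block with eval so that it runs on its own.

```perl
use strict;
use warnings;

# Sample values standing in for the pre-defined variables.
our ($From, $Subject, $MatchCount, $Remarks) =
    ('billing@example.com', 'Your invoice #1234', 0, '');

# The search pattern, as it would be passed on the command line.
my $pattern = q{
    {
        my $hit = $MatchCount < 20
            && $From =~ /\@example\.com$/i
            && $Subject =~ /invoice/i;
        $Remarks = "This looks like a real invoice." if $hit;
        $hit;    # last statement: true selects the message
    }
};

# Evaluate the block with eval, as the documentation describes.
my $matched = eval "do $pattern";
die $@ if $@;
print $matched ? "match: $Remarks\n" : "no match\n";
```

Testing $MatchCount caps the number of messages reported, and setting $Remarks only when the block matches keeps the annotation off messages that are not selected.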
2.3 Command Line Parameters
The meaning and usage of each parameter of SpamNotify.pl, including the optional --SearchSpan and the two positional patterns, is:
--Config
The name of a file containing program configuration information. This
file is interpreted via the eval function so that it may contain variable
assignments, etc. All of the configuration variables described in the
"Configure the Message Remailer" section (e.g. $NOTIFYMSG) may be set in
this file. The default file is a file named "SpamNotify.cf" in the same
directory as this program. If the file doesn't exist, the internal
defaults are used instead.
--Corral
The name of the directory where all of the incoming spam has been
corralled by the sendmail spam filter. If not specified, the default
directory is "/var/spool/MailCorral".
--Debug
Turn on debugging. Dumps possibly helpful trace information to stdout if
enabled.
--Partition
Turn on corral partitioning. If no value is specified, the corral is
partitioned in 24 hour intervals. If a value is specified, it must be a
number from 1 to 24, giving the interval in hours into which the corral
should be partitioned. It is probably best to choose an interval that
divides evenly into 24 (i.e. 1, 2, 3, 4, 6, 8, 12 and 24).
--SearchSpan
When searching for a particular userid or pattern, this parameter tells
the notifier how many days it should back up, from today, while searching.
The default is 30 days.
<uid_pattern>
A userid pattern to use in searching the corralled spam for messages to
send notifications for. Normally, all messages received after the last
timestamp are processed. Supplying this pattern alters that behaviour.
Usually, it is used for testing purposes.
For example, the pattern myhost\.com searches for userids containing myhost.com.
<search_pattern>
A pattern to search the contents of each of the spam messages for. Any
message which contains this pattern will have a notification sent for it.
Normally, all messages received after the last timestamp are processed.
Supplying this pattern alters that behaviour. Usually, it is used for
testing purposes.
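The command line above could be parsed along the following lines. This is a sketch that assumes Getopt::Long (a standard Perl module); whether SpamNotify.pl itself uses it is not stated. The defaults (corral directory, 30-day search span, 24-hour partitions) and the 1 to 24 range come from this page, and the helper name parse_args is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my $USAGE = "usage: SpamNotify.pl [--Config=configfile] [--Corral=/corral/dir] "
          . "[--Debug] [--Partition[=interval]] [--SearchSpan=days] "
          . "<uid_pattern> <search_pattern>\n";

# Hypothetical sketch of parsing the documented command line.
sub parse_args {
    my (@args) = @_;
    my %opt = (Corral => '/var/spool/MailCorral', SearchSpan => 30);
    GetOptionsFromArray(\@args,
        'Config=s'     => \$opt{Config},
        'Corral=s'     => \$opt{Corral},
        'Debug'        => \$opt{Debug},
        'Partition:i'  => \$opt{Partition},   # a bare --Partition (or =0) yields 0
        'SearchSpan=i' => \$opt{SearchSpan},
    ) or die $USAGE;

    if (defined $opt{Partition}) {
        # No value means 24-hour intervals; otherwise 1 to 24 hours.
        $opt{Partition} = 24 if $opt{Partition} == 0;
        die "--Partition must be a number from 1 to 24\n"
            if $opt{Partition} < 1 || $opt{Partition} > 24;
        warn "--Partition=$opt{Partition} does not divide evenly into 24\n"
            if 24 % $opt{Partition};
    }

    # Whatever remains are the positional patterns (either may be absent).
    @opt{qw(UidPattern SearchPattern)} = @args;
    return \%opt;
}
```

The optional-integer form "Partition:i" is what lets --Partition appear with or without a value; an interval such as 5 is accepted but warned about, since it leaves a shorter final partition each day.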