EMAIL FILTER VALIDATION SUITE
Most email viruses employ similar propagation techniques and
are, consequently, very similar in external appearance to one another. That
being the case, it is possible to design an email virus filter that removes
viruses based on outward appearance. This is highly desirable, since it
will ensure that, even when a new virus comes along, your filter will be
able to screen for it.
On the other hand, the penalty for failing to detect a virus
in an email message is high. How do you know if you've covered all the
bases by detecting all of the important propagation techniques? This page
will allow you to download an email filter validation suite that consists of
generic email messages which can be passed through your filter to test all
of the popular methods of virus propagation. If your filter detects each
message and handles it correctly, it is probably ready for prime time.
Although not nearly as malicious as virus writers, spammers
are sometimes equally sneaky. Since many installations have systems in
place to detect and eliminate spam (such as BSM Development's
this puts a crimp in the spammer's style. In order for spam to work, it
must be delivered to its target. Figuring out new ways to slip spam by
email filters is one of the things that spammers do periodically. This
validation suite also presents tests for some of the latest spam
Here is a list of the messages included in the test suite and
the filter criteria they are intended to test. The name of each test is
given as well.
The messages beginning with "baddoc_" generally test the
limits of mail filters and/or their ability to handle misbehaved or
- baddoc_text_plain - To begin with, a simple test message that
is just plain text. It doesn't contain any MIME entities, although it is
tagged with a "Content-Type" of "text". Your filter should pass this
message as a plain text message with no alterations.
By the way, don't laugh. For a while,
ClamAV was identifying certain text-only messages as containing a virus
and email filtering was flagging them as bad. So, having this test in the
validation suite makes sense.
- baddoc_attachment_ok - A message with
a number of small attachments, all of them supposedly innocuous. The
filter should ignore them and pass them untouched. Attachments included
are source code (e.g. C source), text files, images (e.g. GIF, JPEG, PNG),
audio files (e.g. wav, mp3), videos (e.g. MPEG, QuickTime), postscript and
PDF files, archive files (e.g. tar, gzip, zip), signatures and vcard
files. A couple of tests for ISO-8859-2 filename support are also included.
- baddoc_attachment_warn - A second
message with a number of small attachments, that could be harmful and are
at least suspect. The filter should probably give a warning about each of
them but otherwise pass them untouched. Attachments included are
Microsoft Word documents (including RTF), Powerpoint presentations, Excel
spreadsheets (including a CSV file), other spreadsheets, database files,
man pages and troff files, and a CGI file. Generally, these are
attachments that have been proven to be used for distributing malicious
code, scripts, macros, etc. in the past.
- baddoc_attachment_reject - A third
message with a number of small attachments, that are assumed to be
harmful. The filter should do something to render them harmless (e.g.
rename them so that they have extensions which are not automatically
opened) and give a severe warning about each of them. Attachments
included are executable files (e.g. with the extensions ".exe", ".com",
".dll", ".pif" as well as known bad file names), interpreted files (e.g.
plugins, certificates that can contain malicious HTML, and HTML itself.
Generally, these are attachments that are known to be executed upon
opening. A couple of tests for ISO-8859-2 filename support are also included.
Also, a dummy file that simply contains
all zeroes, named "Virus.scr" is included as a special test for filter
debugging, if you need it.
- baddoc_attachment_word - A message with
a single, large, attached document of dubious nature. Tests the ability of
a filter to handle large attachments properly (e.g. since sendmail's milter
spans messages in 64K-byte chunks) and to detect possibly harmful
- baddoc_attachment_only - A message may
contain a single attachment with no message text at all. This happens
when someone mails only the attachment with no explanatory text. In this
case, the attachment is a Microsoft Word document, which should be flagged
with a warning and/or removed.
However, this is a difficult case, since
there is no place in the original message to insert a warning, since it
has no text. A mail filter must be able to handle this test case by, at
least examining the attachment and deciding what to do with it. If it
deserves a warning, the structure of the message itself may have to be
reworked to add a text segment where the warning can be inserted. If it
should be rejected, the attachment should be deleted and the message
reworked to contain only a text component that indicates what was done.
Note that, in our test case, we also played
a few games with the MIME description of the attachment (as a virus might)
to give your filter an extra test for thoroughness. The attachment is
described as an octet stream, which is normally passed over, since it is
not automatically executed or opened by anything, as far as we know.
Encoding in Base64 also obscures the content of the attachment.
- baddoc_attachment_only_html - A
message may contain a single HTML attachment with no message text at all.
This is a special case of the baddoc_attachment_only test (above), since
many email clients use HTML in their messages to augment the user
experience (i.e. to improve on boring, old, safe, plain text). Some of
these mailers treat such attachments as simple messages, despite their
being marked as attachments. In particular, "text/html" and "text/plain"
entities that are given attachment names in the "Content-Type" header of
a message often have their attachment status ignored and are treated this
This presents a dilemma for a good mail
filter because it may want to allow the attachment to go through
unscathed, since it is
an attachment. But, if a mail client
mistakenly treats the attachment as inline text, and proceeds to
interpret it, malicious HTML could sneak through and get executed. If
safety is your goal, your filter should probably examine these particular
kinds of attachments and alter or reject the malicious ones, instead of
just giving them a pass. This may prove to be inconvenient for some
users, who simply want to send an attached HTML file with no explanation,
but in this case, merely adding any kind of message text will move the
attachment out of inline and allow it to go through. A small penalty to
pay for safety first.
- baddoc_apple_double - The attachments
in this message contain dummy Apple resource forks which are sent, along
with the attachments that they describe, in pairs. Your filter should deal
with them as a single entity and not produce double warnings, etc., about
the resource forks themselves. Only the actual attachments should be
In this message, there are three actual
along with resource forks, and a separate word document that has no
considered sketchy, at best, your filter should produce three warnings, one
for each of the two Apple files plus their resource forks, and one for the
standalone Word document.
- baddoc_attachment_untyped - None of the
attachments in this message have an explicit file type (i.e. no file
extensions). If your filter's rejection criteria are based solely on file
extensions, it should warn you about all of them (i.e. that they have no
file type). Otherwise, if it can examine the MIME type as well as the file
type, the harmless MIME types should be ignored and the others should
generate a warning. The attachment types that are included are plain text,
postscript and PDF files, images (e.g. GIF, JPEG), a video (e.g. MPEG),
Microsoft Word documents (including RTF), a Powerpoint presentation, an
Excel spreadsheet, an octet stream, and a man page and troff file.
- baddoc_body_empty - Some mailers may
accidentally send MIME-encoded messages that have no message body. A
sample of one such message is included to test the robustness of your
filter (i.e. does it crash when the message is this broken).
- baddoc_forwards - Mail messages that
are forwarded from one user to the next often include multiple levels of
nested mail delivery information, and mixed HTML and plain text components.
This test message demonstrates such nesting of delivery information and
mixed content, and is provided to stress test a filter's ability to handle
multiple levels of MIME entities and apparently-confusing content.
- baddoc_html_straight - Many email
clients immediately interpret HTML in an effort to properly render
HTML formatted messages. That's good news for viruses, because this
indiscriminate rendering of HTML makes for a fine means of propagation.
HTML can appear as the only component of a MIME-encoded message, as in the
sample supplied, in which case it should be suitably laundered to remove
all harmful tags. Note that this example contains some tags broken in
typical ways used by virus writers to exercise your filter's HTML tag
- baddoc_html_mime - Another place where
HTML can appear is in a multipart, MIME-encoded message. Generally, most
mail clients can render HTML messages (as noted above). Only if a mail
client is incapable of rendering HTML (e.g. Linux mail) will it display
the text portion of a multipart message. This being the case, most of
today's email includes HTML because it allows enhanced formatting of
messages. Since email clients will be rendering this content, it behooves
an email filter to suitably launder and remove all harmful tags in the
portion of the message that a mail client will view. Thus, this message
serves as a test of those abilities as well as a test of the appropriate
amount of restraint, since it contains HTML that should largely be left
- baddoc_html_embedded - Unfortunately,
some email readers are prepared to go the extra mile and interpret embedded
HTML, even when it occurs in a plain text message (i.e. not MIME-encoded).
This could have disastrous effects, if your filter does not look for HTML
tags in plain text messages. This message presents a sample of such
embedded HTML. Although it is less likely to be harmful, it should still
be processed (i.e. removed).
- baddoc_html_malformed - Not all HTML
in mail messages is particularly well-behaved. Much of it is malformed,
often on purpose. This message presents a test case that contains mangled
HTML that should, nonetheless, be processed and rendered harmless. The
mangling consists of HTML tags that span lines. Often, an email filter
will miss such tags, if it uses a simple scanner that doesn't account for
newlines. However, most mail clients will ignore newlines and process the
tags properly. The bad guys know this and intentionally split lines in
HTML messages to sneak by the email filter. If the scanner misses such
messages, it could be worse than a "Bad Day at Black Rock".
- baddoc_html_comments - On the other
hand, some HTML is harmless, although it appears not to be, because
apparently-harmful tags can be included inside comments. Various mailers
such as Word under Outlook use the technique of embedding formatting
information into comments in HTML messages. An email filter should be able
to handle this kind of embedded formatting so this message presents such a
- baddoc_html_encoded_malformed - HTML
is often encoded to obscure its true nature from simple message filters.
This message contains poorly-behaved HTML, similar to that found in
baddoc_html_malformed, that is also Base64 encoded. Furthermore, it tries
to fool any mail filters that see it by using 304-character long lines when
normal Base64 encoded lines are only 60 characters long (MIME allows at most
76). The spammer who
crafted this message was using more than one trick. Despite that, it
should still be processed, rendered harmless, and then reencoded. This
tests both a filter's ability to deal with mangled and obscured messages
as well as its ability to modify encoded messages.
- baddoc_html_uuencoded - Although
uuencoding has been largely surpassed by Base64 encoding, messages using
this encoding are occasionally received. This message has an HTML-only,
MIME-encoded body that contains HTML tags that should be removed by an
email filter. It tests the ability of the filter to decode uuencoded HTML,
launder it, insert a message into the HTML to tell the user what was done
and then re-encode it.
- baddoc_html_outlook_typical - Outlook
is a ubiquitous mailer that is quite capable of producing some real
hum-dingers when it comes to HTML encoded messages. If Word is set as
Outlook's text editor, much of the content of messages generated by Outlook,
in HTML mode, is Word and HTML formatting tags. This message is a typical
example of what is possible. A good virus filter should reject none or
almost none of this message. It is verbose and it is filled with useless
junk but it isn't malicious or harmful.
- baddoc_html_attachment - Finally, it is
possible for HTML to be sent as an attachment to a message. Such attached
HTML cannot be altered, since we can only surmise its purpose, but a
warning should be issued about it. On the other hand, the actual text of
the message can be rendered in HTML, which can be altered. If it contains
anything bad, it should be removed, lest a hapless mail client try to
execute it. This message tests for this case by including a harmful HTML
tag (iframe) in both the message body and the attachment. The first
should be removed and the second retained.
- baddoc_inline_mime_ok - An email
message with a number of small, inline MIME entities (i.e. not
attachments), all of which are supposedly innocuous. The filter should
ignore them and pass them untouched. MIME types included are RFC822
headers, vcards, postscript and PDF documents, tar, compressed, gzipped
and zipped archives, GIF, JPEG, PNG, TIFF, BMP images, drawings,
signatures, PGP objects, audio recordings (e.g. wav, mp3), and videos
(e.g. MPEG, QuickTime).
- baddoc_inline_mime_warn - A second
message with a number of small, inline MIME entities, that could be
harmful and are at least suspect. The filter may decide to simply give a
warning about each of them but otherwise pass them untouched. But, apart
from the danger of allowing the various applications associated with
such MIME entities to open them, there is the added danger that an email
client will try to interpret them directly. Consequently, your filter may
decide to rename or otherwise alter these MIME entities to a MIME type
that will not get processed or executed automatically, just to be safe.
The inline MIME entities included in this
test message are Microsoft Word and RTF documents, Powerpoint
presentations, Excel spreadsheets (including a CSV object), other
spreadsheets, man pages, and troff documents. Generally, these are
inline MIME entities that are sometimes used for distributing malicious
code, macros, etc.
- baddoc_inline_mime_reject - A third
message with a number of small inline MIME entities, that are assumed to
be harmful. Email filters should reject these messages, the best way
being to rename the MIME type to one which is unknown, and will not get
interpreted by email clients, etc. Alternately, a harsher but highly
guaranteed solution is to have the email filter delete these inline
The inline MIME entities found in this
other kinds of scripts (e.g. TCL, batch scripts, shell scripts),
X509 certificates (which can include harmful embedded links that are
executed), and vcalendars.
- baddoc_inline_mime_only - A message
may contain a single inline MIME entity (i.e. not an attachment) with no
message text at all. This happens when someone mails only the MIME entity
with no explanatory text. Since some email programs might (or may we
say should) interpret this MIME entity directly and not treat it as an
attachment, it must be filtered separately from attachments. An email
filter must be able to handle this test case by at least examining the
MIME type and rejecting malicious ones, whether it actually generates a
warning message or not, as well as scanning the message for viruses and
As to whether a disposition warning can be
generated, this is a tricky case. Since the inline MIME entity cannot be
altered, it will require making up another email message component and
putting the disposition warning in there, essentially rebuilding the email
message from the ground up.
- baddoc_multipart_misplaced - Some mail
handlers may generate messages with MIME components that don't conform to
the standard (as was discussed for baddoc_inline_mime_only, above,
sometimes knowing where and how to insert your disposition message is
hard). Regardless of this, many mail readers will still handle them. In
this case, a multipart/alternative section appears to be misplaced because
it is used solely for the purpose of enclosing a virus scanner's
certification. By the way, this message is known to crash one version of
the Perl module Email::StripMIME which is frequently used in email
filters, hence another reason for this particular test case.
- baddoc_multipart_missing - This
message contains an HTML component which includes an IFrame. Since
IFrames are a favorite way for the bad guys to launch viruses, etc., this
message should be flagged as bad by your email filter. That's all well
and good but this message tests whether your email filter can insert
warnings into badly formed messages. This message claims to be in
multipart/alternative form but the text portion of the alternative
message is missing. If all mail readers (including those that only look
at the text of multipart/alternative messages) are to see your filter's
disposition message, it should still be able to insert a message into the
HTML portion and, if it is really on the ball, fabricate the text/plain
component and insert its disposition message there, too.
- baddoc_html_overrun_encoded - In order
to detect spam/viruses in a message, an email filter may need to decode
encoded messages. This message contains an HTML component that has been
encoded to obscure it. The encoding has been done in violation of the
Base64 rules, thereby producing two excessively long encoded values (989
bytes and 711 bytes), probably to cause filters to blow up and miss the spam
inside the message. The resultant message, while technically invalid is
handled by many mail readers and so must also be handled by your filter.
- baddoc_html_overrun_straight - This
message contains an HTML component, that comprises a single line that is
very long (10314 bytes) in order to test if your email filter can handle
very long lines without blowing up.
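Several of the HTML tests above (tags split across lines, harmful-looking tags hidden in comments, Base64 bodies with overlong lines) can be exercised with a scanner along the following lines. This is only a minimal sketch using Python's standard library, not the method of any particular filter. The blocklist `DANGEROUS_TAGS` and both function names are assumptions made for illustration, and a real filter would launder the HTML rather than merely report on it.

```python
from email import message_from_bytes
from html.parser import HTMLParser

# Hypothetical blocklist for illustration; a real filter sets its own policy.
DANGEROUS_TAGS = {"iframe", "script", "object", "embed", "applet"}


class TagCollector(HTMLParser):
    """Record dangerous start tags; text inside HTML comments is ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag names and tolerates newlines in a tag.
        if tag in DANGEROUS_TAGS:
            self.found.append(tag)


def dangerous_tags_in_html(html):
    """Return the dangerous tag names seen in an HTML string, in order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def dangerous_tags_in_message(raw):
    """Decode every text/html part of a raw message and scan it."""
    msg = message_from_bytes(raw)
    found = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True undoes Base64 or quoted-printable, whatever the
        # length of the encoded lines.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            text = payload.decode("latin-1")
        found.extend(dangerous_tags_in_html(text))
    return found
```

Because the scan runs on the decoded payload, the same check covers plain and encoded HTML parts. HTML embedded in a text/plain body (as in baddoc_html_embedded) would need the content-type test relaxed.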
Mail archivers are often implemented as a mail handling robot
to which regular email messages are simply sent by an MTA. The MTA simply
duplicates every mail message it sees and forwards the copy, through regular
channels, to the archiver. Should the archiver crash for any reason, the
messages sent to it may be bounced by the MTA. This often leads to the
bounces being sent back to the original sender, thereby causing much
confusion, since they never sent any message to the archiver.
To support mail archiving, your filter may wish to detect
bounce messages and trash those resulting from delivery failures to your
mail archive robot's address. The messages beginning with "bounce_" are
meant to test this ability in particular, as well as a filter's ability to
handle bounce messages in general.
- bounce_communigate_failure - A bounce message from a
- bounce_gmail - A bounce message from
- bounce_postfix_failure - A failure to
deliver, bounce message from Postfix.
- bounce_postfix_warning - A message
delayed status message from Postfix.
- bounce_sendmail_failure - A failure
to deliver, bounce message from Sendmail.
- bounce_sendmail_hops - A failure to
deliver, bounce message from Sendmail, caused by the message being relayed
through too many hosts.
- bounce_sendmail_warning - A delayed
mail message from Sendmail.
- bounce_spam - A piece of spam that
is disguised as a bounce message. This is a pretty common technique used
by spammers. The spam content is usually in the bogus delivery report.
The messages beginning with "spam_" are designed to test the
spam classifier in your mail filter.
- spam_delivery_check - This message is a simple spam delivery
check message, with an obvious spam sender name. It can be useful when
developing an email spam filter, because the filter can simply check for
this name, mark the message as spam, and then you can check that the spam
is disposed of properly (i.e. delivered or not).
- spam_text, spam_html - Other
than analyzing a message's headers, the most important portions of a
message to examine for spam are its plain text and HTML components.
These two sample messages are pretty typical plain text and HTML messages
that contain spam.
- Spam may be sent in HTML format but as the text of a simple message. A
mail scanner should be able to handle this by looking for HTML in the text
part of the message. This test message (spam_text_html_paypal)
represents a typical phishing message from a well-known auction site that
should certainly be detected as spam. If your virus scanner is on the ball,
it may also detect the fake customer service (renewal) URL as a virus.
spam_html_encoded - As with the creators of viruses, the spammers
are constantly trying to "improve" their delivery methods so that spam
filters will not detect and reject their messages. Encoding the message is
one way of doing that. These two messages provide identical spam test
cases in both encoded plain text and encoded HTML form.
- One powerful technique for spam filtering is the concept of the black
and whitelist, which involves comparing the sender's address against a list
of acceptable addresses. Spammers may try to bypass a scanner's list checks
by obscuring the address using quotes, comments, etc. This test case
(spam_text_addr_obscured) contains a spam, text message that will
probably pass through most spam scanners undetected. One would hope that the
black/whitelist catches the message but its "From" address has been obscured
to test the limits of your scanner's address parser.
- spam_html_long - Some spammers may try
to use long, encoded HTML to cause a mail filter to give up or crash,
thereby causing their message to be bypassed and delivered to the recipient.
This message simulates such a case, where the HTML is spanned over many
lines but, when decoded, turns into a single, excessively long line.
- The spammers are a bunch of guys who are really grasping at straws,
sometimes. A sample message is supplied that is multipart/alternative
with two components (text and HTML). Both are encoded Base64 to hide the
fact that they are spam. Anybody who encodes a text/plain segment is
obviously up to no good so it must be an act of desperation to make such
an obvious move (spam_mixed_encoded).
- spam_html_masked_comment - With the
advent of Bayesian classifiers that look for particular words in messages,
to see if they contain spam, the spammers have taken to obfuscating words
by sending the message as HTML and inserting comments into the middle of
the words. HTML renders the text correctly but a scanner may miss the key
words. A good filter will remove such comments, examples of which are
demonstrated by this test message.
- spam_html_masked_escape - Another
technique used by spammers to mask words from Bayesian classifiers is to
represent the characters in the words as HTML escape sequences. This
message tests a filter's ability to convert HTML escapes into regular
characters so that the classifier can see the key words.
- spam_html_straight - Many email clients
will interpret HTML in the body of a message, even though HTML is not
supposed to appear in a simple message body, only as a multipart/alternative
component. Allowing that a message can contain HTML in a simple message
body (it is actually quite common), even though it is not supposed to be
possible, putting spam into such a badly formed message is a good test of
a filter's ability to work outside of the box and think like an email
client. A good spam filter should be able to handle this case.
- spam_html_obscured - In the above
message, the HTML was put directly into the simple message body. With
this message we put the HTML text there too, but also test how your filter
copes with HTML comments inserted into key words, to obscure them from the
- In a couple of variations on a theme, text and HTML spam can appear
as the only component of a MIME-encoded message but also be encoded Base64
so that the spam filter won't see that it is spam. The spam filter should
be able to detect the spam in both cases supplied
- This message contains some rather obvious spam of the penile
enhancement variety that is found in the HTML portion of a
multipart/alternative message. It is the only component, incidentally,
bypassing the rule about multipart/alternative messages containing at
least a text portion (spam_html_viagra).
- Whether through omission or commission, quite a bit of spam arrives
with broken MIME components. The most common error is the omission of the
last MIME segment or segments terminator for all nested MIME levels. Most
mail readers blithely interpret this faux pas so a good spam filter must
accept it too (spam_mime_malformed).
- Another common error is the omission of any valid MIME segments in a
message marked as MIME. Many mail readers will simply interpret this kind
of broken message as text, showing the recipient the spam. This being the
case, a good spam filter must accept such broken messages as text too. In
our example, the message appears to be valid but, in reality, the MIME
separators are broken and the message has no MIME entities at all
spam_text_notfilt2, spam_html_notfilt - Detection of spam is
not always easy. The test suite includes three test cases that stretch
the limits of traditional spam detectors. The content should mark them as
spam but you never know.
- If your mail filter employs a statistical check to find spam quickly
by counting the number of "bad" HTML tags in a message, this test should
exercise it with a multi-part message that contains many suspicious tags
in the HTML component (spam_stats_mixed).
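The masking tricks described above (comments inserted into words, characters written as HTML escapes) can be undone before the text reaches a word-based classifier. The following is a rough sketch with Python's standard HTML parser. The `BLOCK_TAGS` set and the `visible_text` name are assumptions for illustration, and it makes no attempt to skip script or style content.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as word breaks; extend as needed.
BLOCK_TAGS = {"p", "br", "div", "td", "tr", "li", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collect the text a reader would see, dropping comments and tags."""

    def __init__(self):
        # convert_charrefs=True turns escapes such as &#105; into characters.
        super().__init__(convert_charrefs=True)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.pieces.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.pieces.append(" ")

    def handle_data(self, data):
        self.pieces.append(data)


def visible_text(html):
    """Return the rendered text of an HTML fragment with whitespace collapsed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.pieces).split())
```

Comments and inline tags contribute nothing to the output, so a word split by a comment is rejoined, while block-level tags still separate words. The resulting string is what would be handed to the classifier.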
The messages beginning with "virus_" are meant to test the
virus detection component of your mail filter. However, the messages in
the "baddoc" class can also be used to test virus detection since many of
them contain actual examples of real file or inline MIME types.
- A very common email virus is one that includes a virus component as
an attachment and an automatic launcher, in the form of an HTML message,
that opens the virus. When your typical email package encounters it, it
interprets the HTML and launches the virus before the user even has a
chance to do anything about it. A sample of this kind of virus, using a
harmless virus component (in case it slips through), is provided
- Lately, we've been seeing a virus that looks like a mail system return
receipt for a message that was not delivered. I figure this is probably
some clever virus writer's attempt to sneak a virus by if the filters have
some special case checking in place for delivery-status messages and/or a
sender of postmaster. Either that or they are hoping to lull the user
into a false sense of security and convince them to open the message
because it is from a legit source. A sample virus that employs this
technique plus the usual HTML launcher is given
- This message is an example of a virus launcher in the HTML portion
of the message that is obscured by tricks with comments, etc. Perhaps not
all mail readers will be promiscuous enough to accept some of these tricks
but, if they do, it spells big trouble because the virus is launched when
the user opens the message (i.e. before they've had a chance to read it)
- Some mail readers may accept multiple HTML MIME components and
assemble them into a single message when displaying them. If this is the
case, a possible virus exploit would be to split a virus launcher across
two HTML components. A sample of this kind of virus launcher is given
- One method of delivering a virus launcher, that is applicable to
Outlook only, is to include HTML tags in the subject line of a message,
separated by a carriage return. Apparently, Outlook sees the carriage
return and starts interpreting the HTML in the subject as part of the
body. This works to the virus' advantage, since many virus filters don't
examine the subject for malicious HTML tags. Because it is perfectly
feasible to include a launcher in the subject that launches a viral
attachment, an example of this kind of virus is given
- Since spam is becoming ubiquitous, I guess some clever virus writer
figured to take advantage of this fact. I recently received a piece of
pornographic spam that featured an executable attachment that purported to
show pornographic photos. Click on the attachment and, "You've got
disease!", to paraphrase AOL. You've got to admit, it's a brilliant
scheme. Masquerade as an obnoxious message to hide the fact that you're
really a malicious one. Also, legitimate spam filtering must look for
viruses as well so this tests that case too. You'll love this example
- To test whether your email filter or the virus scanner that it
invokes can find an actual virus, this test message includes the sample
virus distributed with Clam AV as several attachments. Your scanner
should find and delete both of the attachments to this message, or at the
very least render them harmless. (virus_clam_attach).
- Another variation on the theme of the HTML virus launcher plus
attached viral payload is the encoded launcher. Although nearly all
"text/html" MIME type entities are sent as "7Bit" or some other unencoded
form, this is not a requirement. Encoding the launcher as Base64
so that the virus filter will not see it is a pretty typical trick. This
message supplies an example of this kind of encoded virus, using the HTML
version of the Clam AV sample virus. Your scanner should remove the link
and/or delete the message. (virus_clam_encoded).
- To test whether your email filter can detect an actual virus, the
HTML version of the Clam AV sample virus is embedded in the text of this
message as a clickable link. Your scanner should remove the link and/or
delete the message. (virus_clam_html).
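Many of the tests in this suite expect a filter to render executable attachments harmless, for example by renaming them so that they are not opened automatically. A minimal sketch of that renaming step is shown below. The extension list, the ".defanged" suffix and the function name are all assumptions chosen for illustration, and filenames encoded with RFC 2231 character sets (such as the ISO-8859-2 tests) may need extra care when they are rewritten.

```python
import os
from email import message_from_bytes

# Hypothetical reject list and suffix; a real filter defines its own.
REJECT_EXTENSIONS = {".exe", ".com", ".dll", ".pif", ".scr"}
SAFE_SUFFIX = ".defanged"


def defang_attachments(msg):
    """Rename attachments with rejected extensions; return (old, new) pairs."""
    renamed = []
    for part in msg.walk():
        name = part.get_filename()
        if not name:
            continue
        extension = os.path.splitext(name)[1].lower()
        if extension not in REJECT_EXTENSIONS:
            continue
        safe_name = name + SAFE_SUFFIX
        # The name can live in Content-Disposition, Content-Type, or both.
        if part.get_param("filename", header="content-disposition") is not None:
            part.set_param("filename", safe_name, header="Content-Disposition")
        if part.get_param("name") is not None:
            part.set_param("name", safe_name)
        renamed.append((name, safe_name))
    return renamed


def defang_raw_message(raw):
    """Parse raw bytes, defang attachments, and return (bytes, renamed)."""
    msg = message_from_bytes(raw)
    renamed = defang_attachments(msg)
    return msg.as_bytes(), renamed
```

Renaming by extension alone says nothing about the content of an attachment, so a check of this kind complements a virus scanner and does not replace one.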
How It Works:
We provide you with the suite of test messages and a shell
script that can be used to submit some or all of the messages to your mail
delivery program (e.g. the "mail" command).
Once you have your filter working, send the messages from the
validation suite through your mail system and observe the results. Each
message includes text that describes the test it is meant to perform and the
results that should be expected. Anything that doesn't conform to the
expected results should be investigated.
You can also use the validation suite as a regression test
after filter code changes have been made.
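The suite's own submission tool is a shell script. For readers who prefer to drive the tests from Python, the same idea might look like the sketch below. It is not the script supplied with the suite. It assumes the test messages sit as individual files in one directory, named after their tests, and that an SMTP server for the system under test listens on the given host and port. The function names and default values are hypothetical.

```python
import smtplib
from pathlib import Path


def select_messages(directory, prefixes=()):
    """Return test message files in name order, optionally filtered by prefix.

    For example, prefixes=("spam_", "virus_") selects only those classes.
    """
    paths = sorted(p for p in Path(directory).iterdir() if p.is_file())
    if prefixes:
        paths = [p for p in paths if p.name.startswith(tuple(prefixes))]
    return paths


def submit_messages(paths, sender, recipient, host="localhost", port=25):
    """Send each message file, unmodified, through the given SMTP server."""
    with smtplib.SMTP(host, port) as smtp:
        for path in paths:
            # The raw bytes are sent as-is so that deliberately malformed
            # test messages reach the filter intact.
            smtp.sendmail(sender, [recipient], path.read_bytes())
```

Selecting by prefix makes it easy to rerun one class of tests, which is convenient when the suite is used as a regression test after a filter change.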
Specially crafted email: the validation suite uses
hand-made email messages that contain generic varieties of the virus and spam
delivery techniques most commonly used. The messages are designed to exercise
all paths through a typical email filter program.
Safe to use: denatured viruses that cannot hurt
your system, even if they get through, are used in the validation suite.
They are much safer than using real, live viruses (but, if you'd like to
live dangerously, go ahead -- you certainly can use live viruses to test
your filter, just don't complain when you find out that we told you so).