john@email.johncon.com
http://www.johncon.com/john/

Quarantining Malicious Outlook Attachments



The following seventeen-line procmail(1) script fragment can be used to quarantine most malicious Microsoft Outlook® attachments. In the fragment, the quarantine account is represented as quarantine@somedomain.com, (which should be changed to an appropriate address on your system.)

The quarantine account could be the address of an e-mail archive and retrieval system, such as the one included in the rel distribution at NformatiX, which uses SmartList, (also available from the procmail site,) for the archive functionality. SmartList is configurable through procmail scripts, for flexibility and extensibility, allowing different strategies to be implemented for handling quarantined messages. For example, messages could be accepted if they are from a local/corporate domain, while e-mail with attachments from the rest of the Internet is filed, (or bounced with a delivery refusal notice,) perhaps with a notification to the user that a message has been quarantined, etc.

Description and Walk Through of the Script Fragment

The list of filename extensions that are executable, (or can contain executable code,) can be optimized for speed by building a tree, based on the first letter of the file name extensions, which is defined in a macro for substitution:


        ext='(a(d[ep]|r[cj]|s[dmxp]|u|vi)|\
              b(a[st]|mp|z[0-9]?)|\
              c(an|hm|il|lass|md|om|(p[lp]|\+\+)?|rt|sv)|\
              d(at|e?b|ll|o[ct])|\
              e(ml|ps?|xe)|\
              g(if|z?)|\
              h(lp|t(a|ml?)|(pp|\+\+)?)|\
              i(n[cfis]|sp)|\
              j(ava|pe?g|se?|sp|tmpl)|\
              kbf|\
              l(ha|nk|og|yx)|\
              m(d[abew]|p(e?g|[32])|s[cipt])|\
              ocx|\
              p(a(tch|s)|c[dsx]|df|h(p[0-9]?|tml?)|\
                  if|[lm?]|n[gm]|[po][st]|p?s)|\
              r(a[mr]|eg|pm|tf)|\
              s(c[rt]|h([bs]|tml?)|lp|ql|ys)?|\
              t(ar|ex|gz|iff?|xt)|\
              u(pd|rl|x)|\
              vb[es]?|\
              w(av|m[szd]|p(d|[0-9]?)|s[cfh])|\
              x(al|[pb]m|l[stw])|\
              z(ip|oo)\
             )'

        

and includes the following file name extensions:

.ade .adp .arc .arj .asd .asm .asp .asx .au .avi .bas .bat .bmp .bz .bz0 .bz1 .bz2 .bz3 .bz4 .bz5 .bz6 .bz7 .bz8 .bz9 .c .c++ .can .chm .cil .class .cmd .com .cpl .cpp .crt .csv .dat .db .deb .dll .doc .dot .eml .ep .eps .exe .g .gif .gz .h .h++ .hlp .hpp .hta .htm .html .inc .inf .ini .ins .isp .java .jpeg .jpg .js .jse .jsp .jtmpl .kbf .lha .lnk .log .lyx .mda .mdb .mde .mdw .mp2 .mp3 .mpeg .mpg .msc .msi .msp .mst .ocx .pas .patch .pcd .pcs .pcx .pdf .php .php0 .php1 .php2 .php3 .php4 .php5 .php6 .php7 .php8 .php9 .phtm .phtml .pif .pl .pm .png .pnm .pos .pot .pps .ppt .ps .ram .rar .reg .rpm .rtf .s .scr .sct .shb .shs .shtm .shtml .slp .sql .sys .tar .tex .tgz .tif .tiff .txt .upd .url .ux .vb .vbe .vbs .wav .wmd .wms .wmz .wp .wp0 .wp1 .wp2 .wp3 .wp4 .wp5 .wp6 .wp7 .wp8 .wp9 .wpd .wsc .wsf .wsh .xal .xbm .xls .xlt .xlw .xpm .zip .zoo
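For readers who want to check which names the tree-structured macro actually accepts, a rough Python sketch follows. It assumes Python's re module as a stand-in for procmail's regular expression engine (the ext alternation compiles unchanged once its backslash-newline continuations are joined), uses case-insensitive matching as procmail does by default, and replaces ${ws} with \s* for brevity. The names EXT, NAME_RE, and is_suspect are hypothetical.

```python
import re

# The ext macro from the fragment, joined onto one line. Alternatives are
# grouped by first letter, so a mismatch on the first character rejects a
# whole branch (e.g., every extension starting with "p") in one step.
EXT = (
    r"(a(d[ep]|r[cj]|s[dmxp]|u|vi)|"
    r"b(a[st]|mp|z[0-9]?)|"
    r"c(an|hm|il|lass|md|om|(p[lp]|\+\+)?|rt|sv)|"
    r"d(at|e?b|ll|o[ct])|"
    r"e(ml|ps?|xe)|"
    r"g(if|z?)|"
    r"h(lp|t(a|ml?)|(pp|\+\+)?)|"
    r"i(n[cfis]|sp)|"
    r"j(ava|pe?g|se?|sp|tmpl)|"
    r"kbf|"
    r"l(ha|nk|og|yx)|"
    r"m(d[abew]|p(e?g|[32])|s[cipt])|"
    r"ocx|"
    r"p(a(tch|s)|c[dsx]|df|h(p[0-9]?|tml?)|if|[lm?]|n[gm]|[po][st]|p?s)|"
    r"r(a[mr]|eg|pm|tf)|"
    r"s(c[rt]|h([bs]|tml?)|lp|ql|ys)?|"
    r"t(ar|ex|gz|iff?|xt)|"
    r"u(pd|rl|x)|"
    r"vb[es]?|"
    r"w(av|m[szd]|p(d|[0-9]?)|s[cfh])|"
    r"x(al|[pb]m|l[stw])|"
    r"z(ip|oo))"
)

# Mirrors name${ws}=${ws}${dq}.*\.${ext}(\..*)?${dq}; \s* is a simplified
# stand-in for ${ws}. The optional (\..*)? lets a listed extension be
# followed by further extensions, as in "setup.exe.bak".
NAME_RE = re.compile(r'name\s*=\s*".*\.' + EXT + r'(\..*)?"', re.IGNORECASE)


def is_suspect(text: str) -> bool:
    """Return True if text has a quoted name= value with a listed extension."""
    return NAME_RE.search(text) is not None


print(is_suspect('Content-Disposition: attachment; name="invoice.exe"'))  # prints True
```

Grouping by first letter is the point of the tree layout: in a backtracking engine, a name whose extension starts with "q" is rejected after one character test per branch instead of one test per extension, although how much that helps depends on the engine.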

Because RFC 1521 allows white space, including a newline, tabs, or spaces, to appear between the "name" token, the equal sign, and the file's name, a procmail definition of white space that spans lines is necessary:


        ws = '[	 ]*($[	 ]+)*'

        

Important: Note that the white space between both sets of square brackets consists of exactly one tab, (hex 09,) followed by exactly one space, (hex 20).
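The ws macro relies on procmail's $ matching the newline character itself, so ($[<tab><space>]+)* consumes a line break only when the next line starts with a tab or space, i.e., an RFC 822 folded continuation line. A minimal Python rendering, assuming \n line endings, is sketched below; WS, NAME_PARAM, and quoted_name are hypothetical names.

```python
import re

# Python rendering of the ws macro '[<TAB><SP>]*($[<TAB><SP>]+)*'.
# procmail's '$' matches the newline itself, so each repetition of the
# group consumes one line break plus the indentation of a folded line.
WS = r"[\t ]*(?:\n[\t ]+)*"

# Hypothetical pattern: "name", ws, "=", ws, then a double-quoted value.
NAME_PARAM = re.compile(r"name" + WS + "=" + WS + r'"([^"]*)"', re.IGNORECASE)


def quoted_name(text: str):
    """Return the quoted file name following name=, or None if absent."""
    match = NAME_PARAM.search(text)
    return match.group(1) if match else None


folded = 'Content-Type: application/octet-stream;\n\tname\n\t=\n "setup.exe"'
print(quoted_name(folded))            # prints setup.exe
print(quoted_name('name\n="x.exe"'))  # prints None: the line is not folded
```

A bare newline (one not followed by a tab or space) ends the header field, which is why the second call finds nothing.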

RFC 1521 defines the file's name to be enclosed in a set of double quotation marks, '"', which conflicts with the way procmail handles double quotes in conditional statements, so a macro definition is required for substitution:


        dq = '"'

        

End of Line, (used in conditions with variable substitution):


        eol='$'

        

Encrypted and signed messages, (there is a potential for signatures to carry executable programs, too,) application content, attachments with suspect file names, and base64 encoded attachments, when declared in the e-mail headers, can all carry malicious programs, so such messages should be quarantined:


        #
        :0
        * 1^0 $ ^content-type:${ws}(multipart/(mixed|alternative|\
                application|signed|encrypted))|(application/)
        * 1^0 $ ^content-disposition:${ws}attachment;${ws}.*\
                name${ws}=${ws}${dq}.*\.${ext}(\..*)?${dq}${ws}${eol}
        * 1^0 $ ^content-transfer-encoding:${ws}base64
        ! quarantine@somedomain.com
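The header recipe uses 1^0 weights, so the message is forwarded if any one of the three conditions matches. The Python sketch below mirrors that logic under stated assumptions: re.MULTILINE stands in for procmail's line-oriented ^ and $, re.IGNORECASE for its default case-insensitivity, and EXT is an abbreviated, hypothetical stand-in for the full ext macro. Note that, as written, the top-level alternation in the first condition leaves (application/) unanchored, so it matches application/ anywhere in the header; the sketch preserves that behavior.

```python
import re

WS = r"[\t ]*(?:\n[\t ]+)*"   # stand-in for the ws macro
EXT = r"(exe|pif|scr|vbs)"    # hypothetical, abbreviated stand-in for ext
FLAGS = re.IGNORECASE | re.MULTILINE

# The three header conditions, each weighted 1^0 in the recipe.
HEADER_CONDITIONS = [
    # As written, the top-level '|' leaves (application/) unanchored.
    re.compile(r"^content-type:" + WS + r"(multipart/(mixed|alternative|"
               r"application|signed|encrypted))|(application/)", FLAGS),
    re.compile(r"^content-disposition:" + WS + r"attachment;" + WS +
               r".*name" + WS + "=" + WS + r'".*\.' + EXT + r'(\..*)?"' +
               WS + r"$", FLAGS),
    re.compile(r"^content-transfer-encoding:" + WS + r"base64", FLAGS),
]


def header_score(headers: str) -> int:
    """Sum of 1^0 weights: each matching condition adds 1, once."""
    return sum(1 for cond in HEADER_CONDITIONS if cond.search(headers))


def quarantine_by_headers(headers: str) -> bool:
    """A positive score means the recipe forwards to the quarantine address."""
    return header_score(headers) > 0


print(quarantine_by_headers("Content-Type: text/plain\n"))        # prints False
print(quarantine_by_headers("Content-Type: multipart/mixed;\n"))  # prints True
```

Because ".*name" precedes the name test, the second condition also catches the filename= parameter of Content-Disposition.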

        

The conditional statement:


        #
        :0 BE
        * -3^0
        * 4^0 $ name${ws}=${ws}${dq}.*\.${ext}(\..*)?${dq}${ws}${eol}
        * 4^0 $ begin${ws}[0-9]+${ws}.*\.${ext}(\..*)?${ws}${eol}
        * 4^0 $ ^content-type:${ws}application/
        * 4^0 $ ^content-transfer-encoding:${ws}base64
        * 2^0 [<](!doctype|[sp]?h(tml|ead)|title|body)
        * 2^0 [<](app|bgsound|div|embed|form|i?l(ayer|ink)|img|\
              i?frame(set)?|meta|object|s(cript|tyle))
        * 2^0 =3d
        ! quarantine@somedomain.com

        

operates as follows:

  • The 'B' specifies that the body of the message should be searched, (but only if the preceding recipe was not executed, i.e., nothing was found in the headers; that's what the 'E' means, just in case someone wants to do something other than forward the message to quarantine@somedomain.com in the preceding recipe.)

  • The conditionals start with a score of -3. For the message to be considered malicious, and sent to quarantine@somedomain.com, the final score, (the sum of the weighted conditionals, starting with -3,) must be positive. For that to happen, at least one of the "4^0" conditions, or, at least two of the "2^0" conditions, must be true.

  • If the body of the message contains an RFC 1521 body part with content-type or content-disposition headers, the file's name will follow the characters "name", followed by zero or more characters of white space, (possibly including a newline,) an equal sign, possibly more white space, and a double quote. The file name may end in a malicious file name extension, which may itself be followed by further, (concatenated,) extensions. If this condition is true, the score is incremented by 4.

  • Likewise, if the body of the message contains a uuencoded file, the file's name will follow the characters "begin", followed by zero or more characters of white space, (possibly including a newline,) one or more numerical digits, (the file's mode,) possibly more white space, and the file name, which may end in a malicious file name extension, possibly followed by further, (concatenated,) extensions. If this condition is true, the score is incremented by 4.

  • If the message contains an RFC 1521 content-transfer-encoding of base64, the encoded part cannot be scanned without first decoding it. So to be safe, if this condition is true, the score is incremented by 4.

  • If the message contains an RFC 1521 content-type of application/, the part is probably an attachment within a multipart/mixed message. So to be safe, if this condition is true, the score is incremented by 4.

  • If the message body contains HTML tags, (<!doctype, <html, <body, etc.,) then it is potentially malicious. If this condition is true, the score is incremented by 2.

  • If, in addition, the message body contains HTML scripting or embedding tags, (<app, <div, <script, etc.,) then it is considered malicious. If this condition is true, the score is incremented by 2.

  • Or, if in addition, the message body contains "=3d", (the quoted-printable encoding of an equal sign, which is common in encoded HTML attributes,) then it is considered malicious. If this condition is true, the score is incremented by 2.

  • If the sum of the weighted scores of the conditionals that were true, starting with -3, is positive, the message is regarded as malicious, and sent to quarantine@somedomain.com, else it is not.
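The weighted scoring described in the bullets above can be sketched in Python as follows. This is an illustration, not a replacement for the recipe: since an x^0 weight counts a condition at most once, each pattern adds its weight once if it matches anywhere; re.MULTILINE and re.IGNORECASE stand in for procmail's defaults; and EXT is an abbreviated, hypothetical stand-in for the full ext macro.

```python
import re

WS = r"[\t ]*(?:\n[\t ]+)*"      # stand-in for the ws macro
EXT = r"(exe|pif|scr|vbs|doc)"   # hypothetical, abbreviated stand-in for ext
FLAGS = re.IGNORECASE | re.MULTILINE

# (weight, pattern) pairs mirroring the ':0 BE' recipe's conditions.
BODY_CONDITIONS = [
    (4, r"name" + WS + "=" + WS + r'".*\.' + EXT + r'(\..*)?"' + WS + r"$"),
    (4, r"begin" + WS + r"[0-9]+" + WS + r".*\." + EXT + r"(\..*)?" + WS + r"$"),
    (4, r"^content-type:" + WS + r"application/"),
    (4, r"^content-transfer-encoding:" + WS + r"base64"),
    (2, r"[<](!doctype|[sp]?h(tml|ead)|title|body)"),
    (2, r"[<](app|bgsound|div|embed|form|i?l(ayer|ink)|img|"
        r"i?frame(set)?|meta|object|s(cript|tyle))"),
    (2, r"=3d"),
]
COMPILED = [(weight, re.compile(pattern, FLAGS)) for weight, pattern in BODY_CONDITIONS]


def body_score(body: str) -> int:
    """Start at -3 (the '* -3^0' line) and add each matching weight once."""
    score = -3
    for weight, pattern in COMPILED:
        if pattern.search(body):
            score += weight
    return score


def is_malicious(body: str) -> bool:
    """The recipe forwards the message only when the final score is positive."""
    return body_score(body) > 0


print(body_score("<html><body>hi</body></html>"))     # prints -1
print(body_score("<html><script>x</script></html>"))  # prints 1
```

Plain HTML alone scores -1 and is delivered; HTML plus a scripting tag reaches +1, matching the "one 4, or two 2s" rule.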


Extension

The script fragment is compatible with the Stochastic UCE Detection procmail script, which is very effective at reducing the amount of unsolicited commercial e-mail received by users. Also, Microsoft® executable attachments can be detected in messages by the howto-virus.txt procmail fragment, which is available on the ReceivedIP page.


Addendum

To evaluate the relative execution speed of the regular expression search mechanism used in procmail, a simple search, (using the macro extensions listed above,) of a 10 MB e-mail file using the following procmail construct in a file:


        :0 B
        * $ name=${dq}.*${ext}${dq}
        { DUMMY=true }
        #
        :0
        /dev/null

        

and using the command "procmail file < e-mail_file", was compared against egrep(1) with the following expression file:


        name=".*\.ade"
        name=".*\.adp"
        name=".*\.amp"
        name=".*\.arc"
        name=".*\.arj"
        name=".*\.asd"
        name=".*\.asm"
        name=".*\.asp"
        name=".*\.asx"
        name=".*\.au"
        name=".*\.avi"
        name=".*\.bas"
        name=".*\.bat"
        name=".*\.bz"
        name=".*\.bz0"
        name=".*\.bz1"
        name=".*\.bz2"
        name=".*\.bz3"
        name=".*\.bz4"
        name=".*\.bz5"
        name=".*\.bz6"
        name=".*\.bz7"
        name=".*\.bz8"
        name=".*\.bz9"
        name=".*\.c"
        name=".*\.c++"
        name=".*\.can"
        name=".*\.chm"
        name=".*\.cil"
        name=".*\.class"
        name=".*\.cmd"
        name=".*\.com"
        name=".*\.cpl"
        name=".*\.cpp"
        name=".*\.crt"
        name=".*\.csv"
        name=".*\.dat"
        name=".*\.db"
        name=".*\.deb"
        name=".*\.dll"
        name=".*\.doc"
        name=".*\.dot"
        name=".*\.eml"
        name=".*\.ep"
        name=".*\.eps"
        name=".*\.exe"
        name=".*\.g"
        name=".*\.gif"
        name=".*\.gz"
        name=".*\.h"
        name=".*\.h++"
        name=".*\.hlp"
        name=".*\.hpp"
        name=".*\.hta"
        name=".*\.htm"
        name=".*\.html"
        name=".*\.inc"
        name=".*\.inf"
        name=".*\.ini"
        name=".*\.isp"
        name=".*\.ins"
        name=".*\.java"
        name=".*\.jpeg"
        name=".*\.jpg"
        name=".*\.js"
        name=".*\.jse"
        name=".*\.jsp"
        name=".*\.jtmpl"
        name=".*\.kbf"
        name=".*\.lha"
        name=".*\.lnk"
        name=".*\.log"
        name=".*\.lyx"
        name=".*\.mda"
        name=".*\.mdb"
        name=".*\.mde"
        name=".*\.mdw"
        name=".*\.mpeg2"
        name=".*\.mpg2"
        name=".*\.mpg3"
        name=".*\.msc"
        name=".*\.msi"
        name=".*\.msp"
        name=".*\.mst"
        name=".*\.ocx"
        name=".*\.os"
        name=".*\.ot"
        name=".*\.pas"
        name=".*\.patch"
        name=".*\.pcd"
        name=".*\.pcs"
        name=".*\.pcx"
        name=".*\.pdf"
        name=".*\.phtm"
        name=".*\.phtml"
        name=".*\.php"
        name=".*\.php0"
        name=".*\.php1"
        name=".*\.php2"
        name=".*\.php3"
        name=".*\.php4"
        name=".*\.php5"
        name=".*\.php6"
        name=".*\.php7"
        name=".*\.php8"
        name=".*\.php9"
        name=".*\.pif"
        name=".*\.pl"
        name=".*\.plm"
        name=".*\.png"
        name=".*\.pnm"
        name=".*\.pps"
        name=".*\.ps"
        name=".*\.pt"
        name=".*\.ram"
        name=".*\.rar"
        name=".*\.reg"
        name=".*\.rpm"
        name=".*\.rtf"
        name=".*\.s"
        name=".*\.scr"
        name=".*\.sct"
        name=".*\.shb"
        name=".*\.shs"
        name=".*\.shtm"
        name=".*\.shtml"
        name=".*\.slp"
        name=".*\.sql"
        name=".*\.sys"
        name=".*\.tar"
        name=".*\.tex"
        name=".*\.tgz"
        name=".*\.tif"
        name=".*\.tiff"
        name=".*\.txt"
        name=".*\.upd"
        name=".*\.url"
        name=".*\.ux"
        name=".*\.vb"
        name=".*\.vbe"
        name=".*\.vbs"
        name=".*\.wav"
        name=".*\.wmd"
        name=".*\.wms"
        name=".*\.wmz"
        name=".*\.wp"
        name=".*\.wp0"
        name=".*\.wp1"
        name=".*\.wp2"
        name=".*\.wp3"
        name=".*\.wp4"
        name=".*\.wp5"
        name=".*\.wp6"
        name=".*\.wp7"
        name=".*\.wp8"
        name=".*\.wp9"
        name=".*\.wpd"
        name=".*\.wsc"
        name=".*\.wsf"
        name=".*\.wsh"
        name=".*\.xal"
        name=".*\.xbm"
        name=".*\.xls"
        name=".*\.xlt"
        name=".*\.xlw"
        name=".*\.xpm"
        name=".*\.zip"
        name=".*\.zoo"

        

using the command "egrep -is -f file e-mail_file > /dev/null".

On a 433 MHz Pentium class machine, procmail took 0.229 machine seconds of CPU time, and egrep(1) took 0.432 machine seconds.

With the name=".*\ prefix removed from each line of the expression file, and fgrep(1) used instead, the time required was 0.475 machine seconds.

Note that the regular expression search construction in all three cases was modified to accommodate the egrep(1) and fgrep(1) restriction that regular expressions cannot span lines.
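The comparison above pits one combined alternation against a list of separate expressions. A small Python sketch of the two styles follows, using a hypothetical subset of the extensions. It only checks that both styles select the same lines; it makes no claim about their relative speed. re.escape keeps the + characters in c++ literal, which is the same reason the ext macro writes them as \+\+.

```python
import re

# Hypothetical subset of the extension list.
EXTENSIONS = ["exe", "pif", "scr", "vbs", "c++"]

# Style 1: one combined alternation, as in the procmail construct.
COMBINED = re.compile(
    r'name=".*\.(' + "|".join(re.escape(e) for e in EXTENSIONS) + r')"',
    re.IGNORECASE,
)

# Style 2: one expression per extension, as in the egrep -f expression file.
SEPARATE = [
    re.compile(r'name=".*\.' + re.escape(e) + r'"', re.IGNORECASE)
    for e in EXTENSIONS
]


def matching_lines_combined(lines):
    """Lines matched by the single combined expression."""
    return [line for line in lines if COMBINED.search(line)]


def matching_lines_separate(lines):
    """Lines matched by any of the per-extension expressions."""
    return [line for line in lines if any(p.search(line) for p in SEPARATE)]


sample = ['name="setup.exe"', 'name="notes.txt"', 'NAME="Main.C++"', 'name="x.cc"']
print(matching_lines_combined(sample) == matching_lines_separate(sample))  # prints True
```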


Appendix I

There is an issue with the way Outlook parses MIME headers. Take the BadTrans.B worm as an example; it contains the MIME e-mail header construct:


        MIME-Version: 1.0
        Content-Type: multipart/related;
        type="multipart/alternative";
        boundary="====_ABC1234567890DEF_===="

        

which is a violation of RFC 822, Section 3.1.1, (there is no preceding linear-white-space in the last two lines, so they are not continuations of the Content-Type field.) A properly constructed e-mail header parser, (for example, the MIME reference code used in metamail(1),) would not consider such a message to have attachments, so a filter relying on such a parser could pass attachments that contain malicious code on to Outlook for execution.
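Under RFC 822, a header line continues the previous field only if it begins with linear white space (a space or tab). A minimal Python sketch of just that unfolding rule, applied to the BadTrans.B header above, shows why a strict parser never sees the boundary parameter as part of Content-Type. The function unfold_headers is hypothetical, and it ignores the rest of RFC 822 (comments, quoting, CRLF line endings).

```python
def unfold_headers(raw: str):
    """Split a header block into (fields, stray_lines) using RFC 822 folding.

    A line beginning with a space or tab continues the previous field;
    any other line must contain a colon to start a new field.
    """
    fields = []   # [name, value] pairs
    stray = []    # lines that are neither new fields nor continuations
    for line in raw.splitlines():
        if line[:1] in (" ", "\t") and fields:
            fields[-1][1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            fields.append([name.strip(), value.strip()])
        else:
            stray.append(line)
    return fields, stray


badtrans = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/related;\n"
    'type="multipart/alternative";\n'
    'boundary="====_ABC1234567890DEF_===="\n'
)
fields, stray = unfold_headers(badtrans)
print(fields)  # [['MIME-Version', '1.0'], ['Content-Type', 'multipart/related;']]
print(stray)   # the two unindented parameter lines
```

Indenting the last two lines with a tab would fold them into the Content-Type value, which is what a conforming sender produces.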

A safe, (and possibly conservative,) alternative is to search the body of the message for the "name" and/or "begin" tags, followed by a file name extension that can contain potentially malicious code.

However, there is a substantial performance cost: procmail's regular expression search requires about one CPU second per MByte of file size on a 466 MHz Pentium class machine to execute the fragment.


Appendix II

As outlined in Microsoft Security Bulletin (MS00-075), there was a potential for an e-mail to contain an HTML link luring an Outlook user into executing a script containing malicious code on a rogue site. Although a fix has been available for over a year, there have been numerous reports submitted to SecurityFocus' BugTraq® mailing list that the problem has not been resolved, and that Internet Explorer®, (which is called from Outlook to render HTML in an e-mail,) is capable of executing malicious code disguised as jpg or gif images, a claim denied by Scott Culp, Security Program Manager, Microsoft Security Response Center, in an e-mail to the BugTraq mailing list, 29 July, 2001.

A safe, (and possibly conservative,) approach is to search the body of HTML messages for links to images, scripts, etc. The fragment will not quarantine strict HTML 4.0 compliant messages without links. Since the body of the message is already searched, as per Appendix I, the additional performance impact is minimal.


Thanks

A special note of appreciation to Stephen R. van den Berg, (AKA BuGless,) the author of procmail, who for nine years developed and supported the procmail program, (the "e-mail system administrator's crescent wrench,") for the Internet community. And, special thanks to Philip Guenther, the current maintainer of procmail and moderator of the procmail mailing list, for providing the search optimization for the procmail "recipe" described above.


License

A license is hereby granted to reproduce this software for personal, non-commercial use.

THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

So there.

Copyright © 1992-2005, John Conover, All Rights Reserved.

Comments and/or problem reports should be addressed to:

john@email.johncon.com

http://www.johncon.com/john/
http://www.johncon.com/ntropix/
http://www.johncon.com/ndustrix/
http://www.johncon.com/nformatix/
http://www.johncon.com/ndex/



Last modified: Sat Aug 20 02:02:50 PDT 2005 $Id: index.html,v 1.0 2005/08/20 09:03:03 conover Exp $