Filtering POP3 mailboxes with fdm

Save the following Perl script as /usr/local/bin/extractlastip.pl and make it executable, then use the /etc/procmailrc that follows it. The script needs the Regexp::Common module from CPAN.

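#!/usr/bin/perl
# Print the last public IPv4 address found in the headers on standard input.

use strict;
use warnings;
use Regexp::Common qw/net/;

my $last = '';

while (<>) {
    # Skip any line that mentions a loopback or RFC 1918 private address.
    next if /(?<![\d.])(?:127\.0\.0\.1|(?:10\.\d+|172\.(?:1[6-9]|2\d|3[01])|192\.168)(?:\.\d+){2})/;
    # Remember the first IPv4 address on each remaining line.
    $last = $1 if /$RE{net}{IPv4}{dec}{-keep}/;
}

print $last, "\n";
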
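
Before the procmailrc itself, here is a rough Python sketch of what the Perl script does, for readers who don't read Perl: ignore every header line that mentions a loopback or private address, and keep the first IPv4 address of the last remaining line. It is an illustration only and not part of either file. The function name `extract_last_ip`, the `example.*` host names and the documentation-range addresses are all made up, and the hand-written IPv4 pattern is a simplified stand-in for Regexp::Common's.

```python
import re

# Dotted-quad matcher; a simplified stand-in for $RE{net}{IPv4}{dec}.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"(?<![\d.])(?:{OCTET}\.){{3}}{OCTET}(?![\d.])")

# Loopback and RFC 1918 private ranges, following the script's exclusion pattern.
# The lookarounds (an assumption of this sketch) stop "210.10.5.6" from being
# mistaken for the private address "10.10.5.6".
PRIVATE = re.compile(
    r"(?<![\d.])(?:127\.0\.0\.1|"
    r"(?:10\.\d+|172\.(?:1[6-9]|2\d|3[01])|192\.168)(?:\.\d+){2})(?![\d.])"
)


def extract_last_ip(lines):
    """Return the first IPv4 address of the last line with no private address."""
    last = ""
    for line in lines:
        if PRIVATE.search(line):
            continue  # the whole line is ignored, as in the Perl version
        match = IPV4.search(line)
        if match:
            last = match.group(0)
    return last


# Hypothetical headers in message order (newest hop first).
headers = [
    "Received: from mx.example.net (mx.example.net [198.51.100.7]) by mail.example.org",
    "Received: from gateway (gateway [192.168.1.1]) by mx.example.net",
    "Received: from client (client [203.0.113.45]) by gateway",
]
print(extract_last_ip(headers))  # 203.0.113.45
```

Because each server prepends its own Received: header, the last line is the hop closest to the original sender, which is why the script keeps the last match rather than the first.
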
# Global procmail definitions
# Define a file for procmail to send its log information.
LOGFILE=${HOME}/procmail.log
# Make sure procmail verbose logging is turned off.
VERBOSE=off
# Define a new line character for use in procmail LOG entries.
# note: the quote spanning two lines below is deliberate.
NL="
"
# Directory where we will store mail folders
# Note: This directory MUST exist!
MAILDIR=${HOME}/Maildir
#Mail folder for incoming whitelisted listmail
LISTMAILFOLDER=${MAILDIR}/listmail
# Location of formail on our system. (for use in procmail actions since
# those typically need a shell meta pattern in procmail action lines to work as intended)
FORMAIL=/usr/bin/formail
# Location of file containing From: addresses of people we correspond with on a regular basis
NOBOUNCE=${HOME}/.nobounce
#Location of a folder containing blacklisted email.
SPAMFOLDER=${MAILDIR}/spam
# Uncomment this if you would rather just delete the blacklisted email.
#SPAMFOLDER=/dev/null
# Location of a file containing regular expressions of patterns that we don't want
# to see in the Subject:, From: or Reply-To: headers
BLACKLIST_PATTERNS=${HOME}/.blacklist_regexp
# Location of file containing To: addresses we have given to newsletters
# or web sites that map to my real account via sendmail aliases.
SUBAUTH=${HOME}/.authorized-subscription-aliases
# Capture the message ID string (if any) for future reference in log entries.
:0
* ^Message-ID:
{ MESSAGEID=`${FORMAIL} -cx "Message-ID:" |sed -e 's/[ \t]\{1,\}//g'` }
:0 E
{ MESSAGEID='none' }
# Sample procmail recipe to count the Received: headers and pass them (plus any
# X-Originating-IP header) to extractlastip.pl, storing the IP address it prints
# in the ${RECEIVEDHEAD} variable. Note the backticks that launch an embedded
# shell command.
:0 W
* H ?? 1^1 ^Received:
{
RECEIVEDCOUNT=$=
RECEIVEDHEAD=`${FORMAIL} -X "X-Originating-IP" -X "Received" | /usr/local/bin/extractlastip.pl`
LOG="[$$]$_: RECEIVEDHEAD=${RECEIVEDHEAD}${NL}"
LOG="[$$]$_: RECEIVEDCOUNT=${RECEIVEDCOUNT}${NL}"
}
# Sample procmail recipe which copies the IPv4 address returned by
# extractlastip.pl into SOURCEIP. This could be adapted if you have several
# internal servers through which the mail passes.
# Also, the header IP extraction assumes that the header lines were
# generated by sendmail. If you are using another server, you may need to adjust
# the regular expression to accommodate that.
# Initialize the SOURCEIP variable
SOURCEIP='000.000.000.000'
:0
* RECEIVEDHEAD ?? [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
{
SOURCEIP=${RECEIVEDHEAD}
LOG="[$$]$_: Extracted IP ${SOURCEIP} from Received: headers.${NL}"
}
:0 E
{ LOG="[$$]$_: Failed to find any source IP in the Received: headers.${NL}" }
# Sample procmail recipe which will generate the reverse IPv4 from
# the SOURCEIP, for use in blocklist lookups.
# It will also verify that the number we are looking at is a real Internet
# address.
# Initialize the SOURCEIPREV variable
SOURCEIPREV='000.000.000.000'
# Check for valid IPv4 address range.
# Then if the address is not an IANA non-routable address
# generate the reverse IP for use in subsequent DNS lookups.
# Build a procmail style regular expression to test for a valid IPv4 range.
OCTET='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
IPV4RANGECHECK="(${OCTET}\.${OCTET}\.${OCTET}\.${OCTET})"
# Build a procmail style regular expression to test for IPv4 ranges that should not be used on the Internet.
# These are based on the RFC 3330 section 3 summary table.
# Note: RFC 3330 has since been obsoleted (by RFC 5735, then RFC 6890), so these
# expressions should be periodically verified and updated as needed.
CLASSA="((0|10|39|127|2(4[0-9]|5[0-5]))\.${OCTET}\.${OCTET}\.${OCTET})"
CLASSB="((169\.254|128\.0|172\.(1[6-9]|2[0-9]|3[0-1])|191\.255|192\.168|198\.1[8-9])\.${OCTET}\.${OCTET})"
CLASSC="((192\.0\.[02]|223\.255\.255)\.${OCTET})"
# Combine the above into one regular expression.
# Note: IP 255.255.255.255 is included in network 240.0.0.0/4 defined above
#       as part of the CLASSA regular expression variable.
RFC_3330_INVALID="(${CLASSA}|${CLASSB}|${CLASSC})"
:0
* ! SOURCEIP ?? ^(000\.000\.000\.000)$
* $ SOURCEIP ?? ^${IPV4RANGECHECK}$
{
:0
* $ ! SOURCEIP ?? ^${RFC_3330_INVALID}$
{
:0
* SOURCEIP ?? ^[0-9]+\.[0-9]+\.[0-9]+\.\/[0-9]+
{ QUAD4=${MATCH} }
:0
* SOURCEIP ?? ^[0-9]+\.[0-9]+\.\/[0-9]+
{ QUAD3=${MATCH} }
:0
* SOURCEIP ?? ^[0-9]+\.\/[0-9]+
{ QUAD2=${MATCH} }
:0
* SOURCEIP ?? ^\/[0-9]+
{ QUAD1=${MATCH} }
SOURCEIPREV="${QUAD4}.${QUAD3}.${QUAD2}.${QUAD1}"
LOG="[$$]$_: IP ${SOURCEIP} is a valid IPv4 address${NL}"
IPV4VALID=yes
}
:0 E
{
LOG="[$$]$_: IP ${SOURCEIP} is an IANA Non-Routable IPv4 address${NL}"
IPV4VALID=no
}
}
:0 E
{
LOG="[$$]$_: Error - ${SOURCEIP} has an invalid range for an IPv4 address.${NL}"
IPV4VALID=no
}
# Here is another example of a more complex blocklist lookup technique
# which will lookup an IP on zen.spamhaus.org, decode the response, and
# tag the email.
# References:
# http://www.spamhaus.org/zen/index.lasso
# http://www.spamhaus.org/faq/answers.lasso?section=DNSBL%20Technical#200
SPAMHAUSLISTED=no
SPAMHAUSLOOKUP=`host ${SOURCEIPREV}.zen.spamhaus.org`
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.([2-9]|1[01])$
{
# 127.0.0.2 SBL Spamhaus Maintained
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.2$
{ SPAMHAUSLOG="SBL, " }
# 127.0.0.3 --- reserved for future use
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.3$
{ SPAMHAUSLOG="${SPAMHAUSLOG}127.0.0.3, " }
# 127.0.0.4 XBL CBL Detected Address
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.4$
{ SPAMHAUSLOG="${SPAMHAUSLOG}CBL, " }
# 127.0.0.5 XBL NJABL Proxies (customized)
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.5$
{ SPAMHAUSLOG="${SPAMHAUSLOG}NJABL Proxies, " }
# 127.0.0.6 XBL reserved for future use
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.6$
{ SPAMHAUSLOG="${SPAMHAUSLOG}127.0.0.6, " }
# 127.0.0.7 XBL reserved for future use
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.7$
{ SPAMHAUSLOG="${SPAMHAUSLOG}127.0.0.7, " }
# 127.0.0.8 XBL reserved for future use
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.8$
{ SPAMHAUSLOG="${SPAMHAUSLOG}127.0.0.8, " }
# 127.0.0.9 --- reserved for future use
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.9$
{ SPAMHAUSLOG="${SPAMHAUSLOG}127.0.0.9, " }
# 127.0.0.10 PBL ISP Maintained
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.10$
{ SPAMHAUSLOG="${SPAMHAUSLOG}PBL-ISP Maintained, " }
# 127.0.0.11 PBL Spamhaus Maintained
:0
* SPAMHAUSLOOKUP ?? 127\.0\.0\.11$
{ SPAMHAUSLOG="${SPAMHAUSLOG}PBL-SpamHaus Maintained, " }
SPAMHAUSLOG=`echo "${SPAMHAUSLOG}" |sed -e "s/, $/\n\tSee: http:\/\/www\.spamhaus\.org\/query\/bl\?ip=${SOURCEIP}/"`
LOG="[$$]$_: Result codes: ${SPAMHAUSLOG}${NL}"
:0 f
|${FORMAIL} -A "X-blocklists: ${SOURCEIP} found in SpamHaus. Blocklist lookup results: ${SPAMHAUSLOG}"
SPAMHAUSLISTED=yes
}
:0
|
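
The two central steps of the procmailrc above, reversing the octets of the source IP and decoding the 127.0.0.x answers from zen.spamhaus.org, can be sketched in Python as follows. This is a minimal illustration rather than a replacement for the recipes. The function names `dnsbl_query_name` and `decode_zen` are hypothetical, the code table simply repeats the comments in the procmailrc, and the DNS lookup itself is deliberately left out so the example runs offline.

```python
import ipaddress

# Return codes as listed in the procmailrc comments; the reserved codes are
# passed through unchanged, just as the recipes log them.
ZEN_CODES = {
    "127.0.0.2": "SBL",
    "127.0.0.4": "CBL",
    "127.0.0.5": "NJABL Proxies",
    "127.0.0.10": "PBL-ISP Maintained",
    "127.0.0.11": "PBL-SpamHaus Maintained",
}


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the name to look up: the octets in reverse order, then the zone."""
    # IPv4Address raises ValueError for anything that is not a valid dotted quad.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def decode_zen(answers):
    """Translate a list of 127.0.0.x answers into the list names used above."""
    return [ZEN_CODES.get(answer, answer) for answer in answers]


print(dnsbl_query_name("203.0.113.45"))          # 45.113.0.203.zen.spamhaus.org
print(decode_zen(["127.0.0.4", "127.0.0.10"]))   # ['CBL', 'PBL-ISP Maintained']
```

The reversed form is what the `host` command in the recipe queries; an address that is not listed simply does not resolve.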

Obviously this is open to abuse: a spammer could add forged Received: headers to throw the recipe off the scent. I can’t see that happening unless this becomes a popular way of filtering, though. Also, any non-RFC-compliant SMTP servers between the spammer and you may mangle the headers, which could break the whole thing. You can’t please all the people all the time.
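
To make the forged-header problem concrete, here is a small Python sketch. It assumes a simplified picker that keeps the last non-private IPv4 address seen anywhere in the Received: lines, which is close to, but not identical to, what the Perl script does. The function name `last_public_ip`, the host names and the documentation-range addresses are invented for the example.

```python
import ipaddress
import re

# Loopback and RFC 1918 ranges that the picker ignores.
EXCLUDED = [
    ipaddress.ip_network(net)
    for net in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
QUAD = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def last_public_ip(received_lines):
    """Return the last IPv4 address that is not loopback or private, or None."""
    last = None
    for line in received_lines:
        for text in QUAD.findall(line):
            try:
                ip = ipaddress.IPv4Address(text)
            except ValueError:
                continue  # looked like a dotted quad but is not a valid address
            if not any(ip in net for net in EXCLUDED):
                last = text
    return last


# Headers in message order: each server prepends its own line at the top.
genuine = [
    "Received: from relay.example.net ([198.51.100.7]) by mail.example.org",
    "Received: from unknown ([203.0.113.45]) by relay.example.net",
]
# The spammer's software writes a fake hop before handing the message over,
# so it ends up at the bottom, exactly where the picker looks.
forged = genuine + [
    "Received: from mail.innocent.example ([192.0.2.10]) by nowhere.example",
]

print(last_public_ip(genuine))  # 203.0.113.45
print(last_public_ip(forged))   # 192.0.2.10
```

With the forged line in place, the blocklist lookup is made against an address the spammer chose rather than the one that actually connected to the first genuine relay.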

Update 09/09/10 – While this is an interesting and educational exercise, it’s important to point out that the protection you get by default in Ubuntu when using Amavisd-new and SpamAssassin already does all this and more. SpamAssassin checks zen.spamhaus.org by default and additionally scans every IP in the Received: headers against this blacklist (and many more), except for “trusted” and reserved addresses. By default in Ubuntu, SpamAssassin assumes every IP in the headers except your own is untrusted.
