It is better to arrive than to travel hopefully

 

Have you ever wondered what actually happens to an email when you press send?

The email process is more than 20 years old but is still remarkably efficient at delivering messages, documents and images from A to B in a matter of seconds, even if A and B are on opposite sides of the world. It is less error-prone and far faster than normal mail, and is effectively free! Only the rise of spam threatens its incredible efficiency.

When you press send, your email will travel through at least two other computers on the way from your computer, phone or tablet to the receiver's computer, phone or tablet. This is a slightly simplified picture but correct in detail.

If your machine is called A and the device of the person who you are sending to is called B, then your mail might travel A → M1 → M2 → B. M1 and M2 are called mail servers and M1 is the sending mail server and M2 the receiving mail server. They are basically responsible for transferring, retrying if the internet is busy, and storing mail until it is read by B.

If everything is fine, the mail goes A → M1 → M2 → B. B then reads it and might respond so their mail travels B → M2 → M1 → A, where you read the reply. Everything is fine.

The growth of spam has changed all this so that M1 and M2 have to be very picky about what they send and receive. Here are a few of the things that can happen.

So what can go wrong?

A → M1, and M1 can't find M2. This happens if you spell the domain name of the recipient's email address wrong (for the techie minded, M1 performs a DNS lookup of the domain and it comes back as unknown). The domain name is the bit after the '@'. Your mail then travels M1 → A and you are back where you started. It's just the same as spelling the address incorrectly on an envelope you send by normal post, except that normal postmen might try a bit harder.

If you do spell the domain name correctly, your email might go a bit further, A → M1 → M2. However, suppose you have got the username wrong (the bit before the '@'). M2 might have a list of everybody it knows about with this domain name and say, "sorry, nobody here of that name" and hard bounce it. In other words, your lovingly crafted email just bounces off M2 and comes back, M2 → M1 → A. Another possibility is that you spelt it correctly but when M2 tries to deliver it to B, it can't because B's mailbox is full. In this case M2 soft bounces it and it just comes back B → ..
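The three failure cases above (unknown domain, unknown username, full mailbox) can be sketched as a toy simulation. This is only an illustration: the `ReceivingServer` class, the `deliver` function and the result strings are invented for this example, and a plain dictionary stands in for the DNS lookup that a real M1 would perform.

```python
from dataclasses import dataclass, field


@dataclass
class ReceivingServer:
    """Toy model of M2: the domain it serves and the mailboxes it knows."""
    domain: str
    # username -> number of messages the mailbox can still hold (hypothetical)
    mailboxes: dict[str, int] = field(default_factory=dict)


def split_address(address: str) -> tuple[str, str]:
    """Split 'user@example.com' into (username, domain) at the last '@'."""
    username, _, domain = address.rpartition("@")
    return username, domain


def deliver(address: str, servers: dict[str, ReceivingServer]) -> str:
    """Return what happens to a message sent from A to the given address."""
    username, domain = split_address(address)
    server = servers.get(domain)  # stands in for M1's DNS lookup
    if server is None:
        return "returned by M1: unknown domain"
    if username not in server.mailboxes:
        return "hard bounce from M2: unknown user"
    if server.mailboxes[username] <= 0:
        return "soft bounce from M2: mailbox full"
    server.mailboxes[username] -= 1  # the message now occupies one slot
    return "delivered to B"
```

For example, with a server for `example.com` that knows only `bob`, sending to `bob@exmaple.com` never leaves M1, while sending to `bbo@example.com` reaches M2 and is hard bounced.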

Other more exotic things can happen. Mail servers like M1 and M2 can be temporarily black-listed. This means their IP addresses (sets of four numbers like 192.168.2.4) are stored on a black-listing service somewhere on the internet because they have sent spam. There are around 100 such black-listing services, of which probably the most famous is the excellent Spamhaus. Suppose M1 has been black-listed. Then your mail travels A → M1 → M2, at which point M2 says to M1, "I'm not accepting email from you because you have been a naughty server" and tells it which black-list it is on. Your email then comes back M2 → M1 → A with an explanation why.
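For the techie minded, black-lists of this kind are usually consulted through DNS: the receiving server reverses the four numbers of the sender's IP address, appends the black-list's zone name and looks the result up. The sketch below only builds that query name and does not perform the lookup, since that needs network access. The function name is invented for this example, and the default zone `zen.spamhaus.org` is an assumption about which list M2 consults.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name that would be looked up to check `ip` against `zone`.

    Only IPv4 is handled in this sketch; an invalid address raises ValueError.
    """
    addr = ipaddress.IPv4Address(ip)  # validates the address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# The example address from the text, checked against the assumed zone.
print(dnsbl_query_name("192.168.2.4"))  # 4.2.168.192.zen.spamhaus.org
```

If the resulting name resolves, the address is listed; if the lookup comes back as unknown, it is not.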

Mail servers try very hard to keep up their server reputation. This can be done by techniques such as SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail). If, for example, M1's domain publishes both SPF and DKIM information, it tells M2 that M1 is allowed to send mail for that domain, and that the message's content should match a cryptographic signature sent along with the email. Knowing about this kind of stuff can improve your geek quotient at the expense of normal people avoiding you. You have been warned.
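To make the SPF half of this concrete, here is a deliberately tiny sketch of how M2 might test M1's IP address against a published SPF record. It is not a full SPF implementation: it assumes the record text has already been fetched from DNS, it understands only `ip4:` entries and a trailing `-all` or `~all`, and the function name and sample addresses are invented for this example.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Return True if `sender_ip` matches an ip4: entry in the SPF record.

    Simplification: anything not explicitly matched is treated as not
    permitted, and mechanisms other than ip4: are ignored.
    """
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return False  # not an SPF record at all
    for term in terms[1:]:
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return True
        elif term in ("-all", "~all"):
            return False  # nothing earlier matched
    return False
```

With the record `v=spf1 ip4:192.0.2.0/24 -all`, a message arriving from 192.0.2.25 would pass this check and one from 198.51.100.7 would not.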

So where does spam filtering come into all this, and why is it so important?

First of all, spam is a HUGE problem, with over 90% of all email being spam. On SendForensics servers, the load is over 99.5%, so it's not surprising we have got pretty good at dealing with it - a single SendForensics server has peaked at over 150,000 mails a day, nearly all of which are junk. The most important characteristic of a spammer is that they try to appear to be someone else in order to improve their chances of being accepted. This is easy because in an email, you can write what you like in the From: line. It's called spoofing. The From: line is called a mail header. Other examples of mail headers include To:, Subject:, cc: and so on.
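To see how little effort spoofing takes, the sketch below uses Python's standard `email` package to build a message with an arbitrary From: header and then read the headers back. The function names and the addresses are invented for this example; nothing here sends any mail.

```python
from email import message_from_string
from email.message import EmailMessage


def build_message(claimed_sender: str, recipient: str, subject: str, body: str) -> str:
    """Build the raw text of an email; nothing verifies `claimed_sender`."""
    msg = EmailMessage()
    msg["From"] = claimed_sender  # any string the author chooses
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg.as_string()


def read_headers(raw: str) -> dict[str, str]:
    """Parse raw email text and return its headers as a dictionary."""
    parsed = message_from_string(raw)
    return {name: value for name, value in parsed.items()}


raw = build_message("ceo@bank.example", "you@example.com", "Urgent", "Please reply.")
headers = read_headers(raw)
print(headers["From"])  # ceo@bank.example
```

The receiving side sees exactly what the author typed, which is why the From: line on its own proves nothing about who sent the message.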

So how do mail servers tell when a message is spam? Well, some servers are so badly set up, they don't bother. Their users will get lots of spam in their inbox, and cleaning it out by hand is a massive waste of time. At the other end of the scale, a well set up server with one of the better spam-filtering systems (properly configured) should be able to run enough analyses to adequately shield its users (at time of writing, SendForensics' proprietary statistical computations are able to distinguish spam with less than 4 mistakes per million messages - an industry first). In essence, spam filtering means studying the transactions between M1 and M2, the reputations of M1 and M2 (for example if they have been black-listed) and then the actual content of the email.
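As a rough illustration of combining those three kinds of evidence (transaction, reputation and content), here is a toy additive score. It is emphatically not how SendForensics or any real filter computes its verdict: the field names, the weights and the threshold are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical signals a filter on M2 might collect for one message."""
    sender_blacklisted: bool  # reputation of M1
    spf_passed: bool          # transaction between M1 and M2
    suspicious_phrases: int   # content of the message


def spam_score(evidence: Evidence) -> float:
    """Add up invented weights; a higher score means more spam-like."""
    score = 0.0
    if evidence.sender_blacklisted:
        score += 5.0
    if not evidence.spf_passed:
        score += 2.0
    score += 1.0 * evidence.suspicious_phrases
    return score


def is_spam(evidence: Evidence, threshold: float = 5.0) -> bool:
    """Flag the message when its score reaches the (arbitrary) threshold."""
    return spam_score(evidence) >= threshold
```

Real filters use far more signals and statistical models rather than a handful of fixed weights, but the shape of the decision is similar: gather evidence, combine it, compare against a threshold.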

If a mail server decides that a particular email message is sufficiently like junk, it may add a short message to the Subject: line such as **** SPAM ****, as the venerable SpamAssassin program does. It is very easy to have a general email message flagged as spam if you are not careful. Suppose then your mail travels A → M1 → M2 → B but M2 decides it's spam and edits its Subject: line before passing it to B. It's then quite likely that when it arrives on B's machine, Outlook or whatever B is using to read and write mail just puts it in a spam folder. This is why some companies say check your spam folder or ask you to white-list their address. It's because their emails are sufficiently close to spam that they need to, and they are not doing their job properly. White-listing simply bypasses spam filtering checks. No legitimate email should be flagged as spam.
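The Subject: rewriting step can be sketched with Python's standard `email` package. This is a minimal illustration, not SpamAssassin's actual code: the function names are invented, the tag text is simply the one quoted above, and the decision about whether the message is spam is assumed to have been made elsewhere.

```python
from email import message_from_string

SPAM_TAG = "**** SPAM ****"


def tag_subject(raw: str, flagged: bool) -> str:
    """Return the raw message, with the Subject: line tagged if flagged."""
    msg = message_from_string(raw)
    subject = msg.get("Subject", "")
    if flagged and not subject.startswith(SPAM_TAG):
        new_subject = f"{SPAM_TAG} {subject}".strip()
        del msg["Subject"]  # does nothing if the header is absent
        msg["Subject"] = new_subject
    return msg.as_string()


def looks_tagged(raw: str) -> bool:
    """What a mail client on B might check before filing into a spam folder."""
    subject = message_from_string(raw).get("Subject", "")
    return subject.startswith(SPAM_TAG)
```

A mail client that files anything whose subject starts with the tag into a spam folder is doing the equivalent of `looks_tagged`.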

There's another reason why spam filtering is so important. If spam were just the usual Viagra adverts, or the ridiculous 419 scams written by petty criminals masquerading as the Reverend General Prince Jaffee Joffer III, it wouldn't be much of a threat. Unfortunately, many spam messages are now actively trying to break into your machine with a variety of toxic payloads. These, and the current crop of sophisticated phishing attacks, can be very dangerous and just make it that much more difficult for genuine mailers to operate.

So, as you can see, the heart of legitimate mailing lies in the discriminating ability of forensic engines to distinguish junk from non-junk and help genuine senders understand and accentuate that distinction in their own mail. That way, arrival is much more likely than simply travelling hopefully.