Enhancing Collaborative Spam Detection with Bloom Filters



Signature-based collaborative spam detection (SCSD) systems provide a promising solution to many problems facing statistical spam filters, the most widely adopted technology for detecting junk emails. In particular, some SCSD systems can identify previously unseen spam messages as such, although intuitively this would appear to be impossible. However, the SCSD approach usually relies on huge databases of email signatures, which demand substantial resources for signature lookup, storage, transmission and merging. In this paper, we report our enhancements to two representative SCSD systems. With these enhancements, signature lookups can be performed in constant time, independent of the number of signatures in the database. A space-efficient representation significantly reduces the size of the signature database. A simple but fast algorithm for merging different signature databases is also supported. We achieve all this using the Bloom filter technique and a novel variant of it.
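The three properties claimed above (constant-time lookup, compact storage, and simple merging) can be illustrated with a minimal sketch of a standard Bloom filter. This is not the authors' implementation, and it does not attempt the novel variant, which this excerpt does not describe. The class name, the parameter defaults, the SHA-256 double-hashing scheme and the sample signatures are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over byte-string signatures (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Hypothetical defaults; a real deployment would size these from the
        # expected number of signatures and the tolerated false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, signature):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing (an assumed, common construction).
        digest = hashlib.sha256(signature).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, signature):
        for pos in self._positions(signature):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, signature):
        # The work done depends on num_hashes only, not on how many
        # signatures have been added. False positives are possible;
        # false negatives are not.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(signature)
        )

    def merge(self, other):
        # Two filters with identical parameters merge with a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


# Two hypothetical sites each record a spam signature, then merge databases.
site_a = BloomFilter()
site_b = BloomFilter()
site_a.add(b"signature-of-spam-1")
site_b.add(b"signature-of-spam-2")
combined = site_a.merge(site_b)
```

In this sketch the storage cost is fixed at `num_bits` regardless of signature length, and a signature added to either site's filter is always reported as present in `combined`. The price is a tunable false-positive rate, which a signature that was never added may occasionally trigger.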

Spam (junk bulk email) is an ever-increasing problem. It not only annoys individual email users but also imposes significant costs on many organisations. To date, statistical spam filters are probably the most heavily studied and the most widely adopted technology for detecting junk emails. However, among other disadvantages, these filters need to be retrained regularly, particularly when they produce excessive numbers of “false positive” or “false negative” decisions. In particular, such systems fail to detect spam that cannot be predicted by the machine learning algorithms on which they are based. Such filters also cannot identify spam that is sent as an image attachment to an otherwise unobjectionable email message. In addition, as content-based filters, they are language-dependent (e.g. a filter trained for English is useless in detecting spam in Chinese, and vice versa) and vulnerable to various content-manipulation attacks (e.g. so-called “filter poisoning”). Signature-based Collaborative Spam Detection (SCSD) systems, e.g. Razor [7] and Distributed Checksum Clearinghouse (DCC) [3], are an attractive complement to statistical spam filters. As an alternative approach, these systems provide a promising solution addressing all the above problems facing statistical filters. In particular, systems like DCC can identify previously unseen spam messages as such, although intuitively this would appear to be impossible.
