Bandhu Bhandari, Jagat. Feature selection using data mining technique for spam detection. Master's Degree (Information Technology). King Mongkut's University of Technology North Bangkok. Central Library. King Mongkut's University of Technology North Bangkok, 2008.
Feature selection using data mining technique for spam detection
Abstract:
The number of spam emails has been growing with the increase in online
business, communication and social networking. Recipients bear the cost of
processing unwanted email, which also consumes bandwidth and poses security threats
and privacy concerns. Network-based and service-based solutions have not
been sufficient, so content-based filtering has become a popular way to detect spam
email. Feature selection is a crucial step in this text mining setting: the high
dimensionality of the feature space is a major challenge for pattern recognition
and learning algorithms, and discriminative features that correctly represent an
email document play a key role in separating spam from legitimate mail. Feature
Selection (FS) methods help to identify, rank, and gather the relevant features from a
large number of raw features. This paper discusses a customized local feature search
strategy using Ant Colony System (ACS). It is particularly applicable and cost-effective
for feature selection on email, which generates a large number of features. The
classification results after feature selection show that the method performs well and
has a low false positive rate. However, its accuracy and precision are lower than
those of the Entropy, Variance and Chi-square feature selection techniques.
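Chi-square is one of the filter techniques the ACS approach is compared against. As a minimal sketch, and not the thesis's own implementation, a term's chi-square score can be computed from a 2x2 contingency table of term presence versus the spam/ham label. The function names `chi_square_scores` and `select_top_k` and the tokenized input format are hypothetical choices made for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square statistic of term presence vs. spam label (illustrative sketch).

    docs   : list of token lists, one per email (assumed already tokenized)
    labels : list of bools, True for spam
    """
    n = len(docs)
    n_spam = sum(labels)
    n_ham = n - n_spam
    spam_df = Counter()  # number of spam emails containing each term
    ham_df = Counter()   # number of ham emails containing each term
    for tokens, is_spam in zip(docs, labels):
        for term in set(tokens):  # count presence, not frequency
            (spam_df if is_spam else ham_df)[term] += 1

    scores = {}
    for term in set(spam_df) | set(ham_df):
        a = spam_df[term]   # spam, term present
        b = ham_df[term]    # ham, term present
        c = n_spam - a      # spam, term absent
        d = n_ham - b       # ham, term absent
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the table is degenerate; score such terms 0.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [["win", "cash", "now"], ["win", "prize"],
        ["meeting", "today"], ["lunch", "today"]]
labels = [True, True, False, False]
scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 2))  # → ['today', 'win']
```

Because the statistic is symmetric in the two classes, a term that strongly indicates legitimate mail ("today" in this toy data) ranks as high as a strong spam indicator ("win").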
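The abstract does not describe the customized local search in detail, so the following is only a generic Ant Colony System sketch for choosing a k-feature subset. It keeps ACS's pseudo-random proportional rule and its local and global pheromone updates, under several assumptions: pheromone is placed on features rather than on edges, the heuristic η is a per-feature relevance score (for example chi-square values), and subset quality comes from a caller-supplied, positive `fitness` function. In practice that function might be classifier accuracy on held-out email. The function name and the parameter defaults (`q0`, `rho`, `xi`, `beta`) are hypothetical.

```python
import random


def acs_feature_selection(relevance, fitness, k, n_ants=10, n_iter=30,
                          beta=2.0, q0=0.9, rho=0.1, xi=0.1, seed=0):
    """Generic ACS-style search for a k-feature subset (illustrative sketch).

    relevance : list of non-negative per-feature heuristic scores (eta)
    fitness   : callable(tuple of feature indices) -> positive float, higher is better
    """
    n = len(relevance)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of features")
    rng = random.Random(seed)
    tau0 = 1.0 / n                        # initial pheromone level (assumed choice)
    tau = [tau0] * n                      # pheromone stored per feature (assumption)
    eta = [r + 1e-9 for r in relevance]   # keep heuristic values strictly positive
    best, best_fit = None, float("-inf")

    for _ in range(n_iter):
        for _ in range(n_ants):
            chosen = []
            candidates = set(range(n))
            while len(chosen) < k:
                weights = {j: tau[j] * eta[j] ** beta for j in candidates}
                if rng.random() < q0:
                    # Exploitation: take the most attractive remaining feature.
                    j = max(weights, key=weights.get)
                else:
                    # Biased exploration: roulette-wheel selection on the weights.
                    j = rng.choices(list(weights), weights=list(weights.values()))[0]
                chosen.append(j)
                candidates.remove(j)
                # Local update: pull this feature's pheromone back toward tau0
                # so later ants in the same iteration are nudged to explore.
                tau[j] = (1 - xi) * tau[j] + xi * tau0
            subset = tuple(sorted(chosen))
            f = fitness(subset)
            if f > best_fit:
                best, best_fit = subset, f
        # Global update: only the best-so-far subset is reinforced, using its
        # fitness as the deposit (maximization analogue of ACS's 1/L_best).
        for j in best:
            tau[j] = (1 - rho) * tau[j] + rho * best_fit
    return best, best_fit


relevance = [0.1, 3.0, 0.5, 2.0, 0.2]


def fitness(subset):
    # Toy additive fitness; a real filter would evaluate a classifier here.
    return sum(relevance[j] for j in subset)


best, best_fit = acs_feature_selection(relevance, fitness, k=2)
print(best, best_fit)
```

The toy additive fitness keeps the example self-contained. With such a fitness, simply ranking features would already find the best subset; the pheromone search only pays off when feature interactions make subset fitness non-additive.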