Data Pre-processing in Spam Detection
Author(s):
Manisha, Banasthali University; Anjali Sharma, Banasthali University; Dr. Manisha, Banasthali University; Rekha Jain, Banasthali University
Keywords:
Spam, Spam detection, Data pre-processing
Abstract:
Nowadays, most people have access to the Internet, and the digital world has become one of the most important parts of everybody's life. People use the Internet not only for fun and entertainment, but also for business, banking, stock trading, searching and so on. Hence, Internet usage is growing rapidly. One of the threats to this technology is spam. Spam is junk or unsolicited mail or messages, that is, online communication sent to a user without permission. Spam has increased tremendously in the last few years. Today, more than 85% of the mail/messages received by users are spam. Spam is a very serious problem because spamming has become a very profitable business for spammers. Spam email takes various forms, such as adult content, offers of products or services, and job offers. Spam costs the sender very little to send; most of the cost is borne by the recipient or the service provider. The cost of spam can also be measured in lost human time, lost server time and the loss of valuable mail/messages. In spam filtering, cleaning the textual data is critical. The main objective of data pre-processing in spam detection is to remove data that gives no useful information about the class of a document. This paper focuses on various pre-processing steps for text data, such as noise elimination, stop word removal, and stemming. Porter's algorithm is used for stemming. Finally, some results obtained after applying all the data pre-processing steps are presented.
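The three pre-processing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stop-word list, the noise rules (HTML tags, URLs, non-letter characters) and all function names are assumptions made for illustration. The stemming function implements only step 1a of Porter's algorithm (plural suffixes), standing in for a full Porter stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "in", "for", "at", "you", "your", "this"}


def remove_noise(text):
    """Noise elimination (assumed rules): lowercase the text, then drop
    HTML tags, URLs, and every character that is not a letter or space."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, punctuation
    return text


def porter_step_1a(word):
    """Step 1a of Porter's algorithm only: SSES -> SS, IES -> I,
    SS -> SS, S -> (nothing). The remaining steps are omitted here."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def preprocess(text):
    """Apply noise elimination, stop word removal, then stemming."""
    tokens = remove_noise(text).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [porter_step_1a(t) for t in tokens]


message = "<b>Claim</b> your FREE prizes at http://spam.example.com now!!!"
print(preprocess(message))  # ['claim', 'free', 'prize', 'now']
```

In practice a complete Porter stemmer from an established library would replace `porter_step_1a`, and the stop-word list would come from a standard resource rather than a hand-written set.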
Other Details:
Manuscript Id: IJSTEV1I11008
Published in: Volume 1, Issue 11
Publication Date: 01/06/2015
Page(s): 33-37