Classification of Data Stream with Skewed Distribution
Author(s):
Pooja, Mata Raj Kaur Institute of Engineering & Technology, Rewari
Keywords:
Data, Data Mining, Information and Knowledge
Abstract:
Data stream mining is an emerging and important area of research for the data mining community. Data streams in many real-life applications are characterized by concept drift. Such streams may also have skewed, or imbalanced, class distributions, as in financial fraud detection and network intrusion detection. A skewed class distribution compounds the difficulty of classifying stream instances: a classifier learned from such a stream is biased towards the majority class and therefore tends to misclassify minority class examples. In some applications the minority class is the main focus. In financial fraud detection, for instance, the goal is to identify fraudulent transactions, and misclassifying them can result in heavy financial losses. Similar, and increasingly high, losses from misclassified minority class instances cannot be ruled out in many other data stream applications. The challenge, therefore, is to proactively identify minority class instances in order to avoid these losses. As an effort in this direction, we propose a method that combines a k-nearest-neighbours approach with an oversampling technique to classify skewed data streams. Oversampling is achieved by using minority class examples that are retained from the stream as time progresses. Experimental results show that the approach achieves good classification performance on both synthetic and real-world datasets.
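The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: a k-nearest-neighbours vote over the most recent chunk of the stream, with the minority class oversampled by adding minority examples retained from earlier chunks. The class name `MinorityRetainingKNN`, the chunk-based processing, the Euclidean distance, and the parameters `k`, `minority_label` and `max_retained` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter, deque


class MinorityRetainingKNN:
    """Hypothetical sketch: k-NN over the latest chunk plus retained minority examples."""

    def __init__(self, k=3, minority_label=1, max_retained=500):
        self.k = k
        self.minority_label = minority_label
        # Bounded buffer of minority examples seen in earlier chunks (assumed policy).
        self.retained = deque(maxlen=max_retained)
        self.current = []

    def update(self, chunk):
        """Advance to a new chunk, given as a list of (features, label) pairs."""
        # Keep the minority examples of the outgoing chunk before replacing it.
        self.retained.extend(
            ex for ex in self.current if ex[1] == self.minority_label
        )
        self.current = list(chunk)

    def predict(self, x):
        """Majority vote among the k nearest examples (Euclidean distance)."""
        pool = self.current + list(self.retained)
        if not pool:
            raise ValueError("no training examples seen yet")
        nearest = sorted(pool, key=lambda ex: math.dist(x, ex[0]))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# First chunk contains a few minority (label 1) examples; the second has none.
chunk_1 = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
           ((5.0, 5.0), 1), ((5.1, 5.0), 1), ((5.0, 5.1), 1)]
chunk_2 = [((0.1, 0.0), 0), ((0.0, 0.2), 0), ((0.3, 0.1), 0), ((0.2, 0.2), 0)]

model = MinorityRetainingKNN(k=3)
model.update(chunk_1)
model.update(chunk_2)
prediction_near_minority = model.predict((5.0, 5.0))
prediction_near_majority = model.predict((0.0, 0.0))
```

In this toy stream the second chunk contains no minority examples at all, so a k-NN model trained on that chunk alone could only predict the majority class; the retained buffer is what lets the query near (5, 5) still find minority neighbours. A bounded buffer is one simple way to limit memory, though under concept drift stale minority examples might need to be aged out more selectively.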
Other Details:
Manuscript Id: IJSTEV3I1107
Published in: Volume 3, Issue 1
Publication Date: 01/08/2016
Page(s): 201-205