Correlation between the Topic and Documents Based on the Pachinko Allocation Model
Author(s):
Dr. C. Sundar, Christian College of Engineering and Technology, Dindigul, Tamilnadu-624619, India; V. Sujitha, Christian College of Engineering and Technology, Dindigul, Tamilnadu-624619, India
Keywords:
Topic model, information filtering, maximum matched pattern, correlation, hidden topics, PAM
Abstract:
Latent Dirichlet allocation (LDA) and related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. Topic models are a suite of algorithms for uncovering the hidden thematic structure of a collection of documents. The existing system uses an information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM), in which each topic is represented by patterns. The patterns are generated from topic models and organized by their statistical and taxonomic features. The most discriminative and representative patterns, called Maximum Matched Patterns, are used to estimate the relevance of a document to the user's information needs so that irrelevant documents can be filtered out. The maximum matched patterns, which are the largest patterns in each equivalence class that occur in the received documents, are used to determine the relevant words that represent each topic. However, LDA does not capture correlations between topics and does not find the hidden topics in a document. To address this problem, the pachinko allocation model (PAM) is proposed. PAM improves upon earlier topic models such as LDA by modeling correlations between topics in addition to the word correlations that constitute topics. With this method, the most accurate topics are assigned to each document. PAM provides more flexibility and greater expressive power than latent Dirichlet allocation.
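As a rough illustration of how PAM can model correlations between topics, the following minimal sketch generates one document. It assumes the commonly described four-level arrangement (root, super-topics, sub-topics, words), which the abstract itself does not specify. All function names, parameter names, and toy values are hypothetical. The sketch covers generation only, not the inference step needed to fit the model to real documents.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index according to a discrete probability distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, alpha_root, alpha_super, phi, rng):
    """Generate one document under an assumed four-level PAM.

    alpha_root:  Dirichlet parameters over the S super-topics.
    alpha_super: S lists of Dirichlet parameters over the K sub-topics.
    phi:         K word distributions over the vocabulary (one per sub-topic).
    Returns a list of (super_topic, sub_topic, word_id) triples.
    """
    # Per-document mixing proportions: root -> super-topics.
    theta_root = sample_dirichlet(alpha_root, rng)
    # Per-document mixing proportions: each super-topic -> sub-topics.
    # Sub-topics sharing a super-topic tend to co-occur, which is how
    # correlations between topics arise in this structure.
    theta_super = [sample_dirichlet(a, rng) for a in alpha_super]

    tokens = []
    for _ in range(n_words):
        s = sample_index(theta_root, rng)      # pick a super-topic
        k = sample_index(theta_super[s], rng)  # pick a sub-topic under it
        w = sample_index(phi[k], rng)          # pick a word from the sub-topic
        tokens.append((s, k, w))
    return tokens


# Toy setup (hypothetical values): 2 super-topics, 3 sub-topics, 4 words.
toy_phi = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
toy_doc = generate_document(
    n_words=10,
    alpha_root=[1.0, 1.0],
    alpha_super=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    phi=toy_phi,
    rng=random.Random(0),
)
```

LDA corresponds to the simpler case of a single Dirichlet over topics for each document. The extra super-topic layer in this sketch is what lets groups of sub-topics rise and fall together.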
Other Details:
Manuscript Id: IJSTEV2I10026
Published in: Volume 2, Issue 10
Publication Date: 01/05/2016
Page(s): 79-84