Author: Nisha K. Lagad
Date of Publication: 14th February 2018
Abstract: Term-based and pattern-based approaches are used in information filtering to represent users’ information needs from a large collection of documents. These techniques assume that the documents in the collection all concern the same topic. In reality, however, users’ interests can be diverse, and the documents in a collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models that represent multiple topics in a collection of documents, and it has been widely used in machine learning and information retrieval. Patterns are generally considered more discriminative than single terms for describing documents. However, the large number of discovered patterns hinders their effective use in real-time applications, so selecting the most discriminative patterns from those discovered becomes crucial.
References:
- X. Wei and W. B. Croft, “LDA-based document models for ad-hoc retrieval,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006, pp. 178–185.
- C. Wang and D. M. Blei, “Collaborative topic modeling for recommending scientific articles,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011, pp. 448–456.
- T. Hofmann, “Probabilistic latent semantic indexing,” in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999, pp. 50–57.
- D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
- Y. Gao, Y. Xu, and Y. Li, “Pattern-based topic models for information filtering,” in Proceedings of International Conference on Data Mining Workshop SENTIRE, ICDM 2013. IEEE, 2013.
- Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, “Adapting ranking SVM to document retrieval,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006, pp. 186–193.
- S. Robertson, H. Zaragoza, and M. Taylor, “Simple BM25 extension to multiple weighted fields,” in Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management. ACM, 2004, pp. 42–49.