Improving Aspect Extraction from Text Using Domain Knowledge and a Term Graph
Subject areas: Information and Communication Technology
Mohammad Reza Shams 1 *, Ahmad Baraani 2, Mehdi Hashemi 3
1 - University of Isfahan
2 - University of Isfahan
3 - University of Isfahan
Keywords: text mining, opinion mining, word vectors, aspect extraction, domain knowledge, term graph
Abstract:
With the advancement of science and technology, analyzing user opinions and determining users' attitudes toward various topics has become an important and challenging task. Opinion mining is the process of extracting people's attitudes from written comments, and it can be carried out at three levels: document, sentence, and aspect. Aspect-based opinion mining analyzes people's views on the individual aspects of a subject. Its most important subtask is aspect extraction, which is the focus of this paper. Many previously proposed aspect extraction methods require labeled training data or extensive language resources, which are time-consuming and costly to prepare. In this paper, we propose an unsupervised approach to aspect extraction based on topic modeling and word vectors (Word2vec), which builds a term graph to integrate semantic information with domain knowledge. The evaluation results show that the proposed method not only improves aspect extraction accuracy compared with previous methods, but also runs all steps automatically, without user intervention. Furthermore, because it does not depend on language resources, it can be applied to a wide range of languages.
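The abstract describes the term graph only at a high level. As a rough illustration of the general idea, the sketch below builds a small weighted graph in which an edge between two terms combines how often they co-occur in a sentence with the cosine similarity of their word vectors. Everything specific here is an assumption made for illustration and is not taken from the paper: the function `build_term_graph`, the mixing weight `alpha`, the pruning threshold `min_weight`, and the hand-made two-dimensional vectors that stand in for trained Word2vec embeddings. The sketch does not model the topic-modeling step or the domain-knowledge component.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_term_graph(sentences, vectors, alpha=0.5, min_weight=0.3):
    """Return {(term_a, term_b): weight} for term pairs that co-occur.

    weight = alpha * normalized co-occurrence + (1 - alpha) * cosine similarity.
    alpha and min_weight are illustrative knobs, not values from the paper.
    """
    cooc = Counter()
    for tokens in sentences:
        # Only terms that have a vector can take part in the graph.
        terms = sorted({t for t in tokens if t in vectors})
        cooc.update(combinations(terms, 2))
    if not cooc:
        return {}
    max_count = max(cooc.values())
    graph = {}
    for (a, b), count in cooc.items():
        similarity = cosine(vectors[a], vectors[b])
        weight = alpha * count / max_count + (1 - alpha) * similarity
        if weight >= min_weight:  # prune weak edges
            graph[(a, b)] = weight
    return graph


# Toy data: hand-made 2-d vectors standing in for trained word embeddings.
toy_vectors = {
    "battery": [1.0, 0.0],
    "charge": [0.9, 0.1],
    "screen": [0.0, 1.0],
}
toy_sentences = [
    ["battery", "charge", "fast"],
    ["battery", "charge"],
    ["screen", "battery"],
]
term_graph = build_term_graph(toy_sentences, toy_vectors)
```

With this toy input, only the `battery`/`charge` edge survives pruning, since the two terms co-occur often and have similar vectors; the `battery`/`screen` pair co-occurs once and has orthogonal vectors, so its weight falls below the threshold. In practice, the vectors would come from an embedding model trained on the review corpus, and the resulting graph would feed a later clustering or topic-refinement step.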