UsERQA: An LLM-Driven User-Aware Community Question Answering System
Subject Areas: AI and Robotics
Seyyede Zahra Aftabi 1, Saeed Farzi 2*
1 - Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
2 - Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
Keywords: Community question answering systems, Answer generation, Recognizing question entailment, User modeling, Query-focused multi-document summarization
Abstract:
Question-and-answer communities have become vibrant platforms for sharing knowledge. Every year, millions of questions are posted on these forums in the hope of receiving answers from human experts. Nonetheless, many of these questions fail to receive timely or accurate answers, either because experts have limited time or because the questions duplicate earlier ones. In recent years, a large body of research has focused on identifying entailed questions within community archives and using their accepted answers to fulfill the information needs of newly posed questions. Most of these studies match questions syntactically and semantically, resorting to external knowledge injection or increased model complexity to enhance question understanding. However, they overlook the critical role that the topics a questioner typically explores play in disambiguating that questioner's queries. This research addresses this gap by introducing UsERQA, a novel retrieval-augmented generation (RAG)-based question-answering system that incorporates user knowledge. UsERQA uses large language models to represent the questioner's knowledge as a sequence of topical tags. In addition, it employs question entailment recognition as a post-retrieval strategy with a new constraint that mandates alignment between entailed questions and the questioner's knowledge. Afterward, another large language model generates the final answer using the accepted answers of the top entailed questions as context. The goal is to imitate human writing patterns and leverage the knowledge contained in human responses to produce high-quality answers. Experimental results on the CQAD-ReQuEST dataset indicate that UsERQA models user knowledge effectively and produces more accurate responses than its user-agnostic counterpart.
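The pipeline described in the abstract (user tags, entailment filtering constrained by those tags, then answer generation from accepted answers) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The LLM-based user modeling step is replaced here by a simple tag-frequency count, the entailment scores are assumed to be supplied by some external model, and the alignment constraint is approximated as "shares at least one tag with the user profile". All names (`Candidate`, `user_tag_profile`, `select_context`, `build_prompt`), the threshold, and the prompt wording are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """An archived question retrieved for a new question (hypothetical structure)."""
    question: str
    accepted_answer: str
    tags: frozenset
    entailment_score: float  # assumed output of some entailment model, in [0, 1]


def user_tag_profile(past_question_tags, max_tags=5):
    """Stand-in for LLM-based user modeling: the most frequent tags in the user's history."""
    counts = Counter(tag for tags in past_question_tags for tag in tags)
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:max_tags]]


def select_context(candidates, user_tags, min_entailment=0.5, top_k=2):
    """Keep candidates that are entailed AND share at least one tag with the user profile."""
    profile = set(user_tags)
    kept = [
        c for c in candidates
        if c.entailment_score >= min_entailment and c.tags & profile
    ]
    kept.sort(key=lambda c: c.entailment_score, reverse=True)
    return kept[:top_k]


def build_prompt(new_question, context):
    """Assemble the text that a generator LLM would receive (prompt wording is an assumption)."""
    answers = "\n".join(f"- {c.accepted_answer}" for c in context)
    return (
        f"Question: {new_question}\n"
        f"Accepted answers of related questions:\n{answers}\n"
        "Answer:"
    )
```

For example, a user whose history is dominated by `pandas` and `python` tags and who asks "How do I merge two of these?" would have a candidate tagged `git` filtered out even if its entailment score is high, which is the kind of disambiguation the user-aware constraint is meant to provide.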