Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

Abstract

This article analyses and evaluates FDDβ, a supervised term-weighting scheme that can be applied to query-term selection in topic-based retrieval. FDDβ weights terms based on two factors representing the descriptive and discriminating power of each term with respect to a given topic. It then combines these two factors through an adjustable parameter that makes it possible to favor different aspects of retrieval, such as precision, recall, or a balance between the two. The article makes the following contributions: (1) it presents an extensive analysis of the behavior of FDDβ as a function of its adjustable parameter; (2) it compares FDDβ against eighteen traditional and state-of-the-art weighting schemes; (3) it evaluates the performance of disjunctive queries built by combining terms selected using the analyzed methods; (4) it introduces a new public data set of news articles labeled as relevant or irrelevant to the economic domain. The analysis and evaluations are performed on three data sets: two well-known text data sets, namely 20 Newsgroups and Reuters-21578, and the newly released data set. The results indicate that, despite its simplicity, FDDβ is competitive with state-of-the-art methods and has the important advantage of being flexible enough to adapt to specific task goals. The results also demonstrate that FDDβ offers a useful mechanism for exploring different approaches to building complex queries.
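
The abstract does not state the formula, so the following is only a minimal sketch of how two such factors might be combined through one adjustable parameter. It assumes, purely for illustration, that descriptive power is the fraction of relevant documents containing the term (recall-like), that discriminating power is the fraction of documents containing the term that are relevant (precision-like), and that the two are merged as a weighted harmonic mean in the style of the F-measure. The function name `fdd_score`, the parameter `beta`, and the toy data are hypothetical and are not taken from the article.

```python
def fdd_score(term, docs, labels, beta=1.0):
    """Illustrative term weight; NOT the article's definition.

    docs   : list of sets of terms, one set per document
    labels : list of bools, True if the document is relevant to the topic
    beta   : assumed trade-off parameter (beta < 1 leans toward the
             precision-like factor, beta > 1 toward the recall-like factor)
    """
    relevant_total = sum(1 for y in labels if y)
    with_term = sum(1 for d in docs if term in d)
    relevant_with_term = sum(1 for d, y in zip(docs, labels) if y and term in d)
    if relevant_with_term == 0:
        return 0.0
    descriptive = relevant_with_term / relevant_total   # recall-like (assumed)
    discriminating = relevant_with_term / with_term     # precision-like (assumed)
    b2 = beta * beta
    # Weighted harmonic mean of the two factors, F-measure style (assumed).
    return (1 + b2) * discriminating * descriptive / (b2 * discriminating + descriptive)


# Toy labeled collection: the first two documents are relevant to the topic.
DOCS = [
    {"market", "bank"},
    {"market", "rate"},
    {"market", "goal"},
    {"goal", "team"},
]
LABELS = [True, True, False, False]

for b in (0.5, 1.0, 2.0):
    ranked = sorted(["market", "bank", "goal"],
                    key=lambda t: fdd_score(t, DOCS, LABELS, b),
                    reverse=True)
    print(b, ranked)
```

In this toy collection, "market" appears in every relevant document but also in one irrelevant one, while "bank" appears in only one document, which is relevant. Under the stated assumptions, a small `beta` ranks "bank" first and a large `beta` ranks "market" first, which is the kind of precision/recall trade-off the abstract describes.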
