Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs

Abstract

In this report, we experimented with several concepts in text stream analysis. We tested an implementation of Incremental Sparse TF-IDF (IS-TFIDF) and Incremental Cosine Similarity (ICS) built on bipartite graphs. In these graphs, one type of node represents documents and the other represents words, so the neighbors of a word in the graph are exactly the documents affected when that word arrives in the stream. With this information, we can leverage optimized algorithms developed for graph-based applications. The concept is similar to the use of hash tables or other data structures that provide fast in-memory access to information.
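The abstract's central idea, that a word's neighbors in the document-word bipartite graph are the documents touched by that word's arrival, can be illustrated with a minimal Python sketch. It is not the authors' IS-TFIDF or ICS implementation: the class name `BipartiteTfidf`, its methods, the smoothed idf formula log((1 + N) / (1 + df)) + 1, and the choice to compute cosine similarity on demand over a document's 2-hop neighbors (rather than maintaining stored similarity values) are all hypothetical assumptions made for illustration.

```python
import math
from collections import defaultdict


class BipartiteTfidf:
    """Hypothetical sketch: a document-word bipartite graph supporting
    incremental TF-IDF updates and sparse cosine similarity."""

    def __init__(self):
        # word -> set of doc ids: the word's neighbors in the bipartite graph
        self.word_docs = defaultdict(set)
        # doc id -> {word: term count}: the document's neighbors, with edge weights
        self.doc_words = defaultdict(dict)

    def add_token(self, doc_id, word):
        """Process one (document, word) arrival from the stream.

        Returns the neighbors of `word` (including doc_id). These are the
        documents whose TF-IDF weight for `word` may have changed, since
        df(word) and therefore idf(word) can change with this arrival."""
        counts = self.doc_words[doc_id]
        counts[word] = counts.get(word, 0) + 1
        self.word_docs[word].add(doc_id)
        return set(self.word_docs[word])

    def idf(self, word):
        # Assumed smoothed idf: always >= 1, so every document has a nonzero norm.
        n_docs = len(self.doc_words)
        df = len(self.word_docs[word])
        return math.log((1 + n_docs) / (1 + df)) + 1.0

    def weight(self, doc_id, word):
        # Raw term count times idf, computed from the current statistics.
        return self.doc_words[doc_id][word] * self.idf(word)

    def norm(self, doc_id):
        return math.sqrt(sum(self.weight(doc_id, w) ** 2 for w in self.doc_words[doc_id]))

    def similarities(self, doc_id):
        """Cosine similarity between doc_id and every document sharing at
        least one word with it (its 2-hop neighbors). Documents with no
        shared word have similarity 0 and are never visited."""
        if doc_id not in self.doc_words:
            return {}
        dots = defaultdict(float)
        for word in self.doc_words[doc_id]:
            w_q = self.weight(doc_id, word)
            for other in self.word_docs[word]:
                if other != doc_id:
                    dots[other] += w_q * self.weight(other, word)
        q_norm = self.norm(doc_id)
        return {other: dot / (q_norm * self.norm(other)) for other, dot in dots.items()}


# Usage: feed a small token stream, then query one document.
index = BipartiteTfidf()
for doc, text in [("d1", "graph stream text"), ("d2", "stream text"), ("d3", "hash table")]:
    for token in text.split():
        index.add_token(doc, token)

affected = index.add_token("d3", "stream")  # every document now containing "stream"
sims = index.similarities("d2")             # only d1 and d3 share words with d2
```

Because two documents with no word in common have cosine similarity 0, walking only the 2-hop neighborhood skips them entirely, which is where the sparse graph structure pays off. A brand-new document also increments N and so shifts every idf slightly; computing weights from the current counts at query time keeps this sketch exact at the cost of some recomputation, whereas a fully incremental scheme would have to account for that shift explicitly.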
