A Channel Coding Perspective of Collaborative Filtering
Abstract
We consider the problem of collaborative filtering from a channel coding perspective. We model the underlying rating matrix as a finite-alphabet matrix with block-constant structure. The observations are obtained from this underlying matrix through a discrete memoryless channel with a noisy part representing noisy user behavior and an erasure part representing missing data. Moreover, the clusters over which the underlying matrix is constant are unknown. We establish a sharp threshold result for this model: if the largest cluster size is smaller than $C_1 \log(mn)$ (where the rating matrix is of size $m \times n$), then the underlying matrix cannot be recovered by any estimator, but if the smallest cluster size is larger than $C_2 \log(mn)$, then we exhibit a polynomial-time estimator with diminishing probability of error. In the case of uniform cluster size, not only the order of the threshold but also the constant is identified.
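The observation model described in the abstract can be illustrated with a minimal sketch. It assumes a binary rating alphabet, a symmetric flip as the noisy part of the channel, and independent erasures; the function names, parameters, and example values are hypothetical and chosen only for illustration. It is not the paper's estimator, only a simulation of how observations could arise from a block-constant matrix.

```python
import random

def block_constant_matrix(row_labels, col_labels, block_values):
    """Build a matrix whose entry (i, j) depends only on the row cluster
    of i and the column cluster of j (block-constant structure)."""
    return [[block_values[r][c] for c in col_labels] for r in row_labels]

def observe(X, flip_prob, erase_prob, rng):
    """Pass every entry independently through a memoryless channel.

    Assumption: ratings are binary (0/1). An entry is erased (None) with
    probability erase_prob, modelling missing data; a surviving entry is
    flipped with probability flip_prob, modelling noisy user behavior.
    """
    Y = []
    for row in X:
        out = []
        for x in row:
            if rng.random() < erase_prob:
                out.append(None)       # erasure: rating not observed
            elif rng.random() < flip_prob:
                out.append(1 - x)      # noise: rating flipped
            else:
                out.append(x)          # rating observed correctly
        Y.append(out)
    return Y

# Hypothetical example: 2 row clusters, 2 column clusters.
row_labels = [0, 0, 1]
col_labels = [0, 1, 1]
block_values = [[0, 1], [1, 0]]
X = block_constant_matrix(row_labels, col_labels, block_values)
Y = observe(X, flip_prob=0.1, erase_prob=0.5, rng=random.Random(0))
```

In the abstract's setting the cluster labels are unknown to the estimator; here they are used only to generate the data.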