Hyper-Fisher Vectors for Action Recognition
Abstract
In this paper, we propose a novel encoding scheme that combines Fisher vector and bag-of-words (BoW) encodings for recognizing actions in videos. The proposed Hyper-Fisher Vector encoding is the sum of local Fisher vectors, which are computed based on the traditional BoW encoding. The proposed encoding is thus simple, yet more effective than the traditional Fisher Vector encoding. Through extensive evaluation on challenging action recognition datasets, viz., YouTube, Olympic Sports, UCF50 and HMDB51, we show that the proposed Hyper-Fisher Vector encoding improves recognition performance by around 2-3% compared to the improved Fisher Vector encoding. We also perform experiments showing that the performance of the Hyper-Fisher Vector is robust to the dictionary size of the BoW encoding.
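The abstract states only that the encoding is a sum of local Fisher vectors computed based on a BoW encoding. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that descriptors are partitioned by their nearest BoW codeword, that each partition is encoded as an improved Fisher vector (power and L2 normalization) under a shared diagonal-covariance GMM, and that the summed vector is L2-normalized once more. The function names `fisher_vector` and `hyper_fisher_vector` and all parameter names are hypothetical.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """Improved Fisher vector of descriptors X (N, D) under a diagonal GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    Returns a vector of length 2 * K * D (mean and variance gradients).
    """
    N = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]                      # (N, K, D)
    # Log-density of every descriptor under every Gaussian component.
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)               # stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                       # (N, K)

    z = diff / np.sqrt(variances)                                 # (N, K, D)
    g_mu = np.einsum('nk,nkd->kd', post, z)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_var = np.einsum('nk,nkd->kd', post, z ** 2 - 1)
    g_var /= N * np.sqrt(2 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                        # power norm
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                          # L2 norm


def hyper_fisher_vector(X, codebook, weights, means, variances):
    """Sum of local Fisher vectors, one per BoW cell (assumed partitioning)."""
    # Hard-assign each descriptor to its nearest BoW codeword.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)

    K, D = means.shape
    hfv = np.zeros(2 * K * D)
    for c in range(codebook.shape[0]):
        cell = X[labels == c]
        if len(cell) == 0:
            continue                      # empty cells contribute nothing
        hfv += fisher_vector(cell, weights, means, variances)

    norm = np.linalg.norm(hfv)
    return hfv / norm if norm > 0 else hfv
```

Under these assumptions, the output has the same dimensionality as a single Fisher vector (2 x K x D for K Gaussians and D-dimensional descriptors) regardless of the BoW dictionary size, and with a single codeword the sketch reduces to the ordinary improved Fisher vector.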