Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment

Abstract

User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of N frames in a UGV as an M-hop path graph G^o for M ≪ N, where the similarity between two frames within M time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" G^o to a 1-hop path graph G, specified by a generalized graph Laplacian matrix L, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue λ_min(B) of a coefficient matrix B = diag(h) + μL, where h is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We instead maximize the Gershgorin circle theorem (GCT) lower bound λ⁻_min(B) by choosing h via a new fast graph sampling algorithm that iteratively aligns the left-ends of the Gershgorin discs of all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves video summarization performance comparable to or better than state-of-the-art methods, at substantially reduced complexity.
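To make the GCT lower bound concrete, the following is a minimal sketch, not the paper's algorithm. It builds B = diag(h) + μL for a small 1-hop path graph and computes the left-end of each Gershgorin disc, whose minimum lower-bounds λ_min(B). The function names, the edge weights, the value of μ, and the use of a plain combinatorial Laplacian (rather than the generalized Laplacian mentioned above) are all illustrative assumptions. The optional scaling vector s applies a diagonal similarity transform S B S⁻¹, which preserves eigenvalues while shifting disc left-ends; the abstract does not state that this is exactly how the proposed alignment is carried out.

```python
def path_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph (assumed form, for illustration)."""
    n = len(weights) + 1
    L = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(weights):
        L[i][i] += w
        L[i + 1][i + 1] += w
        L[i][i + 1] -= w
        L[i + 1][i] -= w
    return L


def coefficient_matrix(L, h, mu):
    """B = diag(h) + mu * L, with h a binary keyframe selection vector."""
    n = len(L)
    return [[mu * L[i][j] + (h[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def disc_left_ends(B, s=None):
    """Left-end (center minus radius) of each Gershgorin disc of S B S^-1.

    s holds the positive diagonal of S; s=None means no scaling (S = I).
    """
    n = len(B)
    s = s or [1.0] * n
    ends = []
    for i in range(n):
        radius = sum(s[i] * abs(B[i][j]) / s[j] for j in range(n) if j != i)
        ends.append(B[i][i] - radius)
    return ends


def gct_lower_bound(B, s=None):
    """Smallest disc left-end: a lower bound on the smallest eigenvalue of B."""
    return min(disc_left_ends(B, s))


# Hypothetical 4-frame example: unit edge weights, frame 1 chosen as keyframe.
L = path_laplacian([1.0, 1.0, 1.0])
h = [0.0, 1.0, 0.0, 0.0]
B = coefficient_matrix(L, h, mu=0.5)

unscaled = disc_left_ends(B)            # equals h for a combinatorial Laplacian
hand_picked_s = [1.2, 1.5, 1.0, 0.9]    # illustrative scaling, chosen by hand
scaled_bound = gct_lower_bound(B, hand_picked_s)
```

Without scaling, every unsampled node's disc has its left-end at 0, so the bound is uninformative. In this toy case the hand-picked scaling pulls all left-ends above zero, which is the intuition behind aligning disc left-ends when choosing h.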
