3rd Place Solution for Google Universal Image Embedding
Abstract
This paper presents the 3rd place solution to the Google Universal Image Embedding Competition on Kaggle. We use ViT-H/14 from OpenCLIP for the backbone of ArcFace, and trained in 2 stage. 1st stage is done with freezed backbone, and 2nd stage is whole model training. We achieve 0.692 mean Precision @5 on private leaderboard. Code available at https://github.com/YasumasaNamba/google-universal-image-embedding
0
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.