DINOv2: Learning Robust Visual Features without Supervision

์ €์ž: Maxime Oquab, Timothรฉe Darcet, Thรฉo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervรฉ Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski | ๋‚ ์งœ: 2023-04-14 | URL: https://arxiv.org/abs/2304.07193 📄 PDF


Essence

Figure 2: Evolution of performance when scaling in parameters. We show performance on eight

์ž๊ธฐ์ง€๋„ํ•™์Šต(self-supervised learning)์„ ๋Œ€๊ทœ๋ชจ ํ๋ ˆ์ด์…˜ ๋ฐ์ดํ„ฐ์™€ 1B ํŒŒ๋ผ๋ฏธํ„ฐ ViT ๋ชจ๋ธ๋กœ ํ•™์Šตํ•˜์—ฌ ํ…์ŠคํŠธ ๊ฐ๋… ์—†์ด๋„ ๋‹ค์–‘ํ•œ ๋น„์ „ ์ž‘์—…์—์„œ ํ†ต์šฉ๋˜๋Š” ๊ณ ๊ธ‰ ์‹œ๊ฐ ํŠน์„ฑ์„ ์ƒ์„ฑํ•˜๋Š” DINOv2 ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ๋‹ค.

Motivation

Achievement


How

Figure 3: Overview of our data processing pipeline. Images from curated and uncurated data sources
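The pipeline in Figure 3 draws on both curated and uncurated image sources. The following is a minimal sketch of embedding-based curation in that spirit, assuming every image has already been mapped to an embedding vector: near-duplicates are removed from the uncurated pool by cosine similarity, and only uncurated images that are nearest neighbours of curated seed images are kept. The function names, the 0.95 duplicate threshold and `k` are hypothetical choices for illustration, not the paper's settings.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def deduplicate(embeddings, threshold=0.95):
    """Greedy near-duplicate filter: keep an image only if its similarity to every
    image kept so far is below `threshold`. Returns the kept indices."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def retrieve(curated, pool, k=2):
    """Return the sorted indices of the k most similar pool images for each curated seed."""
    selected = set()
    for seed in curated:
        ranked = sorted(range(len(pool)), key=lambda j: cosine(seed, pool[j]), reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)


def curate(curated, uncurated, k=2, threshold=0.95):
    """Deduplicate the uncurated pool, then keep only images retrieved by curated seeds.
    Returned indices refer to positions in the original `uncurated` list."""
    kept = deduplicate(uncurated, threshold)
    pool = [uncurated[i] for i in kept]
    return [kept[j] for j in retrieve(curated, pool, k)]


# Usage: toy 2-D "embeddings"; index 1 is a near-duplicate of index 0.
uncurated = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [-1.0, 0.0]]
curated = [[1.0, 0.1]]
print(deduplicate(uncurated))           # [0, 2, 3]
print(curate(curated, uncurated, k=1))  # [0]
```

The pairwise loops are quadratic in the number of images; at web scale a nearest-neighbour index library (for example Faiss) would replace them, while the keep/retrieve logic stays the same.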

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: DINOv2๋Š” ์ž๊ธฐ์ง€๋„ํ•™์Šต์œผ๋กœ foundation ๋ชจ๋ธ ์ˆ˜์ค€์˜ ๋ฒ”์šฉ ์‹œ๊ฐ ํŠน์„ฑ์„ ์ƒ์„ฑ ๊ฐ€๋Šฅํ•จ์„ ์ฒด๊ณ„์ ์ธ ๋ฐ์ดํ„ฐ ํ๋ ˆ์ด์…˜๊ณผ ํ™•์žฅ ์ตœ์ ํ™”๋กœ ์ž…์ฆํ•œ ํš๊ธฐ์  ์—ฐ๊ตฌ์ด๋ฉฐ, ๊ด‘๋ฒ”์œ„ํ•œ ๋ฒค์น˜๋งˆํฌ ๊ฒ€์ฆ๊ณผ ๋ชจ๋ธ ๊ณต๊ฐœ๋กœ ์‹ค์šฉ์  ์˜ํ–ฅ๋ ฅ์ด ๋งค์šฐ ๋†’๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •