Authors: Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, , , Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, Sarath Chandar, Franziska Meier, Yann LeCun, Michael Rabbat, Nicolas Ballas | Date: 2025-06-11 | URL: https://arxiv.org/abs/2506.09985
Figure 1 V-JEPA 2 Overview. Leveraging 1M hours of internet-scale video and 1M images, we pretrain the V-JEPA 2
V-JEPA 2 is a self-supervised video model pretrained on more than 1 million hours of internet-scale video. A single pretrained model supports video understanding, prediction, and robot planning.
Figure 2 Multistage training. (Left) We first pretrain the V-JEPA 2 video encoder on internet-scale image and
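Overall: V-JEPA 2 is a groundbreaking study that combines internet-scale self-supervised learning with a minimal amount of robot interaction data to achieve video understanding, prediction, and real-world robot planning. It points to a new direction for building general-purpose agents on top of world models.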