Any-point Trajectory Modeling for Policy Learning

์ €์ž: Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel | ๋‚ ์งœ: 2023-12-28 | URL: https://arxiv.org/abs/2401.00025 📄 PDF


Essence

Figure 1

Fig. 1: Given a task instruction and the initial positions of any set of points in an image frame, our Any-point Traject

Any-point Trajectory Modeling (ATM) is a framework for learning visuomotor policies. It first pre-trains a trajectory model on action-free videos to predict the future trajectories of arbitrary points. It then uses that model to learn robust policies from only a small amount of action-labeled data.
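The Python sketch below is a deliberately tiny illustration of the two-stage data flow. It is not the authors' implementation.

* Stage 1 fits a track predictor on action-free point tracks only.
* Stage 2 fits a policy that reads predicted tracks, using a single action-labeled demonstration.

Everything here is a hypothetical stand-in: the function names (`fit_track_model`, `predict_tracks`, `fit_policy_gain`, `policy`), the per-task constant-velocity track model, and the scalar-gain policy. In ATM, both stages are learned neural networks conditioned on visual observations and a task instruction.

```python
from statistics import fmean

# Hypothetical toy types: a 2D image point and a track of points over time.
Point = tuple[float, float]
Track = list[Point]


def fit_track_model(videos: dict[str, list[Track]]) -> dict[str, Point]:
    """Stage 1 (toy): from action-free point tracks grouped by task,
    learn each task's mean per-step 2D displacement. No actions are used."""
    model = {}
    for task, tracks in videos.items():
        steps = [(x1 - x0, y1 - y0)
                 for tr in tracks
                 for (x0, y0), (x1, y1) in zip(tr, tr[1:])]
        model[task] = (fmean(s[0] for s in steps), fmean(s[1] for s in steps))
    return model


def predict_tracks(model: dict[str, Point], task: str,
                   query_points: list[Point], horizon: int) -> list[Track]:
    """Predict future positions of arbitrary query points for a task
    by rolling them forward at the task's constant velocity."""
    dx, dy = model[task]
    return [[(x + t * dx, y + t * dy) for t in range(horizon + 1)]
            for x, y in query_points]


def track_velocity(tracks: list[Track]) -> Point:
    """Average per-step displacement over all predicted tracks."""
    horizon = len(tracks[0]) - 1
    vx = fmean((tr[-1][0] - tr[0][0]) / horizon for tr in tracks)
    vy = fmean((tr[-1][1] - tr[0][1]) / horizon for tr in tracks)
    return vx, vy


def fit_policy_gain(demos: list[tuple[list[Track], Point]]) -> float:
    """Stage 2 (toy): least-squares fit of a scalar k such that
    action ~= k * track_velocity(tracks) on a few labeled demos."""
    num = den = 0.0
    for tracks, (ax, ay) in demos:
        vx, vy = track_velocity(tracks)
        num += vx * ax + vy * ay
        den += vx * vx + vy * vy
    return num / den


def policy(gain: float, tracks: list[Track]) -> Point:
    """Map predicted tracks (not raw pixels) to a 2D action."""
    vx, vy = track_velocity(tracks)
    return gain * vx, gain * vy


if __name__ == "__main__":
    videos = {  # action-free "videos", already converted to point tracks
        "push_right": [[(0, 0), (1, 0), (2, 0)], [(5, 5), (6, 5), (7, 5)]],
        "push_up": [[(0, 0), (0, 2), (0, 4)]],
    }
    model = fit_track_model(videos)
    queries = [(10.0, 10.0), (20.0, 20.0)]
    # A single action-labeled demo, only for "push_right".
    demo = (predict_tracks(model, "push_right", queries, 4), (0.5, 0.0))
    gain = fit_policy_gain([demo])
    print(policy(gain, predict_tracks(model, "push_up", queries, 4)))  # (0.0, 1.0)
```

The gain is fitted on a single `push_right` demonstration. Even so, the policy produces a sensible command for `push_up`, because the action-free tracks already taught the track model how points move in that task. This mirrors, in miniature, why pre-training on action-free video can reduce the amount of action-labeled data a policy needs.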

Motivation

Achievement

Figure 4

Fig. 4: We compare with state-of-the-art video pre-training methods on language-conditioned manipulation tasks in the

How

Figure 2

Fig. 2: Overview of our framework. (a) In the first stage, given an action-free video dataset, we first sample 2D points

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: ๋น„๋””์˜ค ๋ฐ์ดํ„ฐ๋ฅผ ์ •์ฑ… ํ•™์Šต์— ํšจ๊ณผ์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ์ƒˆ๋กœ์šด ์ ‘๊ทผ๋ฒ•์œผ๋กœ, ์ž„์˜์˜ ์  ๊ถค์ ์ด๋ผ๋Š” ๋‹จ์ˆœํ•˜๋ฉด์„œ๋„ ๊ฐ•๋ ฅํ•œ ํ‘œํ˜„์„ ํ†ตํ•ด ๋†’์€ ์„ฑ๋Šฅ๊ณผ ์ผ๋ฐ˜์„ฑ์„ ๋™์‹œ์— ๋‹ฌ์„ฑํ–ˆ๋‹ค. ๊ด‘๋ฒ”์œ„ํ•œ ์‹คํ—˜๊ณผ ๋ช…ํ™•ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋กœ ๋กœ๋ด‡ ํ•™์Šต ๋ถ„์•ผ์— ์˜๋ฏธ ์žˆ๋Š” ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค.
