PaLM-E: An Embodied Multimodal Language Model

Authors: Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence | Date: 2023-03-06 | URL: https://arxiv.org/abs/2303.03378


Essence

Figure 1: PaLM-E is a single general-purpose multimodal language model for embodied reasoning tasks, visual-language tas

PaLM-E is an embodied multimodal language model that interleaves visual, state-estimation, and text inputs into multimodal sentences and feeds them directly into an LLM. This lets it perform a range of embodied reasoning tasks, including robot manipulation planning, VQA, and captioning.
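The interleaving idea can be illustrated with a toy sketch. This is not the authors' implementation: `embed_text`, `project_observation`, `build_multimodal_sentence`, and the tiny `EMBED_DIM` are hypothetical names and values chosen for illustration, and each observation is reduced to a single vector for simplicity. The sketch only shows text embeddings and projected observation features sharing one embedding space and being placed in prompt order.

```python
import random

EMBED_DIM = 4  # hypothetical, kept tiny for illustration


def embed_text(token, dim=EMBED_DIM):
    """Toy stand-in for an LLM token-embedding lookup (hypothetical)."""
    rng = random.Random(token)  # string seed gives a repeatable vector per token
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def project_observation(features, weights):
    """Linearly map an encoder feature vector into the text embedding space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def build_multimodal_sentence(segments, weights):
    """Interleave text and observation embeddings in prompt order.

    segments: list of ("text", str) or ("obs", list[float]) items.
    weights:  EMBED_DIM rows, each as long as an observation feature vector.
    """
    sequence = []
    for kind, value in segments:
        if kind == "text":
            # One embedding per whitespace-separated token (assumed tokenizer).
            sequence.extend(embed_text(tok) for tok in value.split())
        elif kind == "obs":
            # Assumption: one projected vector per observation.
            sequence.append(project_observation(value, weights))
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return sequence


if __name__ == "__main__":
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    prompt = [("text", "What is in"), ("obs", [0.1, 0.2, 0.3]), ("text", "?")]
    sentence = build_multimodal_sentence(prompt, weights)
    print(len(sentence), "vectors of width", len(sentence[0]))
```

In this sketch the resulting list of vectors is what a decoder-only LLM would consume in place of ordinary token embeddings; the real encoders, projection, and tokenizer are outside what this review describes.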

Motivation

Achievement

Figure 2: PaLM-E-562B can do zero-shot multimodal chain-of-thought reasoning, can tell visually-conditioned jokes given

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: PaLM-E is a landmark study that, in the reviewer's view, is the first to apply an LLM to real robot control in a meaningful way. Through end-to-end processing of multimodal inputs and positive transfer across multiple domains, it proposes a new paradigm for embodied AI. The construction of a 562B-parameter model, the validation on real robots, and the demonstration of diverse multimodal reasoning abilities are very impressive, and the work is expected to have a considerable impact on robotics and vision-language modeling.
