
A Survey of Embodied AI: From Simulators to Research Tasks, Paper Summary - (1)

mxnxeonx 2022. 7. 12. 22:22

This post was written to interpret the paper in support of related research, and it may contain mistranslations and typos. It also includes personal interpretation, so please refer to the original paper if you want a more accurate understanding.

 

 

 

A Survey of Embodied AI: From Simulators to Research Tasks is a survey paper in the field of Embodied AI. It compares Embodied AI simulators and describes the field's research tasks.

 

A Survey of Embodied AI: From Simulators to Research Tasks

There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", where AI algorithms and agents no longer learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through interactions w

arxiv.org

 

The paper covers Embodied AI broadly, so it is best to understand what Embodied AI is before reading it.

 

What is Embodied AI?

Embodied AI is the field in which an Agent is created inside a Simulator (a 3D environment) and trained by having it carry out various Tasks, after which the result is transferred (Sim2Real) to machines such as real-world robots so that they perform specific Tasks well. In short, it means developing AI robots in simulation.

Embodied AI is drawing growing international interest (for example, a Workshop is held at CVPR), and the field offers a variety of simulation environments and datasets. With these, sophisticated AI can be developed in simulation alone. Naturally, the resulting AI can also be deployed on real-world robots.

 

 

Primary Terms

  • Agent: the acting entity in the Simulator (the robot)
  • Curation: selecting and carefully filtering data
  • AI Framework: the overall setting in which an Agent operates on an Embodied AI Simulator
  • Task: a research task
  • Robotics: robot engineering; everything developed using robots
  • Sim2Real: Simulation to Real-World. Transfer learning that carries what was learned in simulation over to the real world
  • Real-World Counterparts: the real-world equivalents of something built in simulation (for example, a physical environment that matches a simulated one)

 


 

Abstract

The paradigm has shifted from the era of "Internet AI" to the era of "Embodied AI". AI algorithms and Agents no longer learn from datasets of images, videos, or text curated from the internet. Instead, they learn by interacting with their environment through human-like egocentric perception. As a result, demand for Embodied AI Simulators that support a variety of research tasks has grown substantially. This growing interest helps the pursuit of artificial general intelligence (AGI), but there has been no contemporary, comprehensive Survey of the field. The paper aims to provide such a Survey, covering Embodied AI from simulators to research. By evaluating nine simulators against seven proposed features, it examines what the simulators offer for Embodied AI research and where their limits lie. It also surveys three main research tasks, Visual Exploration, Visual Navigation, and Embodied Question Answering (QA), covering state-of-the-art approaches, evaluation metrics, and datasets. Finally, drawing on new insights revealed by surveying the field, it offers suggestions for choosing a simulator per Task and recommendations for future directions.


A paradigm shift from the "Internet AI" era to the "Embodied AI" era

  • Internet AI: learns from images, video, and text (large, expertly curated datasets)
  • Embodied AI: learns in realistic simulation (interacting with the environment from a human-like egocentric viewpoint)

The nine Embodied AI Simulators are evaluated on seven primary features and three secondary features.

The three main Embodied AI tasks are Visual Exploration, Visual Navigation, and Embodied Question Answering (QA). The paper covers the latest approaches, evaluation methods, and datasets for these three tasks, and it helps with choosing a simulator.

 


 

I. Introduction

Recent advances in deep learning, reinforcement learning, computer graphics, and Robotics have raised interest in developing general-purpose AI systems. As a result, there has been a shift from "Internet AI", which learns from image, video, and text datasets curated from the internet, to "Embodied AI", in which artificial Agents learn through interaction with their surroundings. Embodied AI rests on the belief that true intelligence can emerge from an Agent's interaction with its environment. For now, however, Embodied AI goes no further than incorporating traditional notions of intelligence (vision, language, reasoning) into an artificial embodiment to help solve AI problems in virtual environments.

As interest in Embodied AI has grown, simulators that aim to replicate the physical world faithfully have advanced considerably. These simulated worlds serve as virtual test beds for training and testing an AI Framework before it is deployed in the real world. Embodied AI simulators also make it easier to collect task-based datasets, which are tedious to gather in the real world because reproducing the same setting as in the virtual world takes extensive manual labor. Several survey papers on Embodied AI exist, but most are outdated because they were published before the modern deep learning era, which began around 2009. To the authors' knowledge, there has been only one Survey on the evaluation of Embodied Navigation.


Embodied AI acquires intelligence through interaction with the environment. (What has been developed so far still falls short.)

Growing interest in Embodied AI has driven advances in Simulators. Simulating before deploying to the real world reduces cost.

 

To address the lack of Survey papers on Embodied AI, the authors wrote a Survey that runs from simulators to research tasks. The paper covers nine Simulators developed over the past four years (DeepMind Lab, AI2-THOR, CHALET, VirtualHome, VRKitchen, Habitat-Sim, iGibson, SAPIEN, ThreeDWorld). Unlike Game Simulators, which are used only to train reinforcement learning Agents, these Simulators are designed for general-purpose intelligence tasks. They provide realistic representations of the real world on a computer, mainly in the form of rooms or apartments that impose some constraint on the environment. At a minimum, most of them comprise a physics engine, a Python API, and an artificial Agent that can be controlled or manipulated within the environment.


The nine Simulators below include a physics engine, a Python API, and an artificial Agent. They are designed for general-purpose intelligence tasks and are mainly suited to building constrained room or apartment settings.

  • DeepMind Lab
  • AI2-THOR
  • CHALET
  • VirtualHome
  • VRKitchen
  • Habitat-Sim
  • iGibson
  • SAPIEN
  • ThreeDWorld

 

Embodied AI Simulators have given rise to a series of potential embodied AI research tasks (Visual Exploration, Visual Navigation, Embodied QA). The survey focuses on these three Tasks because most existing papers either focus on them or use modules introduced for them to build models for more complex tasks. The three Tasks are also linked by increasing complexity. Visual Exploration is a very useful component of Visual Navigation and is used in realistic situations. Embodied QA additionally involves complex QA capabilities built on top of vision-and-language navigation. Because language is a common modality and visual QA is a popular task in AI, Embodied QA can be seen as a natural direction for Embodied AI. Each of the three Tasks discussed has been implemented in at least one of the nine proposed Simulators. The paper does not cover Sim2Real or Robotics in the physical world.

 

The Simulators were chosen based on those used in implementations at the Embodied AI Workshop held annually at CVPR.

Section I briefly outlines the structure of the Survey. Section II benchmarks the nine Simulators to understand their realism, scalability, and interactivity, and hence how they can be used in Embodied AI research. Section III surveys the three Embodied AI Tasks, Visual Exploration, Visual Navigation, and Embodied Question Answering (QA), covering state-of-the-art approaches, evaluation, and datasets. Finally, Section IV establishes the interconnections among Simulators, datasets, and research tasks, along with the field's existing challenges.

The Survey takes a comprehensive look at the emerging field of Embodied AI and proposes new insights and challenges for it. It is also meant to help AI researchers choose the ideal Simulator for the Task they are interested in.


The nine Simulators presented in the Survey were chosen based on those used at CVPR. The three Tasks and the primary and secondary evaluation sets help researchers choose a suitable Simulator for Embodied AI research.

 


 

II. Simulators for Embodied AI

This section gives background on Embodied AI Simulators, then compares and discusses their features.

 

A. Embodied AI Simulators

This part gives background on the nine Simulators: DeepMind Lab, AI2-THOR, SAPIEN, VirtualHome, VRKitchen, ThreeDWorld, CHALET, iGibson, and Habitat-Sim. For details on each Simulator, see the paper's supplementary material. The section compares the nine Simulators comprehensively on seven technical features. These seven features (Environment, Physics, Object Type, Object Property, Controller, Action, Multi-Agent) are the primary features for evaluating a Simulator. They cover the essential aspects needed to replicate the environment, the interactions, and the state of the physical world accurately, and so they indicate whether a Simulator provides a suitable test bed for testing embodied intelligence.

 

1) Environment

Embodied AI Simulator environments are constructed in one of two broad ways: Game-based or World-based.

Referring to Figure 1, Game-based scene construction builds scenes from 3D assets, while World-based scene construction builds them from real-world scans of objects and environments.

Compared with a 3D mesh of an environment produced by real-world scanning, a 3D environment built entirely from 3D assets has built-in physics features and well-segmented object classes. The clear object segmentation of 3D assets makes it easy to model them as articulated objects with movable joints, like the 3D models provided in PartNet. In contrast, real-world scans of environments and objects offer higher fidelity and a more accurate representation of the real world, which helps Agent performance transfer better from simulation to the real world. (Most Simulators other than Habitat-Sim and iGibson use Game-based construction.)

 

2) Physics

A Simulator must construct not only realistic environments but also realistic interactions, between Agents and objects and between objects, that model real-world physical properties. Physics features are divided into Basic physics and Advanced physics.

Referring to Figure 2, Basic features include collision, rigid-body dynamics, and gravity modeling, while Advanced features include cloth, fluid, and soft-body physics.

Most Embodied AI Simulators are Game-based and come with a built-in physics engine, so they have Basic physics features. Simulators such as ThreeDWorld, whose goal is to understand how a complex physics environment shapes an artificial Agent's decisions, have Advanced physics features.

For Simulators focused on interactive navigation-based tasks, Basic physics features are generally sufficient.
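The Basic/Advanced split can be sketched as a small set of flags. This is only an illustration: the `Physics` enum, the `BASIC` and `ADVANCED` groupings, and the `physics_tier` helper are hypothetical names introduced here and are not part of any simulator's API.

```python
from enum import Flag, auto


class Physics(Flag):
    """Physics features named in the Basic/Advanced split (hypothetical enum)."""
    NONE = 0
    COLLISION = auto()
    RIGID_BODY = auto()
    GRAVITY = auto()
    CLOTH = auto()
    FLUID = auto()
    SOFT_BODY = auto()


# Groupings follow the text: Basic = collision, rigid-body dynamics, gravity;
# Advanced = cloth, fluid, soft-body.
BASIC = Physics.COLLISION | Physics.RIGID_BODY | Physics.GRAVITY
ADVANCED = Physics.CLOTH | Physics.FLUID | Physics.SOFT_BODY


def physics_tier(features: Physics) -> str:
    """Classify a feature set as 'advanced', 'basic', or 'none'."""
    if features & ADVANCED:
        return "advanced"
    if features & BASIC:
        return "basic"
    return "none"


if __name__ == "__main__":
    navigation_sim = BASIC                  # enough for navigation-style tasks
    physics_sim = BASIC | Physics.FLUID     # adds one Advanced feature
    print(physics_tier(navigation_sim), physics_tier(physics_sim))
```

A flag type is used because a simulator can combine any subset of these features, and a single Advanced feature is enough to place it in the Advanced tier.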

 

3) Object Type

The objects used to build a Simulator come from two main sources.

The first Type is the Dataset-driven environment, whose objects derive from existing object datasets such as Matterport3D and Gibson. The second Type is the Asset-driven environment, whose objects come from the net, for example from the Unity 3D game asset store.

The difference between the two sources is the sustainability of the object dataset. Dataset-driven objects cost more to collect than Asset-driven objects, because anyone can contribute 3D object models online. However, it is harder to guarantee the quality of 3D object models for Asset-driven objects than for Dataset-driven objects.

According to the authors' review, Game-based Simulators tend to obtain their object datasets from asset stores, while World-based Simulators tend to draw on existing 3D object datasets.

 

4) Object Property

Some Simulators only support objects with basic interactions such as collision. More advanced Simulators support objects with finer-grained interactions such as multiple state changes. For example, slicing an apple changes its state into apple slices.

The authors therefore classify these different levels of object interaction into Simulators with Interact-able objects and Simulators with Multiple-state objects.

Referring to Table I, a few Simulators such as AI2-THOR and VRKitchen enable Multiple-state changes, providing a platform for understanding how objects react and change state when acted upon in the real world.

 

5) Controller

Referring to Figure 4, Controller interfaces between the user and the Simulator come in several types, ranging from Direct Python API controllers to Virtual Robot controllers and Virtual Reality controllers.

A robotic embodiment allows virtual interaction with existing real robots such as the Universal Robot 5 (UR5) and TurtleBot V2, and it can be controlled directly through a ROS interface.

Virtual Reality Controller interfaces provide more immersive HCI (human-computer interaction) and ease deployment using Real-World Counterparts. For example, Simulators designed mainly for Visual Navigation, such as iGibson and AI2-THOR, also have a Virtual Robot Controller to ease deployment to their Counterparts, Castro and RoboTHOR respectively.

* iGibson and AI2-THOR are Simulators equipped with a Virtual Robot Controller for their own real-world counterparts

 

6) Action

The complexity of an artificial Agent's action capabilities in an Embodied AI Simulator varies, from being able to perform only primary navigation to performing higher-level human-computer actions through a virtual reality interface.

The paper classifies these as Navigation, Atomic Action, and Human-Computer Interaction.

Navigation is the lowest tier and is common to all Embodied AI Simulators. It is defined by the Agent's ability to move around its virtual environment.

Atomic Action gives the artificial Agent a means of performing basic discrete manipulation of an object of interest, and it is found in most Simulators.

Human-Computer Interaction results from the Virtual Reality Controller. It lets humans control virtual Agents so that they learn and interact with the virtual world in real time.

Most large-scale navigation-based Simulators, such as AI2-THOR, iGibson, and Habitat-Sim, tend to offer Navigation, Atomic Action, and ROS, which gives better control over and manipulation of objects in the environment during tasks such as Point Navigation or Object Navigation.

Simulators such as ThreeDWorld and VRKitchen, on the other hand, fall into the HCI category because they are built to provide highly realistic physics-based simulation and Multiple-state changes. This is only possible with HCI, because interacting with these virtual objects requires human-level dexterity.

 

7) Multi-Agent

Referring to Table I, only a few simulators (AI2-THOR, iGibson, ThreeDWorld) have a multi-Agent setup, because research involving multi-Agent reinforcement learning is still scarce.

In general, a Simulator needs to be rich in object content before there is practical value in building multi-Agent features for adversarial or collaborative training of artificial Agents. Because so few Simulators support multiple Agents, fewer research Tasks make use of multi-Agent features in Embodied AI Simulators.

There are two kinds of multi-Agent setup. The first is the Avatar-based approach in ThreeDWorld, which allows interaction between artificial Agents and simulation avatars. The second is the User-based approach in AI2-THOR, in which the Agents take on the role of a dual learning network and interact with other artificial Agents in the simulation to accomplish a common Task.
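As a rough illustration of how these per-feature observations could be used to shortlist a Simulator, the sketch below encodes only the Simulators this summary names for each feature. It is not the survey's Table I: the set names and the `named_for_all` helper are hypothetical, and a Simulator missing from a set means only that it is not named here, not that it lacks the feature.

```python
from typing import List

# Hypothetical lookup built only from simulators this summary names.
# It is an incomplete stand-in for the survey's Table I.
FEATURE_SETS = {
    "world_based": frozenset({"Habitat-Sim", "iGibson"}),
    "advanced_physics": frozenset({"ThreeDWorld"}),
    "multiple_state": frozenset({"AI2-THOR", "VRKitchen"}),
    "multi_agent": frozenset({"AI2-THOR", "iGibson", "ThreeDWorld"}),
}


def named_for_all(*features: str) -> List[str]:
    """Return, sorted, the simulators named for every requested feature."""
    if not features:
        return []
    sets = [FEATURE_SETS[name] for name in features]
    return sorted(frozenset.intersection(*sets))


if __name__ == "__main__":
    # Which simulators are named for both multi-agent and multiple-state support?
    print(named_for_all("multi_agent", "multiple_state"))
```

Intersecting sets keeps the lookup honest about its gaps: adding a row from the real table later only means extending a set, with no change to the query logic.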


[ The seven primary evaluation features for Simulators ]

1) Environment

  • Game-based: built from 3D assets. Physics features and object segmentation are well implemented, which suits modeling objects with moving parts
  • World-based: scans of the real world. High real-world fidelity and useful for Sim2Real. Requires substantial resources

2) Physics

  • Basic features: collision, rigid-body dynamics, gravity modeling. Most Simulators have these (Basic is usually sufficient)
  • Advanced features: cloth, fluid, soft-body physics. Used when a complex physics environment is needed

3) Object Type

  • Dataset-driven: objects are hard and costly to collect, but quality is assured (existing 3D object datasets)
  • Asset-driven: open to anyone and easy to collect, but quality is not assured (uses an Asset Store)

4) Object Property

  • Interact-able: only basic interactions between objects, such as collision
  • Multiple-state: also models how objects react and change state

5) Controller

  • Direct Python API
  • Virtual Robot: allows interaction with real-world robots (such as TurtleBot), for example through a ROS interface.
  • Virtual Reality: provides more immersive HCI and eases deployment to real-world counterparts.

6) Action

  • Navigation: the lowest tier. A feature of every Embodied AI Simulator: the ability to move around the virtual space
  • Atomic Action: the artificial Agent performs simple actions on objects. Supported by many Simulators
  • Human-Computer Interaction: a human controls, in real time, how the virtual Agent learns and interacts

7) Multi-Agent

  • Avatar-based: artificial Agents interact with simulation avatars. (ex. ThreeDWorld)
  • User-based: separate artificial Agents, given the role of a dual learning network, interact with one another (ex. AI2-THOR)

 

B. Comparison of Embodied AI Simulators

Building on the seven features and on a study of Embodied AI by the Allen Institute for AI, the authors propose a secondary evaluation set for Simulators. As shown in Table I, it consists of three key features: Realism, Scalability, and Interactivity.

The Realism of a 3D environment comes from the simulator's Environment and Physics. The environment models the physical appearance of the real world, while the physics models the complex physical properties within it.

The Scalability of a 3D environment comes from the Object Type. A Simulator can be expanded by collecting more real-world 3D scans for Dataset-driven objects or by purchasing more 3D assets for Asset-driven objects.

Interactivity comes from Object Property, Controller, Action, and Multi-Agent.
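The grouping of the seven primary features under the three secondary features can be written down as a small lookup. This is a minimal sketch: the dictionary name `SECONDARY_TO_PRIMARY` and the helper `secondary_of` are hypothetical, and only the grouping itself comes from the text.

```python
# Grouping taken from the text; names are hypothetical.
SECONDARY_TO_PRIMARY = {
    "Realism": ("Environment", "Physics"),
    "Scalability": ("Object Type",),
    "Interactivity": ("Object Property", "Controller", "Action", "Multi-Agent"),
}


def secondary_of(primary: str) -> str:
    """Return the secondary feature that a primary feature contributes to."""
    for secondary, primaries in SECONDARY_TO_PRIMARY.items():
        if primary in primaries:
            return secondary
    raise KeyError(f"unknown primary feature: {primary}")


if __name__ == "__main__":
    for name in ("Physics", "Object Type", "Controller"):
        print(name, "->", secondary_of(name))
```

Each primary feature appears under exactly one secondary feature, so the reverse lookup is unambiguous and the seven primary features are all accounted for.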

Based on the seven primary features in Table I and Figure 6 and on the secondary evaluation set, Simulators that have all three secondary features (AI2-THOR, iGibson, Habitat-Sim) are better received and are widely used for Embodied AI Tasks.

 

The authors also carry out a comprehensive, quantitative comparison of the Embodied AI Simulators, comparing each Simulator's Environment Configuration and Technical Performance. Environment configuration depends heavily on the applications the Simulator's creators had in mind, while technical performance is largely determined by the simulation engine used to build it.

AI2-THOR has the largest environment configuration of all the Simulators, and Habitat-Sim and iGibson are the top two in graphics rendering performance. The quantitative benchmark in Table II further demonstrates the superiority and complexity of these three Simulators.

This comparison of Embodied AI Simulators further reinforces the importance of the seven primary and three secondary evaluation features that the Survey established to help researchers choose the ideal Simulator for a research Task.


[ The three secondary evaluation features for Simulators ]

1) Realism

  • Environment: models the physical appearance of the real world
  • Physics: models the complex physical properties of the real world

2) Scalability

  • Object Type: expand by collecting more real-world 3D scan datasets or by purchasing 3D assets

3) Interactivity

  • Object Property, Controller, Action, Multi-Agent

→ A Simulator that has all three of these features is well suited to Embodied AI Tasks.

 
