LLM2D
Scaling Instructable Agents Across Many Simulated Worlds
Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, Zhitao Gong, Lucy Gonzales, Kshitij Gupta, Karol Gregor, Arne Olav Hallingstad, Tim Harley, Sam Haves, Felix Hill, Ed Hirst, Drew A. Hudson, Jony Hudson, Steph Hughes-Fitt, Danilo J. Rezende, Mimi Jasarevic, Laura Kampis, Rosemary Ke, Thomas Keck, Junkyung Kim, Oscar Knagg, Kavya Kopparapu, Rory Lawton, Andrew Lampinen, Shane Legg, Alexander Lerchner, Marjorie Limont, Yulan Liu, Maria Loks-Thompson, Joseph Marino, Kathryn Martin Cussons, Loic Matthey, Siobhan Mcloughlin, Piermaria Mendolicchio, Hamza Merzic, Anna Mitenkova, Alexandre Moufarek, Valeria Oliveira, Yanko Oliveira, Hannah Openshaw, Renke Pan, Aneesh Pappu, Alex Platonov, Ollie Purkiss, David Reichert, John Reid, Pierre Harvey Richemond, Tyson Roberts, Giles Ruscoe, Jaume Sanchez Elias, Tasha Sandars, Daniel P. Sawyer, Tim Scholtes, Guy Simmons, Daniel Slater, Hubert Soyer, Heiko Strathmann, Peter Stys, Allison C. Tam, Denis Teplyashin, Tayfun Terzi, Davide Vercelli, Bojan Vujatovic, Marcus Wainwright, Jane X. Wang, Zhengdong Wang, Daan Wierstra, Duncan Williams, Nathaniel Wong, Sarah York, Nick Young
Published: 10/14/2024
arXiv ID: oai:arXiv.org:2404.10179v3

Abstract

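Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions in order to carry out complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real time using a generic, human-like interface: the inputs are image observations and language instructions, and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments, while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.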
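
As an illustration only, the generic interface described in the abstract (image observations and a language instruction in, keyboard-and-mouse actions out) could be sketched as below. This is a minimal sketch under stated assumptions, not the SIMA implementation: the names `Observation`, `Action`, `placeholder_policy`, and `run_episode`, and all of their fields, are hypothetical, and the placeholder policy is a keyword rule standing in for a learned agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical frame type: rows of RGB pixels (height x width x 3).
Frame = Tuple[Tuple[Tuple[int, int, int], ...], ...]


@dataclass(frozen=True)
class Observation:
    """Agent input at one step: an image frame plus the language instruction."""
    frame: Frame
    instruction: str


@dataclass(frozen=True)
class Action:
    """Agent output at one step: keyboard keys and mouse movement/buttons."""
    keys: Tuple[str, ...] = ()
    mouse_dx: int = 0
    mouse_dy: int = 0
    mouse_buttons: Tuple[str, ...] = ()


# Any agent is just a mapping from observations to actions (assumed signature).
Policy = Callable[[Observation], Action]


def placeholder_policy(obs: Observation) -> Action:
    """Stand-in for a learned agent: presses 'w' if asked to go forward."""
    if "forward" in obs.instruction.lower():
        return Action(keys=("w",))
    return Action()  # no-op: no keys pressed, no mouse movement


def run_episode(policy: Policy, frames: List[Frame], instruction: str) -> List[Action]:
    """Apply the policy to each frame under a single fixed instruction."""
    return [policy(Observation(frame=f, instruction=instruction)) for f in frames]


if __name__ == "__main__":
    blank: Frame = (((0, 0, 0),),)  # a single black pixel as a dummy frame
    for action in run_episode(placeholder_policy, [blank, blank], "Go forward"):
        print(action)
```

Because the environment is reached only through frames and keyboard-and-mouse events, nothing in this sketch is specific to one game, which mirrors the abstract's point that the same interface lets agents be run in new environments.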