LLM2D
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
Published: 9/27/2024
arXiv ID: oai:arXiv.org:2311.18259v4

Abstract

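We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset and benchmark challenge. Ego-Exo4D centers on simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures of 1 to 42 minutes each and 1,286 hours of video in total. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions, including a novel "expert commentary" given by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/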