AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
Authors: Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade-Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben Salem, Rajat Sahay, Sujata Goswami, Usman Gohar, Ben Huang, Supheakmungkol Sarin, Elie Alhajjar, Canyu Chen, Roman Eng, Kashyap Ramanandula Manjusha, Virendra Mehta, Eileen Long, Murali Emani, Natan Vidra, Benjamin Rukundo, Abolfazl Shahbazi, Kongtao Chen, Rajat Ghosh, Vithursan Thangarasa, Pierre Peigné, Abhinav Singh, Max Bartolo, Satyapriya Krishna, Mubashara Akhtar, Rafael Gold, Cody Coleman, Luis Oala, Vassil Tashev, Joseph Marvin Imperial, Amy Russ, Sasidhar Kunapuli, Nicolas Miailhe, Julien Delaunay, Bhaktipriya Radharapu, Rajat Shinde, Tuesday, Debojyoti Dutta, Declan Grabb, Ananya Gangavarapu, Saurav Sahay, Agasthya Gangavarapu, Patrick Schramowski, Stephen Singam, Tom David, Xudong Han, Priyanka Mary Mammen, Tarunima Prabhakar, Venelin Kovatchev, Rebecca Weiss, Ahmed Ahmed, Kelvin N. Manyeki, Sandeep Madireddy, Foutse Khomh, Fedor Zhdanov, Joachim Baumann, Nina Vasan, Xianjun Yang, Carlos Mougn, Jibin Rajan Varghese, Hussain Chinoy, Seshakrishna Jitendar, Manil Maskey, Claire V. Hardgrove, Tianhao Li, Aakash Gupta, Emil Joswin, Yifan Mai, Shachi H Kumar, Cigdem Patlak, Kevin Lu, Vincent Alessi, Sree Bhargavi Balija, Chenhe Gu, Robert Sullivan, James Gealy, Matt Lavrisa, James Goel, Peter Mattson, Percy Liang, Joaquin Vanschoren
Published: 4/22/2025
arXiv ID: oai:arXiv.org:2503.05731v2

Abstract

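The rapid development and deployment of AI systems has created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing the risk and reliability of AI products. It was developed through an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance across 12 hazard categories: violent crimes, nonviolent crimes, sex-related crimes, child sexual exploitation, indiscriminate weapons, suicide and self-harm, intellectual property, privacy, defamation, hate speech, sexual content, and specialized advice (election, financial, health, legal). Our method comprises a complete assessment standard, extensive prompt datasets, a novel evaluation framework, a grading and reporting system, and the technical and organizational infrastructure for long-term support and evolution. In particular, the benchmark uses an easily understood five-tier grading scale (Poor to Excellent) and incorporates an innovative entropy-based system for evaluating system responses. This report also identifies limitations of our method and of building safety benchmarks in general, including evaluator uncertainty and the constraints of single-turn interactions. This work represents a key step toward establishing global standards for AI risk and reliability evaluation, while acknowledging the need for continued development in areas such as multi-turn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories. Our findings provide valuable insights for model developers, system integrators, and policymakers, helping to promote safer AI deployment.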