PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
Authors: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase (all Preferred Elements, Inc.)
Published: 10/11/2024
arXiv ID: oai:arXiv.org:2410.07563v1

Abstract

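We introduce PLaMo-100B, a large language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, with architectural choices such as QK normalization and Z-loss to keep training stable. Post-training techniques, including supervised fine-tuning and direct preference optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly on Japanese-specific tasks, achieving results competitive with frontier models such as GPT-4.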
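The abstract names the two stability techniques without describing them. As a rough illustration only, the sketch below shows the commonly used forms: Z-loss as an auxiliary penalty on the squared log of the softmax normalizer, and QK normalization as normalizing query and key vectors before their dot product. The function names, the coefficient `z_coeff=1e-4`, and the choice of RMS normalization without a learned gain are all assumptions made for this example; the abstract does not state which variants PLaMo-100B uses.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log of the softmax normalizer Z = sum(exp(logit))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coeff=1e-4):
    """Cross-entropy plus an auxiliary Z-loss term, z_coeff * log(Z)**2.

    z_coeff is an assumed illustrative value, not taken from the paper.
    Returns (total, cross_entropy, z_loss).
    """
    log_z = log_sum_exp(logits)
    ce = log_z - logits[target]      # -log softmax(logits)[target]
    z_loss = z_coeff * log_z ** 2    # penalizes log(Z) drifting away from 0
    return ce + z_loss, ce, z_loss


def rms_normalize(vec, eps=1e-6):
    """Rescale a vector to unit root-mean-square (assumed variant, no learned gain)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def qk_norm_score(q, k):
    """Attention logit computed from normalized q and k, scaled by 1/sqrt(d)."""
    qn, kn = rms_normalize(q), rms_normalize(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))
```

In this sketch the attention logit no longer grows with the magnitude of `q` or `k`: multiplying `q` by 10 leaves `qk_norm_score(q, k)` essentially unchanged, and its absolute value stays at or below `sqrt(d)`. Both techniques target the same failure mode, unbounded growth of logits during long training runs.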