
List of large language models


A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, trained by self-supervised learning on large amounts of text.

This page lists notable large language models.

For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64×10^19 FLOP. Also, only the cost of the largest model is listed.
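
As a quick sanity check of that conversion, here is a minimal Python sketch (the helper name and the worked example are illustrative; the Llama 3.1 figure is taken from the table below):

```python
# Unit conversion behind the training-cost column:
# 1 petaFLOP-day = 1e15 FLOP/s sustained for 86,400 s = 8.64e19 FLOP.
SECONDS_PER_DAY = 86_400
FLOP_PER_PETAFLOP_DAY = 1e15 * SECONDS_PER_DAY  # 8.64e19

def petaflop_days(total_flop: float) -> float:
    """Convert a total training compute in FLOP to petaFLOP-days."""
    return total_flop / FLOP_PER_PETAFLOP_DAY

# Llama 3.1 405B reports ~3.8e25 FLOP, which lands near the
# ~440,000 petaFLOP-days listed in the table.
print(f"{petaflop_days(3.8e25):,.0f}")  # 439,815
```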

Name Release date[a] Developer Parameters (billions)[b] Corpus size Training cost (petaFLOP-day) License[c] Notes
Attention Is All You Need June 2017 Vaswani et al at Google 0.213 36 million English-French sentence pairs 0.09[1] Unreleased Trained for 300,000 steps on 8 NVIDIA P100 GPUs. Training and evaluation code released under the Apache 2.0 license.[2]
GPT-1 June 2018 OpenAI 0.117 1[3] MIT[4] First GPT model; a decoder-only transformer. Trained for 30 days on 8 P600 GPUs.
BERT October 2018 Google 0.340[5] 3.3 billion words[5] 9[6] Apache 2.0[7] An early and influential language model.[8] Encoder-only, and thus not built for prompting or generation.[9] Training took 4 days on 64 TPUv2 chips.[10]
T5 October 2019 Google 11[11] 34 billion tokens[11] Apache 2.0[12] Base model for many Google projects, such as Imagen.[13]
XLNet June 2019 Google 0.340[14] 33 billion words 330 Apache 2.0[15] An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[16]
GPT-2 February 2019 OpenAI 1.5[17] 40 GB[18] (~10 billion tokens)[19] 28[20] MIT[21] Trained on 32 TPU v3 chips for one week.[20]
GPT-3 May 2020 OpenAI 175[22] 300 billion tokens[19] 3640[23] Proprietary A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public in 2022 through a web interface called ChatGPT.[24]
GPT-Neo March 2021 EleutherAI 2.7[25] 825 GiB[26] MIT[27] The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[27]
GPT-J June 2021 EleutherAI 6[28] 825 GiB[26] 200[29] Apache 2.0 GPT-3-style language model
Megatron-Turing NLG October 2021[30] Microsoft and Nvidia 530[31] 338.6 billion tokens[31] 38000[32] Restricted web access Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[32]
Ernie 3.0 Titan December 2021 Baidu 260[33] 4 TB Proprietary Chinese-language LLM. Ernie Bot is based on this model.
Claude[34] December 2021 Anthropic 52[35] 400 billion tokens[35] beta Fine-tuned for desirable behavior in conversations.[36]
GLaM (Generalist Language Model) December 2021 Google 1200[37] 1.6 trillion tokens[37] 5600[37] Proprietary Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher December 2021 DeepMind 280[38] 300 billion tokens[39] 5833[40] Proprietary Later developed into the Chinchilla model.
LaMDA (Language Models for Dialog Applications) January 2022 Google 137[41] 1.56T words,[41] 168 billion tokens[39] 4110[42] Proprietary Specialized for response generation in conversations.
GPT-NeoX February 2022 EleutherAI 20[43] 825 GiB[26] 740[29] Apache 2.0 Based on the Megatron architecture
Chinchilla March 2022 DeepMind 70[44] 1.4 trillion tokens[44][39] 6805[40] Proprietary Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law (see the cross-check sketch after this table).
PaLM (Pathways Language Model) April 2022 Google 540[45] 768 billion tokens[44] 29,250[40] Proprietary Trained for ~60 days on ~6000 TPU v4 chips.[40] As of October 2024, it is the largest dense Transformer published.
OPT (Open Pretrained Transformer) May 2022 Meta 175[46] 180 billion tokens[47] 310[29] Non-commercial research[d] GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.[48]
YaLM 100B June 2022 Yandex 100[49] 1.7TB[49] Apache 2.0 English-Russian model based on Microsoft's Megatron-LM.
Minerva June 2022 Google 540[50] 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[50] Proprietary For solving "mathematical and scientific questions using step-by-step reasoning".[51] Initialized from PaLM models, then finetuned on mathematical and scientific data.
BLOOM July 2022 Large collaboration led by Hugging Face 175[52] 350 billion tokens (1.6TB)[53] Responsible AI Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages)
Galactica November 2022 Meta 120 106 billion tokens[54] Unknown CC-BY-NC-4.0 Trained on scientific text and modalities.
AlexaTM (Teacher Models) November 2022 Amazon 20[55] 1.3 trillion[56] Proprietary[57] Bidirectional sequence-to-sequence architecture
LLaMA (Large Language Model Meta AI) February 2023 Meta AI 65[58] 1.4 trillion[58] 6300[59] Non-commercial research[e] Corpus has 20 languages. "Overtrained" (compared to the Chinchilla scaling law) for better performance with fewer parameters.[58]
GPT-4 March 2023 OpenAI Unknown[f] (according to rumors: 1760)[61] Unknown Unknown Proprietary Available for ChatGPT Plus users and used in several products.
Chameleon June 2024 Meta AI 34[62] 4.4 trillion
Cerebras-GPT March 2023 Cerebras 13[63] 270[29] Apache 2.0 Trained with the Chinchilla formula.
Falcon March 2023 Technology Innovation Institute 40[64] 1 trillion tokens, from RefinedWeb (filtered web text corpus)[65] plus some "curated corpora".[66] 2800[59] Apache 2.0[67]
BloombergGPT March 2023 Bloomberg L.P. 50 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets[68] Proprietary Trained on financial data from proprietary sources, for financial tasks.
PanGu-Σ March 2023 Huawei 1085 329 billion tokens[69] Proprietary
OpenAssistant[70] March 2023 LAION 17 1.5 trillion tokens Apache 2.0 Trained on crowdsourced open data
Jurassic-2[71] March 2023 AI21 Labs Unknown Unknown Proprietary Multilingual[72]
PaLM 2 (Pathways Language Model 2) May 2023 Google 340[73] 3.6 trillion tokens[73] 85,000[59] Proprietary Was used in the Bard chatbot.[74]
Llama 2 July 2023 Meta AI 70[75] 2 trillion tokens[75] 21,000 Llama 2 license 1.7 million A100-hours.[76]
Claude 2 July 2023 Anthropic Unknown Unknown Unknown Proprietary Used in the Claude chatbot.[77]
Granite 13b July 2023 IBM Unknown Unknown Unknown Proprietary Used in IBM Watsonx.[78]
Mistral 7B September 2023 Mistral AI 7.3[79] Unknown Apache 2.0
Claude 2.1 November 2023 Anthropic Unknown Unknown Unknown Proprietary Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[80]
Grok-1[81] November 2023 xAI 314 Unknown Unknown Apache 2.0 Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter).[82]
Gemini 1.0 December 2023 Google DeepMind Unknown Unknown Unknown Proprietary Multimodal model, comes in three sizes. Used in the chatbot of the same name.[83]
Mixtral 8x7B December 2023 Mistral AI 46.7 Unknown Unknown Apache 2.0 Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[84] Mixture-of-experts model, with 12.9 billion parameters activated per token.[85]
Mixtral 8x22B April 2024 Mistral AI 141 Unknown Unknown Apache 2.0 [86]
DeepSeek LLM November 29, 2023 DeepSeek 67 2T tokens[87] 12,000 DeepSeek License Trained on English and Chinese text. 1e24 FLOPs for 67B. 1e23 FLOPs for 7B[87]
Phi-2 December 2023 Microsoft 2.7 1.4T tokens 419[88] MIT Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.[88]
Gemini 1.5 February 2024 Google DeepMind Unknown Unknown Unknown Proprietary Multimodal model, based on a Mixture-of-Experts (MoE) architecture. Context window above 1 million tokens.[89]
Gemini Ultra February 2024 Google DeepMind Unknown Unknown Unknown
Gemma February 2024 Google DeepMind 7 6T tokens Unknown Gemma Terms of Use[90]
Claude 3 March 2024 Anthropic Unknown Unknown Unknown Proprietary Includes three models: Haiku, Sonnet, and Opus.[91]
Nova October 2024 Rubik's AI Unknown Unknown Unknown Proprietary Includes three models: Nova-Instant, Nova-Air, and Nova-Pro.
DBRX March 2024 Databricks and Mosaic ML 136 12T tokens Databricks Open Model License Training cost 10 million USD.
Fugaku-LLM May 2024 Fujitsu and Tokyo Institute of Technology 13 380B tokens The largest model ever trained only on CPUs, on the supercomputer Fugaku.[92]
Phi-3 April 2024 Microsoft 14[93] 4.8T tokens MIT Microsoft markets them as "small language models".[94]
Granite Code Models May 2024 IBM Unknown Unknown Unknown Apache 2.0
Qwen2 June 2024 Alibaba Cloud 72[95] 3T tokens Unknown Qwen License Multiple sizes, the smallest being 0.5B.
DeepSeek V2 June 2024 DeepSeek 236 8.1T tokens 28,000 DeepSeek License 1.4M hours on H800.[96]
Nemotron-4 June 2024 Nvidia 340 9T tokens 200,000 NVIDIA Open Model License Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.[97][98]
Llama 3.1 July 2024 Meta AI 405 15.6T tokens 440,000 Llama 3 license The 405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.[99][100]
DeepSeek V3 December 2024 DeepSeek 671 14.8T tokens 56,000 DeepSeek License Trained for 2.788 million hours on H800 GPUs.[101]
Amazon Nova December 2024 Amazon Unknown Unknown Unknown Proprietary Includes three models: Nova Micro, Nova Lite, and Nova Pro[102]
DeepSeek R1 January 2025 DeepSeek 671 Unknown Unknown MIT No pretraining; reinforcement learning on top of V3-Base[103][104]
Qwen2.5 January 2025 Alibaba 72 18T tokens Unknown Qwen License [105]
MiniMax-Text-01 January 2025 Minimax 456 4.7T tokens[106] Unknown Minimax Model license [107][106]
Gemini 2.0 February 2025 Google DeepMind Unknown Unknown Unknown Proprietary Three models released: Flash, Flash-Lite and Pro[108][109][110]
Mistral Large November 2024 Mistral AI 123 Unknown Unknown Mistral Research License Upgraded over time. The latest version is 24.11.[111]
Pixtral November 2024 Mistral AI 123 Unknown Unknown Mistral Research License Multimodal. There is also a 12B version, which is under the Apache 2.0 license.[111]
Grok 3 February 2025 xAI Unknown Unknown Unknown, estimated 5,800,000 Proprietary Training cost claimed "10x the compute of previous state-of-the-art models".[112]
Llama 4 April 5, 2025 Meta AI 400 40T tokens Llama 4 license [113][114]
Qwen3 April 2025 Alibaba Cloud 235 36T tokens Unknown Apache 2.0 Multiple sizes, the smallest being 0.6B.[115]
GPT-OSS August 5, 2025 OpenAI 117 Unknown Unknown Apache 2.0 Released in two model sizes, 20B and 120B.[116]
Claude 4.1 August 5, 2025 Anthropic Unknown Unknown Unknown Proprietary Includes one model, Opus.[117]
GPT-5 August 7, 2025 OpenAI Unknown Unknown Unknown Proprietary Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and its API, and includes thinking capability.[118][119]
DeepSeek-V3.1 August 21, 2025 DeepSeek 671 15.639T MIT Training size: the 14.8T tokens of DeepSeek V3 plus 839B tokens from the extension phases (630B + 209B).[120] A hybrid model that can switch between thinking and non-thinking modes.[121]
Apertus September 2, 2025 ETH Zurich and EPF Lausanne 70 15 trillion[122] Unknown Apache 2.0 Claimed to be the first LLM compliant with the EU AI Act.[123]
Claude 4.5 September 29, 2025 Anthropic Unknown Unknown Unknown Proprietary [124]
DeepSeek-V3.2-Exp September 29, 2025 DeepSeek 685 MIT An experimental model built on V3.1-Terminus, using a custom efficiency mechanism called DeepSeek Sparse Attention (DSA).[125][126][127]
GLM-4.6 September 30, 2025 Zhipu AI 357 Apache 2.0 [128][129][130]
Kimi K2 Thinking November 6, 2025 Moonshot AI 1000 MIT [131][132][133]
GPT-5.1 November 12, 2025 OpenAI Proprietary [134]
Grok 4.1 November 17, 2025 xAI Proprietary [135]
Gemini 3 November 18, 2025 Google DeepMind Proprietary [136]
Claude Opus 4.5 November 25, 2025 Anthropic Proprietary [137]
DeepSeek-V3.2 December 1, 2025 DeepSeek 685 MIT Balances reasoning ability against output length; suited to everyday use cases such as question answering and general agent tasks[138][139][140][141]
DeepSeek-V3.2-Speciale December 1, 2025 DeepSeek 685 MIT Pushes the reasoning ability of open models to its limit, exploring the boundaries of model capability; research use only, with no tool-calling support[142][143][144][145]
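
Where a row lists both a parameter count N and a training-token count D, the training-cost column can be roughly cross-checked with the common C ≈ 6·N·D approximation for the total FLOP of dense transformer training. The sketch below applies that approximation to two rows of the table; it is an estimate only (it does not hold for mixture-of-experts models, and the sources behind individual rows may have computed their figures differently):

```python
# Rough cross-check of the training-cost column via C ≈ 6·N·D
# (C: total FLOP, N: parameters, D: training tokens), dense models only.
FLOP_PER_PETAFLOP_DAY = 1e15 * 86_400  # 8.64e19

def estimated_petaflop_days(params: float, tokens: float) -> float:
    """Estimate training cost in petaFLOP-days from N and D."""
    return 6 * params * tokens / FLOP_PER_PETAFLOP_DAY

# Chinchilla: N = 70e9, D = 1.4e12 -> table lists 6805
print(round(estimated_petaflop_days(70e9, 1.4e12)))    # 6806
# Llama 3.1 405B: N = 405e9, D = 15.6e12 -> table lists 440,000
print(round(estimated_petaflop_days(405e9, 15.6e12)))  # 438750
```

Both estimates land close to the listed values, which suggests the column is broadly consistent with this rule of thumb for dense models.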


See also


Notes

  1. ^ This is the date that documentation describing the model's architecture was first released.
  2. ^ In many cases, researchers release or report multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  3. ^ This is the license of the pretrained model weights. In almost all cases the training code itself is open-source or can be easily replicated.
  4. ^ The smaller models including 66B are publicly available, while the 175B model is available on request.
  5. ^ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  6. ^ As stated in Technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[60]

References

  1. ^ AI and compute. openai.com. 2022-06-09 [2025-04-24].
  2. ^ Apache License. TensorFlow. [2025-08-06] – via GitHub.
  3. ^ Improving language understanding with unsupervised learning. openai.com. June 11, 2018 [2023-03-18]. (Archived from the original on 2023-03-18).
  4. ^ finetune-transformer-lm. GitHub. [2 January 2024]. (Archived from the original on 19 May 2023).
  5. ^ 5.0 5.1 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2 [cs.CL].
  6. ^ Prickett, Nicole Hemsoth. Cerebras Shifts Architecture To Meet Massive AI/ML Models. The Next Platform. 2021-08-24 [2023-06-20]. (Archived from the original on 2023-06-20).
  7. ^ BERT. March 13, 2023 [March 13, 2023]. (Archived from the original on January 13, 2021) – via GitHub.
  8. ^ Manning, Christopher D. Human Language Understanding & Reasoning. Daedalus. 2022, 151 (2): 127–138 [2023-03-09]. S2CID 248377870. doi:10.1162/daed_a_01905. (Archived from the original on 2023-11-17).
  9. ^ Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris. Bidirectional Language Models Are Also Few-shot Learners. 2022. arXiv:2209.14500 [cs.LG].
  10. ^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2 [cs.CL].
  11. ^ 11.0 11.1 Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. 2020, 21 (140): 1–67 [2025-02-11]. ISSN 1533-7928. arXiv:1910.10683. (Archived from the original on 2024-10-05).
  12. ^ google-research/text-to-text-transfer-transformer. Google Research. 2024-04-02 [2024-04-04]. (Archived from the original on 2024-03-29).
  13. ^ Imagen: Text-to-Image Diffusion Models. imagen.research.google. [2024-04-04]. (Archived from the original on 2024-03-27).
  14. ^ Pretrained models — transformers 2.0.0 documentation. huggingface.co. [2024-08-05]. (Archived from the original on 2024-08-05).
  15. ^ xlnet. GitHub. [2 January 2024]. (Archived from the original on 2 January 2024).
  16. ^ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2 January 2020. arXiv:1906.08237 [cs.CL].
  17. ^ GPT-2: 1.5B Release. OpenAI. 2019-11-05 [2019-11-14]. (Archived from the original on 2019-11-14).
  18. ^ Better language models and their implications. openai.com. [2023-03-13]. (Archived from the original on 2023-03-16).
  19. ^ 19.0 19.1 OpenAI's GPT-3 Language Model: A Technical Overview. lambdalabs.com. 3 June 2020 [13 March 2023]. (Archived from the original on 27 March 2023).
  20. ^ 20.0 20.1 openai-community/gpt2-xl · Hugging Face. huggingface.co. [2024-07-24]. (Archived from the original on 2024-07-24).
  21. ^ gpt-2. GitHub. [13 March 2023]. (Archived from the original on 11 March 2023).
  22. ^ Wiggers, Kyle. The emerging types of language models and why they matter. TechCrunch. 28 April 2022 [9 March 2023]. (Archived from the original on 16 March 2023).
  23. ^ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario. Language Models are Few-Shot Learners. May 28, 2020. arXiv:2005.14165v4 [cs.CL].
  24. ^ ChatGPT: Optimizing Language Models for Dialogue. OpenAI. 2022-11-30 [2023-01-13]. (Archived from the original on 2022-11-30).
  25. ^ GPT Neo. March 15, 2023 [March 12, 2023]. (Archived from the original on March 12, 2023) – via GitHub.
  26. ^ 26.0 26.1 26.2 Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. 31 December 2020. arXiv:2101.00027 [cs.CL].
  27. ^ 27.0 27.1 Iyer, Abhishek. GPT-3's free alternative GPT-Neo is something to be excited about. VentureBeat. 15 May 2021 [13 March 2023]. (Archived from the original on 9 March 2023).
  28. ^ GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront. www.forefront.ai. [2023-02-28]. (Archived from the original on 2023-03-09).
  29. ^ 29.0 29.1 29.2 29.3 Dey, Nolan; Gosal, Gurpreet; Zhiming; Chen; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel. Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. 2023-04-01. arXiv:2304.03208 [cs.LG].
  30. ^ Alvi, Ali; Kharya, Paresh. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model. Microsoft Research. 11 October 2021 [13 March 2023]. (Archived from the original on 13 March 2023).
  31. ^ 31.0 31.1 Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. 2022-02-04. arXiv:2201.11990 [cs.CL].
  32. ^ 32.0 32.1 Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. 2022-07-21. arXiv:2201.05596.
  33. ^ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. December 23, 2021. arXiv:2112.12731 [cs.CL].
  34. ^ Product. Anthropic. [14 March 2023]. (Archived from the original on 16 March 2023).
  35. ^ 35.0 35.1 Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. A General Language Assistant as a Laboratory for Alignment. 9 December 2021. arXiv:2112.00861 [cs.CL].
  36. ^ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. Constitutional AI: Harmlessness from AI Feedback. 15 December 2022. arXiv:2212.08073 [cs.CL].
  37. ^ 37.0 37.1 37.2 Dai, Andrew M; Du, Nan. More Efficient In-Context Learning with GLaM. ai.googleblog.com. December 9, 2021 [2023-03-09]. (Archived from the original on 2023-03-12).
  38. ^ Language modelling at scale: Gopher, ethical considerations, and retrieval. www.deepmind.com. 8 December 2021 [20 March 2023]. (Archived from the original on 20 March 2023).
  39. ^ 39.0 39.1 39.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. Training Compute-Optimal Large Language Models. 29 March 2022. arXiv:2203.15556 [cs.CL].
  40. ^ 40.0 40.1 40.2 40.3 Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways. Archived 2023-06-10 at the Wayback Machine.
  41. ^ 41.0 41.1 Cheng, Heng-Tze; Thoppilan, Romal. LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything. ai.googleblog.com. January 21, 2022 [2023-03-09]. (Archived from the original on 2022-03-25).
  42. ^ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo. LaMDA: Language Models for Dialog Applications. 2022-01-01. arXiv:2201.08239 [cs.CL].
  43. ^ Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models: 95–136. 2022-05-01 [2022-12-19]. (Archived from the original on 2022-12-10).
  44. ^ 44.0 44.1 44.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent. An empirical analysis of compute-optimal large language model training. Deepmind Blog. 12 April 2022 [9 March 2023]. (Archived from the original on 13 April 2022).
  45. ^ Narang, Sharan; Chowdhery, Aakanksha. Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance. ai.googleblog.com. April 4, 2022 [2023-03-09]. (Archived from the original on 2022-04-04).
  46. ^ Susan Zhang; Mona Diab; Luke Zettlemoyer. Democratizing access to large-scale language models with OPT-175B. ai.facebook.com. [2023-03-12]. (Archived from the original on 2023-03-12).
  47. ^ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke. OPT: Open Pre-trained Transformer Language Models. 21 June 2022. arXiv:2205.01068 [cs.CL].
  48. ^ metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq. GitHub. [2024-10-18]. (Archived from the original on 2024-01-24).
  49. ^ 49.0 49.1 Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay. YaLM 100B. 2022-06-22 [2023-03-18]. (Archived from the original on 2023-06-16).
  50. ^ 50.0 50.1 Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant. Solving Quantitative Reasoning Problems with Language Models. 30 June 2022. arXiv:2206.14858 [cs.CL].
  51. ^ Minerva: Solving Quantitative Reasoning Problems with Language Models. ai.googleblog.com. 30 June 2022 [20 March 2023]. (Archived from the original on 2022-06-30).
  52. ^ Ananthaswamy, Anil. In AI, is bigger always better?. Nature. 8 March 2023, 615 (7951): 202–205 [9 March 2023]. Bibcode:2023Natur.615..202A. PMID 36890378. S2CID 257380916. doi:10.1038/d41586-023-00641-w. (Archived from the original on 16 March 2023).
  53. ^ bigscience/bloom · Hugging Face. huggingface.co. [2023-03-13]. (Archived from the original on 2023-04-12).
  54. ^ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert. Galactica: A Large Language Model for Science. 16 November 2022. arXiv:2211.09085 [cs.CL].
  55. ^ 20B-parameter Alexa model sets new marks in few-shot learning. Amazon Science. 2 August 2022 [12 March 2023]. (Archived from the original on 15 March 2023).
  56. ^ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model. 3 August 2022. arXiv:2208.01448 [cs.CL].
  57. ^ AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog. aws.amazon.com. 17 November 2022 [13 March 2023]. (Archived from the original on 13 March 2023).
  58. ^ 58.0 58.1 58.2 Introducing LLaMA: A foundational, 65-billion-parameter large language model. Meta AI. 24 February 2023 [9 March 2023]. (Archived from the original on 3 March 2023).
  59. ^ 59.0 59.1 59.2 The Falcon has landed in the Hugging Face ecosystem. huggingface.co. [2023-06-20]. (Archived from the original on 2023-06-20).
  60. ^ GPT-4 Technical Report (PDF). OpenAI. 2023 [March 14, 2023]. (Archived (PDF) from the original on March 14, 2023).
  61. ^ Schreiner, Maximilian. GPT-4 architecture, datasets, costs and more leaked. THE DECODER. 2023-07-11 [2024-07-26]. (Archived from the original on 2023-07-12).
  62. ^ Dickson, Ben. Meta introduces Chameleon, a state-of-the-art multimodal model. VentureBeat. 22 May 2024 [2025-02-11]. (Archived from the original on 2025-02-11).
  63. ^ Dey, Nolan. Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models. Cerebras. March 28, 2023 [March 28, 2023]. (Archived from the original on March 28, 2023).
  64. ^ Abu Dhabi-based TII launches its own version of ChatGPT. tii.ae. [2023-04-03]. (Archived from the original on 2023-04-03).
  65. ^ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. 2023-06-01. arXiv:2306.01116 [cs.CL].
  66. ^ tiiuae/falcon-40b · Hugging Face. huggingface.co. 2023-06-09 [2023-06-20]. (Archived from the original on 2023-06-02).
  67. ^ UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free. Archived 2024-02-08 at the Wayback Machine. 31 May 2023.
  68. ^ Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon. BloombergGPT: A Large Language Model for Finance. March 30, 2023. arXiv:2303.17564 [cs.LG].
  69. ^ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing. March 19, 2023. arXiv:2303.10845 [cs.CL].
  70. ^ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew. OpenAssistant Conversations – Democratizing Large Language Model Alignment. 2023-04-14. arXiv:2304.07327 [cs.CL].
  71. ^ Wrobel, Sharon. Tel Aviv startup rolls out new advanced AI language model to rival OpenAI. www.timesofisrael.com. [2023-07-24]. (Archived from the original on 2023-07-24).
  72. ^ Wiggers, Kyle. With Bedrock, Amazon enters the generative AI race. TechCrunch. 2023-04-13 [2023-07-24]. (Archived from the original on 2023-07-24).
  73. ^ 73.0 73.1 Elias, Jennifer. Google's newest A.I. model uses nearly five times more text data for training than its predecessor. CNBC. 16 May 2023 [18 May 2023]. (Archived from the original on 16 May 2023).
  74. ^ Introducing PaLM 2. Google. May 10, 2023 [May 18, 2023]. (Archived from the original on May 18, 2023).
  75. ^ 75.0 75.1 Introducing Llama 2: The Next Generation of Our Open Source Large Language Model. Meta AI. 2023 [2023-07-19]. (Archived from the original on 2024-01-05).
  76. ^ llama/MODEL_CARD.md at main · meta-llama/llama. GitHub. [2024-05-28]. (Archived from the original on 2024-05-28).
  77. ^ Claude 2. anthropic.com. [12 December 2023]. (Archived from the original on 15 December 2023).
  78. ^ Nirmal, Dinesh. Building AI for business: IBM's Granite foundation models. IBM Blog. 2023-09-07 [2024-08-11]. (Archived from the original on 2024-07-22).
  79. ^ Announcing Mistral 7B. Mistral. 2023 [2023-10-06]. (Archived from the original on 2024-01-06).
  80. ^ Introducing Claude 2.1. anthropic.com. [12 December 2023]. (Archived from the original on 15 December 2023).
  81. ^ xai-org/grok-1. xai-org. 2024-03-19 [2024-03-19]. (Archived from the original on 2024-05-28).
  82. ^ Grok-1 model card. x.ai. [12 December 2023]. (Archived from the original on 2023-11-05).
  83. ^ Gemini – Google DeepMind. deepmind.google. [12 December 2023]. (Archived from the original on 8 December 2023).
  84. ^ Franzen, Carl. Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance. VentureBeat. 11 December 2023 [12 December 2023]. (Archived from the original on 11 December 2023).
  85. ^ Mixtral of experts. mistral.ai. 11 December 2023 [12 December 2023]. (Archived from the original on 13 February 2024).
  86. ^ AI, Mistral. Cheaper, Better, Faster, Stronger. mistral.ai. 2024-04-17 [2024-05-05]. (Archived from the original on 2024-05-05).
  87. ^ 87.0 87.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. 2024-01-05 [2025-02-11]. arXiv:2401.02954. (Archived from the original on 2025-03-29).
  88. ^ 88.0 88.1 Hughes, Alyssa. Phi-2: The surprising power of small language models. Microsoft Research. 12 December 2023 [13 December 2023]. (Archived from the original on 12 December 2023).
  89. ^ Our next-generation model: Gemini 1.5. Google. 15 February 2024 [16 February 2024]. (Archived from the original on 16 February 2024). This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we've also successfully tested up to 10 million tokens.
  90. ^ Gemma. [2025-02-11]. (Archived from the original on 2024-02-21) – via GitHub.
  91. ^ Introducing the next generation of Claude. www.anthropic.com. [2024-03-04]. (Archived from the original on 2024-03-04).
  92. ^ Fugaku-LLM/Fugaku-LLM-13B · Hugging Face. huggingface.co. [2024-05-17]. (Archived from the original on 2024-05-17).
  93. ^ Phi-3. azure.microsoft.com. 23 April 2024 [2024-04-28]. (Archived from the original on 2024-04-27).
  94. ^ Phi-3 Model Documentation. huggingface.co. [2024-04-28]. (Archived from the original on 2024-05-13).
  95. ^ Qwen2. GitHub. [2024-06-17]. (Archived from the original on 2024-06-17).
  96. ^ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi; Ruan, Chong. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. 2024-06-19 [2025-02-11]. arXiv:2405.04434. (Archived from the original on 2025-03-30).
  97. ^ nvidia/Nemotron-4-340B-Base · Hugging Face. huggingface.co. 2024-06-14 [2024-06-15]. (Archived from the original on 2024-06-15).
  98. ^ Nemotron-4 340B | Research. research.nvidia.com. [2024-06-15]. (Archived from the original on 2024-06-15).
  99. ^ "The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta. [2025-02-11]. (Archived from the original on 2024-07-24).
  100. ^ llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models. GitHub. [2024-07-23]. (Archived from the original on 2024-07-23).
  101. ^ deepseek-ai/DeepSeek-V3. DeepSeek. 2024-12-26 [2024-12-26]. (Archived from the original on 2025-03-27).
  102. ^ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards. Amazon. 2024-12-27 [2024-12-27]. (Archived from the original on 2025-02-11).
  103. ^ deepseek-ai/DeepSeek-R1. DeepSeek. 2025-01-21 [2025-01-21]. (Archived from the original on 2025-02-04).
  104. ^ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025-01-22 [2025-02-11]. arXiv:2501.12948. (Archived from the original on 2025-04-09).
  105. ^ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng. Qwen2.5 Technical Report. 2025-01-03 [2025-02-11]. arXiv:2412.15115. (Archived from the original on 2025-04-01).
  106. ^ 106.0 106.1 MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao; Guo, Congchao. MiniMax-01: Scaling Foundation Models with Lightning Attention. 2025-01-14 [2025-01-26]. arXiv:2501.08313. (Archived from the original on 2025-03-22).
  107. ^ MiniMax-AI/MiniMax-01. MiniMax. 2025-01-26 [2025-01-26].
  108. ^ Kavukcuoglu, Koray. Gemini 2.0 is now available to everyone. Google. [6 February 2025]. (Archived from the original on 2025-04-10).
  109. ^ Gemini 2.0: Flash, Flash-Lite and Pro. Google for Developers. [6 February 2025]. (Archived from the original on 2025-04-10).
  110. ^ Franzen, Carl. Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search. VentureBeat. 5 February 2025 [6 February 2025]. (Archived from the original on 2025-03-17).
  111. ^ 111.0 111.1 Models Overview. mistral.ai. [2025-03-03].
  112. ^ Grok 3 Beta — The Age of Reasoning Agents. x.ai. [2025-02-22].
  113. ^ meta-llama/Llama-4-Maverick-17B-128E · Hugging Face. huggingface.co. 2025-04-05 [2025-04-06].
  114. ^ The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation. ai.meta.com. [2025-04-05]. (Archived from the original on 2025-04-05).
  115. ^ Team, Qwen. Qwen3: Think Deeper, Act Faster. Qwen. 2025-04-29 [2025-04-29].
  116. ^ Whitwam, Ryan. OpenAI announces two "gpt-oss" open AI models, and you can download them today. Ars Technica. 2025-08-05 [2025-08-06].
  117. ^ Claude Opus 4.1. www.anthropic.com. [8 August 2025].
  118. ^ Introducing GPT-5. openai.com. 7 August 2025 [8 August 2025].
  119. ^ OpenAI Platform: GPT-5 Model Documentation. openai.com. [18 August 2025].
  120. ^ deepseek-ai/DeepSeek-V3.1 · Hugging Face. huggingface.co. 2025-08-21 [2025-08-25].
  121. ^ DeepSeek-V3.1 Release | DeepSeek API Docs. api-docs.deepseek.com. [2025-08-25].
  122. ^ Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell. Zürich: ETH Zürich. 2025-09-02 [2025-11-07] (in German).
  123. ^ Kirchner, Malte. Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor. heise online. 2025-09-02 [2025-11-07] (in German).
  124. ^ Introducing Claude Sonnet 4.5. www.anthropic.com. [29 September 2025].
  125. ^ Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs. api-docs.deepseek.com. [2025-10-01].
  126. ^ deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face. huggingface.co. 2025-09-29 [2025-10-01].
  127. ^ DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp (PDF). GitHub. [2025-10-01].
  128. ^ GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities. z.ai. [2025-10-01].
  129. ^ zai-org/GLM-4.6 · Hugging Face. huggingface.co. 2025-09-30 [2025-10-01].
  130. ^ GLM-4.6. modelscope.cn. [2025-10-01].
  131. ^ Kimi K2 Thinking. moonshotai.github.io. [2025-11-06].
  132. ^ moonshotai/Kimi-K2-Thinking · Hugging Face. huggingface.co. 2025-11-06 [2025-11-06].
  133. ^ Kimi-K2-Thinking. modelscope.cn. [2025-11-09].
  134. ^ GPT-5.1 全新上线:更智能、更具对话感的 ChatGPT. openai.com. [2025-11-12] (in Chinese).
  135. ^ Grok 4.1. x.ai. [2025-11-17].
  136. ^ Gemini 3: Introducing the latest Gemini AI model from Google. blog.google. [2025-11-18] (in Chinese).
  137. ^ Introducing Claude Opus 4.5. anthropic.com. [2025-11-25].
  138. ^ DeepSeek-V3.2 Release. api-docs.deepseek.com. [2025-12-01].
  139. ^ DeepSeek V3.2 正式版:强化 Agent 能力,融入思考推理. mp.weixin.qq.com. [2025-12-01] (in Chinese).
  140. ^ deepseek-ai/DeepSeek-V3.2 · Hugging Face. huggingface.co. 2025-12-01 [2025-12-01].
  141. ^ DeepSeek-V3.2. modelscope.cn. [2025-12-01].
  142. ^ DeepSeek-V3.2 Release. api-docs.deepseek.com. [2025-12-01].
  143. ^ DeepSeek V3.2 正式版:强化 Agent 能力,融入思考推理. mp.weixin.qq.com. [2025-12-01] (in Chinese).
  144. ^ deepseek-ai/DeepSeek-V3.2-Speciale · Hugging Face. huggingface.co. 2025-12-01 [2025-12-01].
  145. ^ DeepSeek-V3.2-Speciale. modelscope.cn. [2025-12-01].