
List of large language models


A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, trained by self-supervised learning on large amounts of text.

This page lists notable large language models.

For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64×10^19 FLOP. Also, only the cost of the largest model is listed.
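
As a quick sanity check of that conversion, here is a minimal Python sketch (the helper name and the worked example are illustrative; the Llama 3.1 figure is taken from the table below):

```python
# Unit conversion behind the training-cost column:
# 1 petaFLOP-day = 1e15 FLOP/s sustained for 86,400 s = 8.64e19 FLOP.
SECONDS_PER_DAY = 86_400
FLOP_PER_PETAFLOP_DAY = 1e15 * SECONDS_PER_DAY  # 8.64e19

def petaflop_days(total_flop: float) -> float:
    """Convert a total training compute in FLOP to petaFLOP-days."""
    return total_flop / FLOP_PER_PETAFLOP_DAY

# Llama 3.1 405B reports ~3.8e25 FLOP, which lands near the
# ~440,000 petaFLOP-days listed in the table.
print(f"{petaflop_days(3.8e25):,.0f}")  # 439,815
```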

Name Release date[a] Developer Parameters (billions)[b] Corpus size Training cost (petaFLOP-day) License[c] Notes
Attention Is All You Need June 2017 Vaswani et al at Google 0.213 36 million English-French sentence pairs 0.09[1] Unreleased Trained for 300,000 steps on 8 NVIDIA P100 GPUs. Training and evaluation code released under the Apache 2.0 license.[2]
GPT-1 June 2018 OpenAI 0.117 1[3] MIT[4] First GPT model; a decoder-only transformer. Trained for 30 days on 8 P600 GPUs.
BERT October 2018 Google 0.340[5] 3.3 billion words[5] 9[6] Apache 2.0[7] An early and influential language model.[8] Encoder-only, and thus not built for prompting or generation.[9] Training took 4 days on 64 TPUv2 chips.[10]
T5 October 2019 Google 11[11] 34 billion tokens[11] Apache 2.0[12] Base model for many Google projects, such as Imagen.[13]
XLNet June 2019 Google 0.340[14] 33 billion words 330 Apache 2.0[15] An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[16]
GPT-2 February 2019 OpenAI 1.5[17] 40 GB[18] (~10 billion tokens)[19] 28[20] MIT[21] Trained on 32 TPU v3 chips for one week.[20]
GPT-3 May 2020 OpenAI 175[22] 300 billion tokens[19] 3640[23] Proprietary A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public in 2022 through a web interface called ChatGPT.[24]
GPT-Neo March 2021 EleutherAI 2.7[25] 825 GiB[26] MIT[27] The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[27]
GPT-J June 2021 EleutherAI 6[28] 825 GiB[26] 200[29] Apache 2.0 GPT-3-style language model
Megatron-Turing NLG October 2021[30] Microsoft and Nvidia 530[31] 338.6 billion tokens[31] 38000[32] Restricted web access Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[32]
Ernie 3.0 Titan December 2021 Baidu 260[33] 4 TB Proprietary Chinese-language LLM. Ernie Bot is based on this model.
Claude[34] December 2021 Anthropic 52[35] 400 billion tokens[35] beta Fine-tuned for desirable behavior in conversations.[36]
GLaM (Generalist Language Model) December 2021 Google 1200[37] 1.6 trillion tokens[37] 5600[37] Proprietary Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher December 2021 DeepMind 280[38] 300 billion tokens[39] 5833[40] Proprietary Later developed into the Chinchilla model.
LaMDA (Language Models for Dialog Applications) January 2022 Google 137[41] 1.56T words,[41] 168 billion tokens[39] 4110[42] Proprietary Specialized for response generation in conversations.
GPT-NeoX February 2022 EleutherAI 20[43] 825 GiB[26] 740[29] Apache 2.0 Based on the Megatron architecture
Chinchilla March 2022 DeepMind 70[44] 1.4 trillion tokens[44][39] 6805[40] Proprietary Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law (see the cross-check sketch after this table).
PaLM (Pathways Language Model) April 2022 Google 540[45] 768 billion tokens[44] 29,250[40] Proprietary Trained for ~60 days on ~6000 TPU v4 chips.[40] As of October 2024, it is the largest dense Transformer published.
OPT (Open Pretrained Transformer) May 2022 Meta 175[46] 180 billion tokens[47] 310[29] Non-commercial research[d] GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.[48]
YaLM 100B June 2022 Yandex 100[49] 1.7TB[49] Apache 2.0 English-Russian model based on Microsoft's Megatron-LM.
Minerva June 2022 Google 540[50] 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[50] Proprietary For solving "mathematical and scientific questions using step-by-step reasoning".[51] Initialized from PaLM models, then finetuned on mathematical and scientific data.
BLOOM July 2022 Large collaboration led by Hugging Face 175[52] 350 billion tokens (1.6TB)[53] Responsible AI Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages)
Galactica November 2022 Meta 120 106 billion tokens[54] Unknown CC-BY-NC-4.0 Trained on scientific text and modalities.
AlexaTM (Teacher Models) November 2022 Amazon 20[55] 1.3 trillion[56] Proprietary[57] Bidirectional sequence-to-sequence architecture
LLaMA (Large Language Model Meta AI) February 2023 Meta AI 65[58] 1.4 trillion[58] 6300[59] Non-commercial research[e] Corpus has 20 languages. "Overtrained" (compared to the Chinchilla scaling law) for better performance with fewer parameters.[58]
GPT-4 March 2023 OpenAI Unknown[f] (according to rumors: 1760)[61] Unknown Unknown Proprietary Available for ChatGPT Plus users and used in several products.
Chameleon June 2024 Meta AI 34[62] 4.4 trillion
Cerebras-GPT March 2023 Cerebras 13[63] 270[29] Apache 2.0 Trained with the Chinchilla formula.
Falcon March 2023 Technology Innovation Institute 40[64] 1 trillion tokens, from RefinedWeb (filtered web text corpus)[65] plus some "curated corpora".[66] 2800[59] Apache 2.0[67]
BloombergGPT March 2023 Bloomberg L.P. 50 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets[68] Proprietary Trained on financial data from proprietary sources, for financial tasks.
PanGu-Σ March 2023 Huawei 1085 329 billion tokens[69] Proprietary
OpenAssistant[70] March 2023 LAION 17 1.5 trillion tokens Apache 2.0 Trained on crowdsourced open data
Jurassic-2[71] March 2023 AI21 Labs Unknown Unknown Proprietary Multilingual[72]
PaLM 2 (Pathways Language Model 2) May 2023 Google 340[73] 3.6 trillion tokens[73] 85,000[59] Proprietary Was used in the Bard chatbot.[74]
Llama 2 July 2023 Meta AI 70[75] 2 trillion tokens[75] 21,000 Llama 2 license 1.7 million A100-hours.[76]
Claude 2 July 2023 Anthropic Unknown Unknown Unknown Proprietary Used in the Claude chatbot.[77]
Granite 13b July 2023 IBM Unknown Unknown Unknown Proprietary Used in IBM Watsonx.[78]
Mistral 7B September 2023 Mistral AI 7.3[79] Unknown Apache 2.0
Claude 2.1 November 2023 Anthropic Unknown Unknown Unknown Proprietary Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[80]
Grok-1[81] November 2023 xAI 314 Unknown Unknown Apache 2.0 Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter).[82]
Gemini 1.0 December 2023 Google DeepMind Unknown Unknown Unknown Proprietary Multimodal model, comes in three sizes. Used in the chatbot of the same name.[83]
Mixtral 8x7B December 2023 Mistral AI 46.7 Unknown Unknown Apache 2.0 Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[84] Mixture-of-experts model, with 12.9 billion parameters activated per token.[85]
Mixtral 8x22B April 2024 Mistral AI 141 Unknown Unknown Apache 2.0 [86]
DeepSeek LLM November 29, 2023 DeepSeek 67 2T tokens[87] 12,000 DeepSeek License Trained on English and Chinese text. 1e24 FLOPs for 67B. 1e23 FLOPs for 7B[87]
Phi-2 December 2023 Microsoft 2.7 1.4T tokens 419[88] MIT Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.[88]
Gemini 1.5 February 2024 Google DeepMind Unknown Unknown Unknown Proprietary Multimodal model, based on a Mixture-of-Experts (MoE) architecture. Context window above 1 million tokens.[89]
Gemini Ultra February 2024 Google DeepMind Unknown Unknown Unknown
Gemma February 2024 Google DeepMind 7 6T tokens Unknown Gemma Terms of Use[90]
Claude 3 March 2024 Anthropic Unknown Unknown Unknown Proprietary Includes three models: Haiku, Sonnet, and Opus.[91]
Nova October 2024 Rubik's AI Unknown Unknown Unknown Proprietary Includes three models: Nova-Instant, Nova-Air, and Nova-Pro.
DBRX March 2024 Databricks and Mosaic ML 136 12T tokens Databricks Open Model License Training cost 10 million USD.
Fugaku-LLM May 2024 Fujitsu and Tokyo Institute of Technology 13 380B tokens The largest model ever trained only on CPUs, on the supercomputer Fugaku.[92]
Phi-3 April 2024 Microsoft 14[93] 4.8T tokens MIT Microsoft markets them as "small language models".[94]
Granite Code Models May 2024 IBM Unknown Unknown Unknown Apache 2.0
Qwen2 June 2024 Alibaba Cloud 72[95] 3T tokens Unknown Qwen License Multiple sizes, the smallest being 0.5B.
DeepSeek V2 June 2024 DeepSeek 236 8.1T tokens 28,000 DeepSeek License 1.4M hours on H800.[96]
Nemotron-4 June 2024 Nvidia 340 9T tokens 200,000 NVIDIA Open Model License Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.[97][98]
Llama 3.1 July 2024 Meta AI 405 15.6T tokens 440,000 Llama 3 license The 405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.[99][100]
DeepSeek V3 December 2024 DeepSeek 671 14.8T tokens 56,000 DeepSeek License Trained for 2.788 million hours on H800 GPUs.[101]
Amazon Nova December 2024 Amazon Unknown Unknown Unknown Proprietary Includes three models: Nova Micro, Nova Lite, and Nova Pro[102]
DeepSeek R1 January 2025 DeepSeek 671 Unknown Unknown MIT No pretraining; reinforcement learning on top of V3-Base[103][104]
Qwen2.5 January 2025 Alibaba 72 18T tokens Unknown Qwen License [105]
MiniMax-Text-01 January 2025 Minimax 456 4.7T tokens[106] Unknown Minimax Model license [107][106]
Gemini 2.0 February 2025 Google DeepMind Unknown Unknown Unknown Proprietary Three models released: Flash, Flash-Lite and Pro[108][109][110]
Mistral Large November 2024 Mistral AI 123 Unknown Unknown Mistral Research License Upgraded over time. The latest version is 24.11.[111]
Pixtral November 2024 Mistral AI 123 Unknown Unknown Mistral Research License Multimodal. There is also a 12B version, which is under the Apache 2.0 license.[111]
Grok 3 February 2025 xAI Unknown Unknown Unknown, estimated 5,800,000 Proprietary Training cost claimed "10x the compute of previous state-of-the-art models".[112]
Llama 4 April 5, 2025 Meta AI 400 40T tokens Llama 4 license [113][114]
Qwen3 April 2025 Alibaba Cloud 235 36T tokens Unknown Apache 2.0 Multiple sizes, the smallest being 0.6B.[115]
GPT-OSS August 5, 2025 OpenAI 117 Unknown Unknown Apache 2.0 Released in two model sizes, 20B and 120B.[116]
Claude 4.1 August 5, 2025 Anthropic Unknown Unknown Unknown Proprietary Includes one model, Opus.[117]
GPT-5 August 7, 2025 OpenAI Unknown Unknown Unknown Proprietary Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and its API, and includes thinking capability.[118][119]
DeepSeek-V3.1 August 21, 2025 DeepSeek 671 15.639T MIT Training size: the 14.8T tokens of DeepSeek V3 plus 839B tokens from the extension phases (630B + 209B).[120] A hybrid model that can switch between thinking and non-thinking modes.[121]
Apertus September 2, 2025 ETH Zurich and EPF Lausanne 70 15 trillion[122] Unknown Apache 2.0 Claimed to be the first LLM compliant with the EU AI Act.[123]
Claude 4.5 September 29, 2025 Anthropic Unknown Unknown Unknown Proprietary [124]
DeepSeek-V3.2-Exp September 29, 2025 DeepSeek 685 MIT An experimental model built on V3.1-Terminus, using a custom efficiency mechanism called DeepSeek Sparse Attention (DSA).[125][126][127]
GLM-4.6 September 30, 2025 Zhipu AI 357 Apache 2.0 [128][129][130]
Kimi K2 Thinking November 6, 2025 Moonshot AI 1000 MIT [131][132][133]
GPT-5.1 November 12, 2025 OpenAI Proprietary [134]
Grok 4.1 November 17, 2025 xAI Proprietary [135]
Gemini 3 November 18, 2025 Google DeepMind Proprietary [136]
Claude Opus 4.5 November 25, 2025 Anthropic Proprietary [137]
DeepSeek-V3.2 December 1, 2025 DeepSeek 685 MIT Balances reasoning ability against output length; suited to everyday use cases such as question answering and general agent tasks[138][139][140][141]
DeepSeek-V3.2-Speciale December 1, 2025 DeepSeek 685 MIT Pushes the reasoning ability of open models to its limit, exploring the boundaries of model capability; research use only, with no tool-calling support[142][143][144][145]
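
Where a row lists both a parameter count N and a training-token count D, the training-cost column can be roughly cross-checked with the common C ≈ 6·N·D approximation for the total FLOP of dense transformer training. The sketch below applies that approximation to two rows of the table; it is an estimate only (it does not hold for mixture-of-experts models, and the sources behind individual rows may have computed their figures differently):

```python
# Rough cross-check of the training-cost column via C ≈ 6·N·D
# (C: total FLOP, N: parameters, D: training tokens), dense models only.
FLOP_PER_PETAFLOP_DAY = 1e15 * 86_400  # 8.64e19

def estimated_petaflop_days(params: float, tokens: float) -> float:
    """Estimate training cost in petaFLOP-days from N and D."""
    return 6 * params * tokens / FLOP_PER_PETAFLOP_DAY

# Chinchilla: N = 70e9, D = 1.4e12 -> table lists 6805
print(round(estimated_petaflop_days(70e9, 1.4e12)))    # 6806
# Llama 3.1 405B: N = 405e9, D = 15.6e12 -> table lists 440,000
print(round(estimated_petaflop_days(405e9, 15.6e12)))  # 438750
```

Both estimates land close to the listed values, which suggests the column is broadly consistent with this rule of thumb for dense models.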


See also


Notes

  1. ^ This is the date that documentation describing the model's architecture was first released.
  2. ^ In many cases, researchers release or report multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  3. ^ This is the license of the pretrained model weights. In almost all cases the training code itself is open-source or can be easily replicated.
  4. ^ The smaller models including 66B are publicly available, while the 175B model is available on request.
  5. ^ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  6. ^ As stated in Technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[60]

References

  1. ^ AI and compute. openai.com. 2022-06-09 [2025-04-24].
  2. ^ Apache License. TensorFlow. [2025-08-06] – via GitHub.
  3. ^ Improving language understanding with unsupervised learning. openai.com. June 11, 2018 [2023-03-18]. (Archived from the original on 2023-03-18).
  4. ^ finetune-transformer-lm. GitHub. [2 January 2024]. (Archived from the original on 19 May 2023).
  5. ^ 5.0 5.1 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2 [cs.CL].
  6. ^ Prickett, Nicole Hemsoth. Cerebras Shifts Architecture To Meet Massive AI/ML Models. The Next Platform. 2021-08-24 [2023-06-20]. (Archived from the original on 2023-06-20).
  7. ^ BERT. March 13, 2023 [March 13, 2023]. (Archived from the original on January 13, 2021) – via GitHub.
  8. ^ Manning, Christopher D. Human Language Understanding & Reasoning. Daedalus. 2022, 151 (2): 127–138 [2023-03-09]. S2CID 248377870. doi:10.1162/daed_a_01905. (Archived from the original on 2023-11-17).
  9. ^ Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris. Bidirectional Language Models Are Also Few-shot Learners. 2022. arXiv:2209.14500 [cs.LG].
  10. ^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2 [cs.CL].
  11. ^ 11.0 11.1 Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. 2020, 21 (140): 1–67 [2025-02-11]. ISSN 1533-7928. arXiv:1910.10683. (Archived from the original on 2024-10-05).
  12. ^ google-research/text-to-text-transfer-transformer. Google Research. 2024-04-02 [2024-04-04]. (Archived from the original on 2024-03-29).
  13. ^ Imagen: Text-to-Image Diffusion Models. imagen.research.google. [2024-04-04]. (Archived from the original on 2024-03-27).
  14. ^ Pretrained models — transformers 2.0.0 documentation. huggingface.co. [2024-08-05]. (Archived from the original on 2024-08-05).
  15. ^ xlnet. GitHub. [2 January 2024]. (Archived from the original on 2 January 2024).
  16. ^ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2 January 2020. arXiv:1906.08237 [cs.CL].
  17. ^ GPT-2: 1.5B Release. OpenAI. 2019-11-05 [2019-11-14]. (Archived from the original on 2019-11-14).
  18. ^ Better language models and their implications. openai.com. [2023-03-13]. (Archived from the original on 2023-03-16).
  19. ^ 19.0 19.1 OpenAI's GPT-3 Language Model: A Technical Overview. lambdalabs.com. 3 June 2020 [13 March 2023]. (Archived from the original on 27 March 2023).
  20. ^ 20.0 20.1 openai-community/gpt2-xl · Hugging Face. huggingface.co. [2024-07-24]. (Archived from the original on 2024-07-24).
  21. ^ gpt-2. GitHub. [13 March 2023]. (Archived from the original on 11 March 2023).
  22. ^ Wiggers, Kyle. The emerging types of language models and why they matter. TechCrunch. 28 April 2022 [9 March 2023]. (Archived from the original on 16 March 2023).
  23. ^ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario. Language Models are Few-Shot Learners. May 28, 2020. arXiv:2005.14165v4 [cs.CL].
  24. ^ ChatGPT: Optimizing Language Models for Dialogue. OpenAI. 2022-11-30 [2023-01-13]. (Archived from the original on 2022-11-30).
  25. ^ GPT Neo. March 15, 2023 [March 12, 2023]. (Archived from the original on March 12, 2023) – via GitHub.
  26. ^ 26.0 26.1 26.2 Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. 31 December 2020. arXiv:2101.00027 [cs.CL].
  27. ^ 27.0 27.1 Iyer, Abhishek. GPT-3's free alternative GPT-Neo is something to be excited about. VentureBeat. 15 May 2021 [13 March 2023]. (Archived from the original on 9 March 2023).
  28. ^ GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront. www.forefront.ai. [2023-02-28]. (Archived from the original on 2023-03-09).
  29. ^ 29.0 29.1 29.2 29.3 Dey, Nolan; Gosal, Gurpreet; Zhiming; Chen; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel. Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. 2023-04-01. arXiv:2304.03208 [cs.LG].
  30. ^ Alvi, Ali; Kharya, Paresh. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model. Microsoft Research. 11 October 2021 [13 March 2023]. (Archived from the original on 13 March 2023).
  31. ^ 31.0 31.1 Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. 2022-02-04. arXiv:2201.11990 [cs.CL].
  32. ^ 32.0 32.1 Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. 2022-07-21. arXiv:2201.05596.
  33. ^ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. December 23, 2021. arXiv:2112.12731 [cs.CL].
  34. ^ Product. Anthropic. [14 March 2023]. (Archived from the original on 16 March 2023).
  35. ^ 35.0 35.1 Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. A General Language Assistant as a Laboratory for Alignment. 9 December 2021. arXiv:2112.00861 [cs.CL].
  36. ^ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. Constitutional AI: Harmlessness from AI Feedback. 15 December 2022. arXiv:2212.08073 [cs.CL].
  37. ^ 37.0 37.1 37.2 Dai, Andrew M; Du, Nan. More Efficient In-Context Learning with GLaM. ai.googleblog.com. December 9, 2021 [2023-03-09]. (Archived from the original on 2023-03-12).
  38. ^ Language modelling at scale: Gopher, ethical considerations, and retrieval. www.deepmind.com. 8 December 2021 [20 March 2023]. (Archived from the original on 20 March 2023).
  39. ^ 39.0 39.1 39.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. Training Compute-Optimal Large Language Models. 29 March 2022. arXiv:2203.15556 [cs.CL].
  40. ^ 40.0 40.1 40.2 40.3 Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways. Archived 2023-06-10 at the Wayback Machine.
  41. ^ 41.0 41.1 Cheng, Heng-Tze; Thoppilan, Romal. LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything. ai.googleblog.com. January 21, 2022 [2023-03-09]. (Archived from the original on 2022-03-25).
  42. ^ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo. LaMDA: Language Models for Dialog Applications. 2022-01-01. arXiv:2201.08239 [cs.CL].
  43. ^ Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models: 95–136. 2022-05-01 [2022-12-19]. (Archived from the original on 2022-12-10).
  44. ^ 44.0 44.1 44.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent. An empirical analysis of compute-optimal large language model training. Deepmind Blog. 12 April 2022 [9 March 2023]. (Archived from the original on 13 April 2022).
  45. ^ Narang, Sharan; Chowdhery, Aakanksha. Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance. ai.googleblog.com. April 4, 2022 [2023-03-09]. (Archived from the original on 2022-04-04).
  46. ^ Susan Zhang; Mona Diab; Luke Zettlemoyer. Democratizing access to large-scale language models with OPT-175B. ai.facebook.com. [2023-03-12]. (Archived from the original on 2023-03-12).
  47. ^ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke. OPT: Open Pre-trained Transformer Language Models. 21 June 2022. arXiv:2205.01068 [cs.CL].
  48. ^ metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq. GitHub. [2024-10-18]. (Archived from the original on 2024-01-24).
  49. ^ 49.0 49.1 Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay. YaLM 100B. 2022-06-22 [2023-03-18]. (Archived from the original on 2023-06-16).
  50. ^ 50.0 50.1 Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant. Solving Quantitative Reasoning Problems with Language Models. 30 June 2022. arXiv:2206.14858 [cs.CL].
  51. ^ Minerva: Solving Quantitative Reasoning Problems with Language Models. ai.googleblog.com. 30 June 2022 [20 March 2023]. (Archived from the original on 2022-06-30).
  52. ^ Ananthaswamy, Anil. In AI, is bigger always better?. Nature. 8 March 2023, 615 (7951): 202–205 [9 March 2023]. Bibcode:2023Natur.615..202A. PMID 36890378. S2CID 257380916. doi:10.1038/d41586-023-00641-w. (Archived from the original on 16 March 2023).
  53. ^ bigscience/bloom · Hugging Face. huggingface.co. [2023-03-13]. (Archived from the original on 2023-04-12).
  54. ^ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert. Galactica: A Large Language Model for Science. 16 November 2022. arXiv:2211.09085 [cs.CL].
  55. ^ 20B-parameter Alexa model sets new marks in few-shot learning. Amazon Science. 2 August 2022 [12 March 2023]. (Archived from the original on 15 March 2023).
  56. ^ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model. 3 August 2022. arXiv:2208.01448 [cs.CL].
  57. ^ AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog. aws.amazon.com. 17 November 2022 [13 March 2023]. (Archived from the original on 13 March 2023).
  58. ^ 58.0 58.1 58.2 Introducing LLaMA: A foundational, 65-billion-parameter large language model. Meta AI. 24 February 2023 [9 March 2023]. (Archived from the original on 3 March 2023).
  59. ^ 59.0 59.1 59.2 The Falcon has landed in the Hugging Face ecosystem. huggingface.co. [2023-06-20]. (Archived from the original on 2023-06-20).
  60. ^ GPT-4 Technical Report (PDF). OpenAI. 2023 [March 14, 2023]. (Archived (PDF) from the original on March 14, 2023).
  61. ^ Schreiner, Maximilian. GPT-4 architecture, datasets, costs and more leaked. THE DECODER. 2023-07-11 [2024-07-26]. (Archived from the original on 2023-07-12).
  62. ^ Dickson, Ben. Meta introduces Chameleon, a state-of-the-art multimodal model. VentureBeat. 22 May 2024 [2025-02-11]. (Archived from the original on 2025-02-11).
  63. ^ Dey, Nolan. Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models. Cerebras. March 28, 2023 [March 28, 2023]. (Archived from the original on March 28, 2023).
  64. ^ Abu Dhabi-based TII launches its own version of ChatGPT. tii.ae. [2023-04-03]. (Archived from the original on 2023-04-03).
  65. ^ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. 2023-06-01. arXiv:2306.01116 [cs.CL].
  66. ^ tiiuae/falcon-40b · Hugging Face. huggingface.co. 2023-06-09 [2023-06-20]. (Archived from the original on 2023-06-02).
  67. ^ UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free. Archived 2024-02-08 at the Wayback Machine. 31 May 2023.
  68. ^ Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon. BloombergGPT: A Large Language Model for Finance. March 30, 2023. arXiv:2303.17564 [cs.LG].
  69. ^ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing. March 19, 2023. arXiv:2303.10845 [cs.CL].
  70. ^ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew. OpenAssistant Conversations – Democratizing Large Language Model Alignment. 2023-04-14. arXiv:2304.07327 [cs.CL].
  71. ^ Wrobel, Sharon. Tel Aviv startup rolls out new advanced AI language model to rival OpenAI. www.timesofisrael.com. [2023-07-24]. (Archived from the original on 2023-07-24).
  72. ^ Wiggers, Kyle. With Bedrock, Amazon enters the generative AI race. TechCrunch. 2023-04-13 [2023-07-24]. (Archived from the original on 2023-07-24).
  73. ^ 73.0 73.1 Elias, Jennifer. Google's newest A.I. model uses nearly five times more text data for training than its predecessor. CNBC. 16 May 2023 [18 May 2023]. (Archived from the original on 16 May 2023).
  74. ^ Introducing PaLM 2. Google. May 10, 2023 [May 18, 2023]. (Archived from the original on May 18, 2023).
  75. ^ 75.0 75.1 Introducing Llama 2: The Next Generation of Our Open Source Large Language Model. Meta AI. 2023 [2023-07-19]. (Archived from the original on 2024-01-05).
  76. ^ llama/MODEL_CARD.md at main · meta-llama/llama. GitHub. [2024-05-28]. (Archived from the original on 2024-05-28).
  77. ^ Claude 2. anthropic.com. [12 December 2023]. (Archived from the original on 15 December 2023).
  78. ^ Nirmal, Dinesh. Building AI for business: IBM's Granite foundation models. IBM Blog. 2023-09-07 [2024-08-11]. (Archived from the original on 2024-07-22).
  79. ^ Announcing Mistral 7B. Mistral. 2023 [2023-10-06]. (Archived from the original on 2024-01-06).
  80. ^ Introducing Claude 2.1. anthropic.com. [12 December 2023]. (Archived from the original on 15 December 2023).
  81. ^ xai-org/grok-1. xai-org. 2024-03-19 [2024-03-19]. (Archived from the original on 2024-05-28).
  82. ^ Grok-1 model card. x.ai. [12 December 2023]. (Archived from the original on 2023-11-05).
  83. ^ Gemini – Google DeepMind. deepmind.google. [12 December 2023]. (Archived from the original on 8 December 2023).
  84. ^ Franzen, Carl. Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance. VentureBeat. 11 December 2023 [12 December 2023]. (Archived from the original on 11 December 2023).
  85. ^ Mixtral of experts. mistral.ai. 11 December 2023 [12 December 2023]. (Archived from the original on 13 February 2024).
  86. ^ AI, Mistral. Cheaper, Better, Faster, Stronger. mistral.ai. 2024-04-17 [2024-05-05]. (Archived from the original on 2024-05-05).
  87. ^ 87.0 87.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. 2024-01-05 [2025-02-11]. arXiv:2401.02954. (Archived from the original on 2025-03-29).
  88. ^ 88.0 88.1 Hughes, Alyssa. Phi-2: The surprising power of small language models. Microsoft Research. 12 December 2023 [13 December 2023]. (Archived from the original on 12 December 2023).
  89. ^ Our next-generation model: Gemini 1.5. Google. 15 February 2024 [16 February 2024]. (Archived from the original on 16 February 2024). This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we've also successfully tested up to 10 million tokens.
  90. ^ Gemma. [2025-02-11]. (Archived from the original on 2024-02-21) – via GitHub.
  91. ^ Introducing the next generation of Claude. www.anthropic.com. [2024-03-04]. (Archived from the original on 2024-03-04).
  92. ^ Fugaku-LLM/Fugaku-LLM-13B · Hugging Face. huggingface.co. [2024-05-17]. (Archived from the original on 2024-05-17).
  93. ^ Phi-3. azure.microsoft.com. 23 April 2024 [2024-04-28]. (Archived from the original on 2024-04-27).
  94. ^ Phi-3 Model Documentation. huggingface.co. [2024-04-28]. (Archived from the original on 2024-05-13).
  95. ^ Qwen2. GitHub. [2024-06-17]. (Archived from the original on 2024-06-17).
  96. ^ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi; Ruan, Chong. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. 2024-06-19 [2025-02-11]. arXiv:2405.04434. (Archived from the original on 2025-03-30).
  97. ^ nvidia/Nemotron-4-340B-Base · Hugging Face. huggingface.co. 2024-06-14 [2024-06-15]. (Archived from the original on 2024-06-15).
  98. ^ Nemotron-4 340B | Research. research.nvidia.com. [2024-06-15]. (Archived from the original on 2024-06-15).
  99. ^ "The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta. [2025-02-11]. (Archived from the original on 2024-07-24).
  100. ^ llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models. GitHub. [2024-07-23]. (Archived from the original on 2024-07-23).
  101. ^ deepseek-ai/DeepSeek-V3. DeepSeek. 2024-12-26 [2024-12-26]. (Archived from the original on 2025-03-27).
  102. ^ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards. Amazon. 2024-12-27 [2024-12-27]. (Archived from the original on 2025-02-11).
  103. ^ deepseek-ai/DeepSeek-R1. DeepSeek. 2025-01-21 [2025-01-21]. (Archived from the original on 2025-02-04).
  104. ^ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025-01-22 [2025-02-11]. arXiv:2501.12948. (Archived from the original on 2025-04-09).
  105. ^ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng. Qwen2.5 Technical Report. 2025-01-03 [2025-02-11]. arXiv:2412.15115. (Archived from the original on 2025-04-01).
  106. ^ 106.0 106.1 MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao; Guo, Congchao. MiniMax-01: Scaling Foundation Models with Lightning Attention. 2025-01-14 [2025-01-26]. arXiv:2501.08313. (Archived from the original on 2025-03-22).
  107. ^ MiniMax-AI/MiniMax-01. MiniMax. 2025-01-26 [2025-01-26].
  108. ^ Kavukcuoglu, Koray. Gemini 2.0 is now available to everyone. Google. [6 February 2025]. (Archived from the original on 2025-04-10).
  109. ^ Gemini 2.0: Flash, Flash-Lite and Pro. Google for Developers. [6 February 2025]. (Archived from the original on 2025-04-10).
  110. ^ Franzen, Carl. Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search. VentureBeat. 5 February 2025 [6 February 2025]. (Archived from the original on 2025-03-17).
  111. ^ 111.0 111.1 Models Overview. mistral.ai. [2025-03-03].
  112. ^ Grok 3 Beta — The Age of Reasoning Agents. x.ai. [2025-02-22].
  113. ^ meta-llama/Llama-4-Maverick-17B-128E · Hugging Face. huggingface.co. 2025-04-05 [2025-04-06].
  114. ^ The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation. ai.meta.com. [2025-04-05]. (Archived from the original on 2025-04-05).
  115. ^ Team, Qwen. Qwen3: Think Deeper, Act Faster. Qwen. 2025-04-29 [2025-04-29].
  116. ^ Whitwam, Ryan. OpenAI announces two "gpt-oss" open AI models, and you can download them today. Ars Technica. 2025-08-05 [2025-08-06].
  117. ^ Claude Opus 4.1. www.anthropic.com. [8 August 2025].
  118. ^ Introducing GPT-5. openai.com. 7 August 2025 [8 August 2025].
  119. ^ OpenAI Platform: GPT-5 Model Documentation. openai.com. [18 August 2025].
  120. ^ deepseek-ai/DeepSeek-V3.1 · Hugging Face. huggingface.co. 2025-08-21 [2025-08-25].
  121. ^ DeepSeek-V3.1 Release | DeepSeek API Docs. api-docs.deepseek.com. [2025-08-25].
  122. ^ Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell. Zürich: ETH Zürich. 2025-09-02 [2025-11-07] (in German).
  123. ^ Kirchner, Malte. Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor. heise online. 2025-09-02 [2025-11-07] (in German).
  124. ^ Introducing Claude Sonnet 4.5. www.anthropic.com. [29 September 2025].
  125. ^ Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs. api-docs.deepseek.com. [2025-10-01].
  126. ^ deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face. huggingface.co. 2025-09-29 [2025-10-01].
  127. ^ DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp (PDF). GitHub. [2025-10-01].
  128. ^ GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities. z.ai. [2025-10-01].
  129. ^ zai-org/GLM-4.6 · Hugging Face. huggingface.co. 2025-09-30 [2025-10-01].
  130. ^ GLM-4.6. modelscope.cn. [2025-10-01].
  131. ^ Kimi K2 Thinking. moonshotai.github.io. [2025-11-06].
  132. ^ moonshotai/Kimi-K2-Thinking · Hugging Face. huggingface.co. 2025-11-06 [2025-11-06].
  133. ^ Kimi-K2-Thinking. modelscope.cn. [2025-11-09].
  134. ^ GPT-5.1 全新上线:更智能、更具对话感的 ChatGPT. openai.com. [2025-11-12] (in Chinese).
  135. ^ Grok 4.1. x.ai. [2025-11-17].
  136. ^ Gemini 3: Introducing the latest Gemini AI model from Google. blog.google. [2025-11-18] (in Chinese).
  137. ^ Introducing Claude Opus 4.5. anthropic.com. [2025-11-25].
  138. ^ DeepSeek-V3.2 Release. api-docs.deepseek.com. [2025-12-01].
  139. ^ DeepSeek V3.2 正式版:强化 Agent 能力,融入思考推理. mp.weixin.qq.com. [2025-12-01] (in Chinese).
  140. ^ deepseek-ai/DeepSeek-V3.2 · Hugging Face. huggingface.co. 2025-12-01 [2025-12-01].
  141. ^ DeepSeek-V3.2. modelscope.cn. [2025-12-01].
  142. ^ DeepSeek-V3.2 Release. api-docs.deepseek.com. [2025-12-01].
  143. ^ DeepSeek V3.2 正式版:强化 Agent 能力,融入思考推理. mp.weixin.qq.com. [2025-12-01] (in Chinese).
  144. ^ deepseek-ai/DeepSeek-V3.2-Speciale · Hugging Face. huggingface.co. 2025-12-01 [2025-12-01].
  145. ^ DeepSeek-V3.2-Speciale. modelscope.cn. [2025-12-01].