Leaked Microsoft paper: only 8B for GPT-4o-mini and 100B for o1-mini?

AI News · 7 months ago · Published by AI Sharing Circle


There has been ongoing discussion about the parameter sizes of mainstream closed-source LLMs. In the last two days of 2024, a Microsoft paper on MEDEC, a benchmark for detecting and correcting medical errors in clinical notes, accidentally and directly revealed the parameter scales of several of them: o1-preview, GPT-4, GPT-4o, and Claude 3.5 Sonnet.

Paper address: https://arxiv.org/pdf/2412.19260v1


The experimental section of the paper also groups the large models' parameter scales into three tiers: 7-8B, ~100-300B, and ~1.7T. Seeing GPT-4o-mini placed in the first tier, at only 8B, is a bit hard to believe.

Summary


Claude 3.5 Sonnet (2024-10-22): ~175B
ChatGPT: ~175B
GPT-4: ~1.76T
GPT-4o: ~200B
GPT-4o-mini (gpt-4o-2024-05-13): only 8B
o1-mini (o1-mini-2024-09-12): only 100B
o1-preview (o1-preview-2024-09-12): ~300B
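
To make the three-tier grouping concrete, here is a minimal Python sketch that sorts the figures listed above into the 7-8B, ~100-300B, and ~1.7T tiers. The cut-off values (50B and 1000B) and the names `ESTIMATES_B`, `tier`, and `group_by_tier` are illustrative assumptions, not something taken from the paper; the sizes are simply the numbers from the list above, in billions of parameters.

```python
# Sizes in billions of parameters, copied from the summary list above.
ESTIMATES_B = {
    "Claude 3.5 Sonnet": 175,
    "ChatGPT": 175,
    "GPT-4": 1760,
    "GPT-4o": 200,
    "GPT-4o-mini": 8,
    "o1-mini": 100,
    "o1-preview": 300,
}


def tier(size_b: float) -> str:
    """Map a size (in billions) to one of the three tiers.

    The 50B and 1000B cut-offs are assumptions chosen only to
    separate the three groups; the paper does not define them.
    """
    if size_b < 50:
        return "7-8B"
    if size_b < 1000:
        return "~100-300B"
    return "~1.7T"


def group_by_tier(estimates: dict) -> dict:
    """Return {tier label: [model names]} for the given estimates."""
    groups = {}
    for name, size_b in estimates.items():
        groups.setdefault(tier(size_b), []).append(name)
    return groups


if __name__ == "__main__":
    for label, names in group_by_tier(ESTIMATES_B).items():
        print(f"{label}: {', '.join(names)}")
```

Under these assumed cut-offs, GPT-4o-mini is the only listed model that lands in the smallest tier, which is the placement the article finds surprising.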
