OpenAI introduces a new series of inference models built to solve tough problems. It will be officially launched starting September 12th.
We have developed a new range of AI models that take extra time to think before giving an answer. They can handle complex tasks and solve more challenging problems than previous models in areas such as science, programming, and math.
Today, we are in the ChatGPT and our API has released the first model in this series. This is a preview version and we plan to make regular updates and improvements. In the meantime, we've made theResults of the next updated assessmentThe update is currently under development.
Working Principle
We train these models to think more deeply before answering questions, just like humans do. Through training, they learn to refine their thought processes, try different strategies, and recognize their mistakes.
In our tests, the ready-to-launch model update performed at a PhD level on challenging benchmark tasks in physics, chemistry, and biology. We also found it excelled in math and programming. On the International Mathematical Olympiad (IMO) qualifying exam, GPT-4o only solved 13% problems correctly, while the new inference model achieved 83%. In terms of programming ability, they reached the former 89% level of performance in the Codeforces competition. More details can be found in ourTechnical Research ArticlesThe
As an early model, it currently lacks many of the useful features of ChatGPT, such as web browsing and file image uploading. In the short term, for many common scenarios, GPT-4o may be more practical.
However, for complex reasoning tasks, this is a major breakthrough and represents a new level of AI capability. Based on this, we reset the counter to 1 and named the series OpenAI o Security
In developing these new models, we propose a new approach to safety training that fully utilizes their reasoning capabilities for better compliance with safety and alignment guidelines. By being able to reason about our safety rules in specific contexts, it allows for more effective application of these rules.
One way we measure security is by testing the model's ability to continue to comply with security rules in the face of a user's attempt to bypass them (commonly known as a "jailbreak"). In our most challenging jailbreak test, GPT-4o scored 22 out of 100, while our o1-preview model scored a whopping 84. More details can be found atSystem Descriptionand ourResearch ArticlesThe
To match the new capabilities of these models, we have enhanced our security efforts, internal governance, and collaboration with the federal government. This includes using ourPreparation frameworkConducting rigorous testing and evaluation, top-notch red team testing, and a board-level review process that includes the involvement of our Safety and Security Committee.
In furtherance of our commitment to AI security, we recently entered into formal agreements with the AI Security Institutes in the United States and the United Kingdom. We have begun to implement these agreements, including granting these institutes early access to research versions of the model. This is an important first step in our partnership to help establish a process for researching, evaluating, and testing future models before and after public release.
population (esp. of a group of people)
These enhanced reasoning capabilities are particularly well suited for those working on complex problems in science, programming, math, and other fields. For example, medical researchers can use o1 to annotate cell sequencing data, physicists can use it to generate the complex mathematical formulas needed for quantum optics, and developers in a variety of fields can use it to build and execute multi-step workflows.
OpenAI o1-mini
The o1 family of models excels in generating and debugging complex code. In order to provide developers with a more efficient solution, we have introduced the OpenAI o1-mini. It is a faster, more economical reasoning model that is particularly good at programming tasks. As a smaller scale model, o1-mini costs 80% less than o1-preview, making it ideal for applications that require reasoning power but not extensive world knowledge, making it both powerful and affordable.
How to use OpenAI o1
Starting today.ChatGPT Plus and Team usersIt is possible to use the o1 model in ChatGPT. Users can manually select o1-preview and o1-mini in the model selector. o1-preview is limited to 30 messages per week and o1-mini to 50 messages per week in the initial release. We are working on increasing these limits and developing the ability for ChatGPT to automatically select the most appropriate model for each prompt.
ChatGPT Enterprise and Edu userswill get access to both models starting next week. Conforms to the API Usage Level 5 (opens in new window) (used form a nominal expression)developersYou can start prototyping with both models in the API now, with a current rate limit of 20 requests per minute. We are conducting additional testing to increase these limits. Currently, the APIs for these models do not include features such as function calls, streaming, and system message support. To get started, check out the API documentation (opens in new window)The
We also plan to have all ChatGPT Free UsersBoth can use the o1-mini.
future outlook
This is just an early preview of these inference models in ChatGPT and the API. In addition to continually updating the models, I'm adding web browsing, file and image uploads, and other features to enhance their usefulness.
In addition to the new OpenAI o1 family, we will continue to develop and release the GPT family of models.