OpenAI has rolled out a new series of AI models designed to spend more time thinking before they respond. According to an official statement from OpenAI, the models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

OpenAI released the first model of this series in ChatGPT and its API. The company said that it is a preview and is expected to receive regular updates and improvements. Alongside this release, it also included evaluations for the next update, which is currently in development.

How it works  

“We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognise their mistakes,” OpenAI said in their post.  

In the tests conducted, the next model update performed similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. OpenAI also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Its coding abilities were evaluated in contests, where it reached the 89th percentile in Codeforces competitions, OpenAI noted.

As an early model, it doesn’t yet have many of the features that make ChatGPT useful, such as browsing the web for information and uploading files and images. The statement added that, for many common cases, GPT-4o will be more capable in the near term.

According to OpenAI, this is a significant advancement for complex reasoning tasks and represents a new level of AI capability. “Given this, we are resetting the counter back to 1 and naming this series OpenAI o1,” they added.  

Safety  

As part of developing these new models, OpenAI developed a new safety training approach that harnesses their reasoning capabilities to make them adhere to safety and alignment guidelines. Because the models can reason about their safety rules in context, they can apply them more effectively.

One way OpenAI measures safety is by testing how well its model continues to follow its safety rules if a user tries to bypass them (known as “jailbreaking”). “On one of our hardest jailbreaking tests, GPT-4o scored 22 (on a scale of 0-100) while our o1-preview model scored 84,” the company said.

OpenAI further emphasised, “To match the new capabilities of these models, we’ve bolstered our safety work, internal governance, and federal government collaboration. This includes rigorous testing and evaluations using our Preparedness Framework, best-in-class red teaming, and board-level review processes, including by our Safety & Security Committee.”

To advance its commitment to AI safety, OpenAI recently formalised agreements with the U.S. and U.K. AI Safety Institutes. The company noted, “We’ve begun operationalising these agreements, including granting the institutes early access to a research version of this model. This was an important first step in our partnership, helping to establish a process for research, evaluation, and testing of future models before and following their public release.”

Whom it’s for  

The company claims that these enhanced reasoning capabilities may be particularly useful for people tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.   

OpenAI o1-mini  

The company said that the o1 series excels at accurately generating and debugging complex code. “To offer a more efficient solution for developers, we’re also releasing OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding,” it said. “As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge,” it added.
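To make the "80% cheaper" figure concrete, here is a minimal Python sketch of the arithmetic. The base price used below is a made-up placeholder, not an actual OpenAI rate, and the function name discounted_price is hypothetical; only the 80% figure comes from the statement above.

```python
def discounted_price(base_price: float, percent_cheaper: float) -> float:
    """Return the price of something that is `percent_cheaper` % cheaper than `base_price`."""
    if not 0 <= percent_cheaper <= 100:
        raise ValueError("percent_cheaper must be between 0 and 100")
    return base_price * (1 - percent_cheaper / 100)


# Hypothetical placeholder price (arbitrary units), NOT a real OpenAI rate.
HYPOTHETICAL_PREVIEW_PRICE = 10.0

# "80% cheaper" means the smaller model costs 20% of the larger one.
mini_price = discounted_price(HYPOTHETICAL_PREVIEW_PRICE, 80)
print(f"Hypothetical o1-mini price: {mini_price:.2f}")
```

In other words, for the same usage, a workload that costs 10 units on o1-preview would cost about 2 units on o1-mini under this assumption.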

How to use OpenAI o1  

ChatGPT Plus and Team users can access o1 models in ChatGPT. Both o1-preview and o1-mini can be selected manually in the model picker. At launch, the weekly rate limits will be 30 messages for o1-preview and 50 for o1-mini. To get started, check out the API documentation. “We also are planning to bring o1-mini access to all ChatGPT Free users,” the company added.
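For readers who want to budget their usage, the launch limits can be modelled with a small counter. This is an illustrative sketch only: the WeeklyQuota class and its methods are hypothetical and are not part of ChatGPT or any OpenAI API. Only the two limits (30 and 50 messages per week) come from the article.

```python
from collections import Counter

# Weekly message limits at launch, as reported above.
WEEKLY_LIMITS = {"o1-preview": 30, "o1-mini": 50}


class WeeklyQuota:
    """Hypothetical helper that tracks messages sent per model in one week."""

    def __init__(self, limits: dict[str, int]) -> None:
        self.limits = dict(limits)
        self.used: Counter[str] = Counter()

    def remaining(self, model: str) -> int:
        """Messages still available for `model` this week."""
        return self.limits[model] - self.used[model]

    def try_send(self, model: str) -> bool:
        """Record one message if quota remains; return False otherwise."""
        if self.remaining(model) <= 0:
            return False
        self.used[model] += 1
        return True

    def reset(self) -> None:
        """Start a new week with all counters cleared."""
        self.used.clear()


quota = WeeklyQuota(WEEKLY_LIMITS)
quota.try_send("o1-preview")
print(quota.remaining("o1-preview"), quota.remaining("o1-mini"))
```

A user who saves o1-preview for the hardest problems and sends routine coding questions to o1-mini would draw on the two counters independently.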
