OpenAI o3

[3] The OpenAI o3 model was announced on December 20, 2024. The name skips "o2" to avoid a trademark conflict with the mobile carrier brand O2.

[7] On February 6, 2025, in response to pressure from rivals such as DeepSeek, OpenAI announced an update intended to make the thought process of its o3-mini model more transparent.

[1] OpenAI reported that o3 achieved a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online.

[12] On SWE-bench Verified, a software engineering benchmark that assesses the ability to resolve real GitHub issues, o3 scored 71.7%, compared with 48.9% for o1.

[12] On the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, which evaluates an AI's ability to solve novel logical problems and acquire new skills, o3 attained three times the accuracy of o1.