About an hour before the final match, OpenAI announced the release of its 11th-generation LLM, GPT-5. However, the model representing ChatGPT in the final was o3, OpenAI's most powerful reasoning model. Facing xAI's Grok 4, which had performed well in the group stage, o3 proved clearly stronger, averaging 90.8% move accuracy against Grok 4's 80.2%.
ChatGPT in a chess match. Photo: ChatGPT
ChatGPT won all four games, checkmating its opponent in 35, 30, 28, and 54 moves respectively. World number two Hikaru Nakamura suggested that Grok 4 appeared nervous and tense during the match, which kept it from playing at its best, unlike in its two previous matches. It repeatedly lost pieces in the final, something that rarely happened in its victories over Google's Gemini 2.5 Flash and Gemini 2.5 Pro.
The o3 model finished the tournament with three 4-0 victories and an average accuracy of 91% over 12 games. While its strength doesn't yet match that of a grandmaster, chess players with online ratings of 2000 or below might find it challenging to compete against o3, especially in blitz or bullet chess.
ChatGPT, developed by the American company OpenAI, is the chatbot that set off the AI boom, launched on November 30, 2022. At that time it ran on GPT-3.5, part of OpenAI's general-purpose model line. The GPT series emphasizes versatility, while the 'o' series focuses on reasoning. o3 was released in January 2025, with o4-mini following three months later. These two models represented OpenAI in the first-ever AI chess tournament.
Grok 4 is a model from xAI, a company owned by Elon Musk. Musk stated that Grok 4 had received almost no chess training before the tournament.
Hosted by Google on its Kaggle platform, the three-day tournament pitted 8 AIs against one another in a knockout format. The two Chinese entrants, Kimi K2 and DeepSeek, were knocked out in the quarterfinals with heavy defeats. The other six AIs in the field all came from American companies and rank among the world's most powerful models.
In the third-place match, Gemini 2.5 Pro defeated o4-mini with a score of 3.5-0.5.
None of the LLMs in the tournament was specialized for chess. Meanwhile, another, less publicized tournament took place between 8 dedicated chess engines, whose Elo ratings ranged from 3576 (Integral) to 3731 (Stockfish). Some of these engines use AI techniques to strengthen their play, and all of them far surpass human ability.
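To put those engine ratings in perspective, the standard Elo model converts a rating gap into an expected score per game. A minimal sketch, using the ratings quoted above as inputs (the function name is illustrative, not from any chess library):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A
    against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

if __name__ == "__main__":
    # Integral (3576) vs Stockfish (3731): a 155-point gap
    e = elo_expected_score(3576, 3731)
    print(f"Integral's expected score per game: {e:.3f}")  # roughly 0.29
```

Under this model, even the lowest-rated engine in that event would be expected to score close to 29% against the highest-rated one, while all of them sit far above the best human ratings, which top out around 2882.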
Xuan Binh