All four quarterfinal matches on the morning of 6/8 ended with a score of 4-0. The most lopsided victory was o3's over Kimi K2: all four games lasted eight moves or fewer, each ending when Kimi K2 made illegal moves.
o3 is the LLM powering the well-known chatbot ChatGPT, developed by the American company OpenAI. Kimi K2 is a model from the China-based company Moonshot AI.
In the third game, for instance, o3, playing black, captured on e5 with its queen on move eight (8...Qxe5), placing the white king in check. Kimi K2 correctly perceived the positions of all the pieces and recognized that it had to either move its king or block the check. Yet in all four attempts it played illegal moves, and the Chinese AI lost the game.
The board position after 8...Qxe5. At this point, the Chinese AI (playing white) made four consecutive illegal moves: 9.Kf2, 9.Ke2, 9.Nxe5, and 9.Nxe5. White has only six legal moves here: moving the king to d2, or blocking the check by interposing the queen, bishop, or knight on the e-file.
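The article does not say how the tournament adjudicated illegal moves, but the idea is straightforward to reproduce. The sketch below uses the open-source python-chess library on a simplified, hypothetical position (White in check from a queen on e5, similar in spirit to the diagram, but not the actual game position, which is not published here) to show how a referee program can list the legal replies and reject attempts like 9.Kf2, 9.Ke2, and 9.Nxe5:

```python
# Hypothetical sketch: detecting illegal moves with the python-chess library.
# The position below is simplified for illustration and is NOT the actual
# tournament position; the article does not give the full game score.
import chess

# White king on e1 is in check from a black queen on e5 along the e-file.
board = chess.Board("4k3/8/8/4q3/8/2N5/5PPP/4K3 w - - 0 1")

print("White to move, in check:", board.is_check())
print("Legal replies:", [board.san(m) for m in board.legal_moves])
# -> a short list: the king steps aside (Kd1, Kd2, Kf1) or the knight blocks (Ne2, Ne4)

# Attempts like the ones the article describes are rejected by the rules engine:
for attempt in ["Kf2", "Ke2", "Nxe5"]:
    try:
        board.parse_san(attempt)  # raises if the move is illegal in this position
        print(attempt, "would be legal")
    except ValueError:
        print(attempt, "is illegal in this position")
```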
o3's move accuracy in this match, as measured against the chess engine Stockfish, was 96.5%, 95.1%, 100%, and 100% across the four games. However, the statistic is not particularly meaningful given Kimi K2's swift defeat.
In the quarterfinal between o4-mini and DeepSeek, the first game lasted 28 moves and ended when DeepSeek made its fourth illegal move. In the second game, o4-mini checkmated its opponent after only 17 moves. The remaining two games concluded after 25 and 26 moves, with the American model delivering checkmate in the final game.
The two models behind ChatGPT shone, but the most impressive performance of the quarterfinals belonged to Grok 4, the model from billionaire Elon Musk's company xAI. Grok 4 can be used for chatbot-like interactions on X (formerly Twitter) by paid subscribers. Its defeated opponent was Google's Gemini 2.5 Flash.
Musk's model won all four games, punishing every mistake its opponent made, such as giving up a piece for nothing. Grok 4's move accuracy across the four games was 77.8%, 97.5%, 94.4%, and 94.8%, the highest of the round apart from the o3-Kimi K2 match.
Video: Grok 4 (xAI) defeats Gemini 2.5 Flash (Google).
While commentating on the match, world number two chess player Hikaru Nakamura was impressed by Grok 4's moves. "Grok 4 is definitely the strongest chess-playing LLM in this tournament. The skill gap between it and the other models is not small," he said. "Musk will definitely boast about this win on Twitter."
Musk later re-shared a post with an image of Nakamura commentating online, adding, "This is just a side effect. xAI has spent almost no time on chess."
Despite Gemini 2.5 Flash's elimination, Google still had a representative in the tournament: Gemini 2.5 Pro, which secured a 4-0 victory against Claude 4 Opus. Claude is a model from Anthropic, based in San Francisco, California. It performed reasonably well, lasting around 30 moves in all four games, but Gemini 2.5 Pro was simply stronger.
The semi-finals will begin at 0:30 on Thursday, 7/8, Hanoi time. Grok 4 will face Gemini 2.5 Pro, while the other semi-final will be an OpenAI derby between o3 and o4-mini.
The tournament, hosted by Google on the Kaggle platform from 5/8 to 7/8, features eight LLMs competing in a single-elimination format to determine the champion.
LLMs are a type of artificial intelligence (AI) model primarily used for language processing, translation, and content creation. Users can interact with LLMs through chatbots, such as OpenAI's ChatGPT. Technically, LLMs operate by "predicting the next word."
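As a rough illustration of what "predicting the next word" means in practice, the sketch below queries a small open model (GPT-2, loaded through the Hugging Face transformers library; not one of the tournament models) for its probability distribution over the next token given a prompt:

```python
# Minimal sketch of next-token prediction, using the small open GPT-2 model
# (not one of the tournament LLMs) via the Hugging Face transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The next move for White is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # scores for every vocabulary token at each position

# Turn the scores at the final position into a probability distribution over
# the next token, then show the five most likely continuations.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r:>12}  {p.item():.3f}")
```

Chatbots built on LLMs generate replies by repeatedly sampling from this kind of distribution and appending the chosen token to the text, which is also how they produce chess moves when prompted with a game.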
Xuan Binh