Chess engines surpassed humans 28 years ago, when Deep Blue defeated then-world champion Garry Kasparov (Elo 2795) 3.5-2.5 in a match in New York in May 1997. Chess software has improved continuously since: the strongest engine today, Stockfish, is rated 3731 Elo, while Magnus Carlsen remains the strongest human player at 2839.
The moves of Deep Blue then, and of Stockfish now, often puzzle even top players like Kasparov and Carlsen. These are specialized programs built solely to play chess, driven by search algorithms, so their thinking is nothing like a human's.
Illustration of Magnus Carlsen (right) playing chess with a robot. Photo: Grok
At a recent AI-only chess tournament hosted by Google, Carlsen noticed that large language models (LLMs) play chess much more like humans. Their skill level may be below 2000 Elo, and they occasionally blunder pieces, but the Norwegian grandmaster said he did not want to criticize the AIs' mistakes.
"Following the reasoning process of the AIs at the Google chess tournament was fascinating," Carlsen said on the Take Take Take channel. "Their mistakes and the interpretations that led to those mistakes were both humorous and educational. I looked at their moves and thought they were all good ideas, logical, even if the move wasn't exactly correct."
Carlsen cited an example from the final game of the tournament, in which ChatGPT o3 (OpenAI) defeated Grok 4 (xAI). Playing White, ChatGPT pushed its e-pawn to e5 on move 11, a move Stockfish rated a serious blunder that turned a drawn position into a losing one.
The board after 11.e5. White opens the diagonal for the queen to attack the rook on a8, while the white pawn on e5 attacks the black knight.
However, Grok responded with bishop to b7, attacking the queen and rook simultaneously. White must lose one of the two pieces.
Carlsen praised ChatGPT's idea during this phase. "If we consider LLMs to be relatively new chess players, we need to encourage such good ideas," he said. "This move may be very bad because Black responds with bishop to b7, but I don't mind at all. By the way, I'm also impressed that Grok found the bishop to b7 move."
The Norwegian grandmaster believes the AIs' mistakes are similar to those humans often make over the board. Another example Carlsen gave came on move 18, when ChatGPT had two options.
The board after 17...Ke6. The white knight has just given check, forcing the black king to move, and can now capture the black queen with 18.Nxd6. But Stockfish's suggestion is the rook check to e1 first. Then, if the black king steps to f6, the knight captures the queen without being recaptured. And if Grok blocks the check by putting a knight on e5, the rook takes that knight with check, and the white knight still wins the queen.
Stockfish's line is not difficult for players of grandmaster strength to spot. But at a sub-2000 Elo level, where the LLMs play, immediately capturing the queen with 18.Nxd6 is a tempting option.
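For readers who want to check lines like this themselves, here is a minimal Python sketch using the python-chess library together with a locally installed Stockfish binary. Since the article does not give the position's FEN, the starting position and the candidate moves "e4" and "d4" are stand-ins; substitute the actual position and the candidates 18.Nxd6 and 18.Rhe1+ to reproduce the comparison.

```python
import chess
import chess.engine

# Stand-in position: the article does not provide the game's FEN,
# so the starting position is used here. Replace with any FEN, e.g.
# board = chess.Board("<your FEN>").
board = chess.Board()

# Requires a Stockfish binary available on PATH as "stockfish".
with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
    for san in ["e4", "d4"]:  # candidate moves, in standard notation
        board.push(board.parse_san(san))
        info = engine.analyse(board, chess.engine.Limit(depth=18))
        # The score is reported from the side to move; show White's view.
        print(san, info["score"].white())
        board.pop()  # undo the candidate before trying the next one
```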
"The move 18.Nxd6 is really bad, but I'm not too harsh," Carlsen said. "The rook check Rhe1 is called an intermediate move, a concept quite difficult for the AI's level. Even human players take some time to get used to such moves. I'm also pleased that Grok moved its king to protect the queen in the previous move."
Grandmaster David Howell, commentating alongside Carlsen, agreed that taking the queen on move 18 was a very human choice. "New players, and even those trained in clubs, make similar mistakes," Howell said with a laugh. "See an undefended queen, grab it. We were taught that from a young age."
ChatGPT won this first AI chess tournament, despite previously losing to Carlsen without taking a single piece. The ChatGPT model in that game against Carlsen, however, may have been o4-mini, a faster, lighter model than o3. o4-mini finished the tournament in fourth place, while o3 showed reasoning power superior to its rivals', though still nowhere near grandmaster level.
AI, or artificial intelligence, is the ability of a computer to perform tasks associated with human intelligence. One application is content creation, typically handled by LLMs, which technically operate by "guessing the next word" in a sequence. What surprised Carlsen is that this guessing can produce chess moves with human-like thinking.
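As a rough illustration of that "guess the next word" mechanism, the Python sketch below samples a continuation one word at a time from a hand-written probability table. The table, vocabulary, and function are invented for illustration only; a real LLM learns its probabilities over a vocabulary of tens of thousands of tokens with a neural network.

```python
import random

# Toy next-word table: for each word, a probability distribution over
# plausible next words. These numbers are made up for illustration;
# they are not weights from any real model.
NEXT_WORD = {
    "knight": {"takes": 0.5, "to": 0.4, "retreats": 0.1},
    "takes":  {"queen": 0.6, "rook": 0.3, "pawn": 0.1},
    "to":     {"e5": 0.7, "d6": 0.3},
}

def generate(start, max_steps=3):
    """Extend a phrase one word at a time by sampling the table."""
    words = [start]
    for _ in range(max_steps):
        dist = NEXT_WORD.get(words[-1])
        if dist is None:  # no known continuation: stop generating
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("knight"))  # e.g. "knight takes queen"
```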
Carlsen also believes that if another such tournament is held, the AI models will play even better than they do now.
Xuan Binh