All AIs fail consistently on scientific notation, unit conversions, and anything not on the free Internet
https://x.com/yuntiandeng/status/1836114401213989366

Yuntian Deng (@yuntiandeng): "Is OpenAI’s o1 a good calculator? We tested it on up to 20×20 multiplication: o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4" https://pic.x.com/et5db9bhnl

Replying to @yuntiandeng: All AIs fail
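For readers unsure what "9×9" or "20×20 multiplication" means in the quoted tweet, the rough sketch below reads it as multiplying a 9-digit number by a 9-digit number (or 20 by 20) and scoring exact-match accuracy. That reading, and every name here (`make_problems`, `accuracy`, `exact_solver`, `float_solver`), is an assumption for illustration; this is not the harness used in the tweet. `float_solver` is a hypothetical stand-in for any solver that is fine on short operands and loses digits on long ones; it is not a model of o1 or gpt-4o.

```python
import random


def random_operand(digits, rng):
    """Return a random integer with exactly `digits` decimal digits."""
    low = 10 ** (digits - 1)
    high = 10 ** digits - 1
    return rng.randint(low, high)


def make_problems(digits_a, digits_b, count, seed=0):
    """Build `count` (a, b) pairs: a has digits_a digits, b has digits_b."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [
        (random_operand(digits_a, rng), random_operand(digits_b, rng))
        for _ in range(count)
    ]


def accuracy(solver, problems):
    """Fraction of problems where solver(a, b) equals the exact product."""
    correct = sum(1 for a, b in problems if solver(a, b) == a * b)
    return correct / len(problems)


def exact_solver(a, b):
    """Reference solver: Python integers are arbitrary precision."""
    return a * b


def float_solver(a, b):
    """Hypothetical lossy solver: multiplies in 64-bit floating point."""
    return int(float(a) * float(b))


if __name__ == "__main__":
    # Sweep a few operand sizes, in the spirit of a 4x4 / 9x9 / 20x20 grid.
    for digits in (4, 9, 20):
        problems = make_problems(digits, digits, count=50)
        print(digits, accuracy(exact_solver, problems),
              accuracy(float_solver, problems))
```

To score a real model under the same assumptions, one would replace `float_solver` with a function that sends the prompt to the model and parses an integer from its reply.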