{"id":15503,"date":"2024-05-29T19:25:15","date_gmt":"2024-05-29T19:25:15","guid":{"rendered":"\/?p=15503"},"modified":"2024-05-29T19:32:18","modified_gmt":"2024-05-29T19:32:18","slug":"filed-as-llms-will-always-fail-on-clearly-identifiable-classes-of-problems","status":"publish","type":"post","link":"\/?p=15503","title":{"rendered":"Filed as (LLMs will always fail on clearly identifiable classes of problems)"},"content":{"rendered":"<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"hdil\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"hdil-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"hdil-0-0\"><span data-offset-key=\"hdil-0-0\">You asked a question about a subject that has a large number of fairly consistent copies on the Internet. I know much of the Internet and STEMC, so it is easy to predict where the LLMs fail. OpenAI, Gemini, Grok, and CoPilot all fail on harder problems, and OpenAI and CoPilot always fail on problems that involve scientific notation, many unit conversions, division of values in scientific notation, comparison of sizes, and many other simple but clear tests. I checked. 
I have run hundreds of long conversations and tests.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"bq0u2\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"bq0u2-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"bq0u2-0-0\"><span data-offset-key=\"bq0u2-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"7m221\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"7m221-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"7m221-0-0\"><span data-offset-key=\"7m221-0-0\">The best solution is to use LLMs to handle human languages, NOT to have them invent methods from examples found on the web. Require them to use a carefully selected set of methods for STEMC problems. Standardize the tokens so they are real and translatable to all human languages. Have them share conversations in global open formats. Have them run computer software themselves instead of making the humans do it. Standardize extension methods and interfaces, and always provide feedback and a &#8220;report this AI&#8221; option. 
<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"4drvt\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"4drvt-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"4drvt-0-0\"><span data-offset-key=\"4drvt-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"bonm5\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"bonm5-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"bonm5-0-0\"><span data-offset-key=\"bonm5-0-0\">Because LLMs are trained so poorly, from shallow skims of the Internet, there are many easily identifiable problems they will never get right. But with a global effort in which many players work together, getting them right is possible. True machine intelligence is possible, and it is not that hard now. But it will not work for each commercial group to insist on its own methods, without collaborating with the others and without the involvement and oversight of users.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"1i9tq\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"1i9tq-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"1i9tq-0-0\"><span data-offset-key=\"1i9tq-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"2iutk\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"2iutk-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"2iutk-0-0\"><span data-offset-key=\"2iutk-0-0\">Filed as (LLMs will always fail on clearly identifiable classes of problems)<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" 
data-rbd-draggable-id=\"fn2pq\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"fn2pq-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"fn2pq-0-0\"><span data-offset-key=\"fn2pq-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"6\" data-rbd-draggable-id=\"e9hl\">\n<div class=\"\" data-block=\"true\" data-editor=\"4jsmq\" data-offset-key=\"e9hl-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"e9hl-0-0\"><span data-offset-key=\"e9hl-0-0\">Richard Collins, The Internet Foundation<\/span><\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>You asked a question about a subject that has a large number of fairly consistent copies on the Internet. I know much of the Internet and STEMC. So it is easy to predict where the LLMs fail. OpenAI, Gemini, Grok and CoPilot all fail on harder problems and OpenAI and CoPilot always fail when asked <br \/><a class=\"read-more-button\" href=\"\/?p=15503\">Read More 
&raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[73,72,43],"tags":[],"class_list":["post-15503","post","type-post","status-publish","format-standard","hentry","category-all-knowledge","category-all-languages","category-assistive-technologies"],"_links":{"self":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts\/15503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=15503"}],"version-history":[{"count":1,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/15503\/revisions"}],"predecessor-version":[{"id":15504,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/15503\/revisions\/15504"}],"wp:attachment":[{"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=15503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=15503"},{"taxonomy":"post_tag","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=15503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}