Friday, September 22, 2023

Better at chess, still sucks at planning [look at the red box]

* * * * *

In related news: Subbarao Kambhampati, "Can LLMs Really Reason and Plan?" BLOG@CACM, September 12, 2023.

Second paragraph:

Nothing in the training and use of LLMs would seem to suggest remotely that they can do any type of principled reasoning (which, as we know, often involves computationally hard inference/search). While one can dismiss the claims by hype-fueled social media influencers, startup founders, and VCs, it is hard to ignore when there are also peer-reviewed papers in top conferences making similar claims. The "Large Language Models are Zero-Shot <insert-your-reasoning-task>" is almost becoming a meme paper title! At some level, this trend is understandable, as in the era of LLMs, AI has become a form of ersatz natural science, driven by observational studies of the capabilities of these behemoth systems.

In conclusion:

The fact that LLMs are often good at extracting planning knowledge can indeed be gainfully leveraged. As we have argued in our recent work, LLMs can thus be a rich source of approximate models of world/domain dynamics and user preferences, as long as the humans (and any specialized critics) in the loop verify and refine those models and hand them over to model-based solvers. This way of using LLMs has the advantage that the humans need only be present when the dynamics/preference model is being teased out and refined; the actual planning after that can be left to planning algorithms with correctness guarantees (modulo the input model). Such a framework has striking similarities to the knowledge-based AI systems of yore, with LLMs effectively replacing the "knowledge engineer." Given the rather quixotic and dogmatic shift of AI away from approaches that accept domain knowledge from human experts, something I bemoaned in Polanyi's Revenge, this new trend of using LLMs as knowledge sources can be viewed as a form of avenging Polanyi's revenge! Indeed, LLMs make it easy to get problem-specific knowledge, as long as we are willing to relax the correctness requirements on that knowledge. In contrast to the old knowledge-engineering approaches, LLMs offer this without making it look like we are inconveniencing any specific human (we are, instead, just leveraging everything humans told each other!).

So the million-dollar question for reasoning tasks is: "how would you do planning if you have some doddering know-it-all ready to give you any kind of knowledge?" Traditional approaches to model-based reasoning/planning that focus on the incompleteness and incorrectness of said models (such as model-lite planning and robust planning) can have fresh relevance.
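
The division of labor Kambhampati describes, LLM as approximate model source, humans and critics as verifiers, a classical planner as the solver, can be made concrete with a minimal Python sketch. Everything here is hypothetical: llm_draft_model stands in for a real LLM call, the critic's repair is hardcoded, and the planner is a toy breadth-first search over a STRIPS-style model.

    from collections import deque

    # Hypothetical stand-in for the LLM: it drafts an approximate STRIPS-style
    # domain model. The draft contains a deliberate error (the "stack" action
    # is missing its "holding" precondition) to show why verification matters.
    def llm_draft_model():
        return {
            "pick":  {"pre": {"hand_empty"}, "add": {"holding"}, "del": {"hand_empty"}},
            "stack": {"pre": set(),          "add": {"stacked"}, "del": {"holding"}},
        }

    # The human (or specialized critic) in the loop: verify and refine the
    # draft before any planning happens. Here the repair is hardcoded.
    def critic_refine(model):
        model["stack"]["pre"] = {"holding"}
        return model

    # A sound and complete breadth-first planner. Its correctness guarantees
    # hold only modulo the input model, which is exactly the point.
    def plan(model, init, goal):
        frontier, seen = deque([(frozenset(init), [])]), {frozenset(init)}
        while frontier:
            state, steps = frontier.popleft()
            if goal <= state:
                return steps
            for name, act in model.items():
                if act["pre"] <= state:
                    nxt = frozenset((state - act["del"]) | act["add"])
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append((nxt, steps + [name]))
        return None

    verified = critic_refine(llm_draft_model())
    print(plan(verified, init={"hand_empty"}, goal={"stacked"}))  # ['pick', 'stack']

Skip critic_refine and the same planner cheerfully returns the one-step plan ['stack'], which is valid under the flawed draft but not in the intended domain; the correctness guarantee is always relative to the model the planner is handed.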

To summarize, nothing that I have read, verified, or done gives me any compelling reason to believe that LLMs do reasoning/planning as it is normally understood. What they do, armed with their web-scale training, is a form of universal approximate retrieval, which, as we have argued, can sometimes be mistaken for reasoning capability. LLMs do excel at idea generation for any task, including those involving reasoning, and, as I pointed out, this can be effectively leveraged to support reasoning/planning. In other words, LLMs already have amazing approximate retrieval abilities that we can gainfully leverage; we don't need to ascribe fake reasoning/planning capabilities to them.
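
That last point, approximate retrieval as an idea generator propped up by external verification, amounts to a generate-test loop. A hedged sketch in the same spirit as above, with imagined LLM outputs standing in for a real API:

    # Generate-test sketch: the LLM proposes candidate plans (approximate
    # retrieval), and a sound external verifier simulates each candidate
    # against the domain model and rejects anything unsound.
    MODEL = {
        "pick":  {"pre": {"hand_empty"}, "add": {"holding"}, "del": {"hand_empty"}},
        "stack": {"pre": {"holding"},    "add": {"stacked"}, "del": {"holding"}},
    }

    def valid(model, init, goal, steps):
        state = set(init)
        for name in steps:
            act = model.get(name)
            if act is None or not act["pre"] <= state:
                return False              # unknown action or precondition violated
            state = (state - act["del"]) | act["add"]
        return goal <= state

    # Imagined LLM samples: plausible-looking guesses, only one of which is sound.
    candidates = [["stack"], ["pick", "pick", "stack"], ["pick", "stack"]]
    sound = [p for p in candidates if valid(MODEL, {"hand_empty"}, {"stacked"}, p)]
    print(sound)  # [['pick', 'stack']]

The LLM never has to be right here; it only has to put a sound candidate within reach of a verifier that is.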

There's more at the link.
