Large Language Models are Zero-Shot Reasoners

Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3. https://t.co/ebvxSbac1K

— Aran Komatsuzaki (@arankomatsuzaki) May 25, 2022
Abstract from article linked above:
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of-the-art performances in arithmetics and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding “Let’s think step by step” before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetics (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with an off-the-shelf 175B parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
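To make the recipe in the abstract concrete, here is a minimal Python sketch of the two-stage Zero-shot-CoT prompting the authors describe: first elicit a reasoning chain by appending the trigger phrase, then feed that chain back with an answer-extraction cue. The complete() stub and the exact wording of the answer-extraction cue are my assumptions, not code from the paper; wire the stub to whatever large model you actually use.

```python
# Sketch of two-stage Zero-shot-CoT prompting (Kojima et al., 2022).
# `complete()` is a hypothetical stand-in for a real LLM completion call
# (e.g. a 175B GPT-3-class model); replace it with your provider's API.

def complete(prompt: str) -> str:
    """Hypothetical LLM completion call -- plug in a real backend here."""
    raise NotImplementedError("connect this to an LLM API")

def zero_shot_cot(question: str) -> str:
    # Stage 1: reasoning extraction, using the single trigger phrase
    # from the paper: "Let's think step by step."
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = complete(reasoning_prompt)

    # Stage 2: answer extraction -- append the generated chain of thought
    # and ask for the final answer (this cue wording is an assumption).
    answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer is"
    return complete(answer_prompt)

# Example usage on a MultiArith-style word problem:
# print(zero_shot_cot("A pet store had 13 cats and got 5 more. "
#                     "They then sold 10. How many cats are left?"))
```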
See my recent post, "Arithmetic and Machine Learning, Part 2," and the arithmetic section of my ramble, "Lazy Fridays, Peter Gärdenfors, RNA primer, arithmetic," about these very large language models.
Addendum 6.1.22: The folks at EleutherAI are doing some interesting work: A Preliminary Exploration into Factored Cognition with Language Models. I'm not sure how effectively it can deal with this issue, but they're thinking about it.
Addendum 6.9.22:
🤣🤣🤣 from new report from @ErnestSDavis, responding in part to anecdotal data from @plinz

Contrary to popular belief, AI Prompt Whisperer is probably not a profession with a future https://t.co/1Z0FirqtUx

— Gary Marcus 🇺🇦 (@GaryMarcus) June 9, 2022