We've trained a system to answer grade-school math problems with double the accuracy of a fine-tuned GPT-3 model.
Multistep reasoning is difficult for today's language models. We present a new technique to help. https://t.co/JRXUYZOSg7
— OpenAI (@OpenAI) October 29, 2021
The opening to the linked article:
Large language models like GPT-3 have many impressive skills, including their ability to imitate many writing styles, and their extensive factual knowledge. However, they struggle to perform tasks that require accurate multistep reasoning, like solving grade school math word problems. Although the model can mimic the cadence of correct solutions, it regularly produces critical errors in logic.
To match human performance in complex logical domains, our models must learn to recognize their mistakes and to choose their steps carefully. To that end, we train verifiers to evaluate whether or not a proposed solution is correct. To solve a new problem, we use verifiers to select the best among many proposed solutions. We collected the new GSM8K dataset to evaluate our methods, and we are releasing this dataset to facilitate research.
In the ten examples below, we show solutions generated by our new method, verification, and our baseline method, fine-tuning.
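The selection step the article describes works in three stages: sample many candidate solutions, score each one with a verifier, and keep the highest-scoring one. It can be sketched in Python as below. This is a minimal illustration under stated assumptions, not OpenAI's code:

- `best_of_n`, `toy_generate`, and `toy_verify` are hypothetical names.
- The toy generator returns canned strings instead of sampling from a fine-tuned model.
- The toy verifier is an oracle that checks the final answer against a known value. The article's verifier is a trained model that has to judge correctness without seeing the answer.

```python
import re
from typing import Callable, List, Tuple


def best_of_n(
    problem: str,
    generate: Callable[[str, int], List[str]],
    verify: Callable[[str, str], float],
    n: int = 100,
) -> Tuple[str, float]:
    """Sample n candidate solutions and return the one the verifier scores highest."""
    candidates = generate(problem, n)
    if not candidates:
        raise ValueError("generator returned no candidates")
    # Score every candidate; max() keeps the first candidate among ties.
    scored = [(solution, verify(problem, solution)) for solution in candidates]
    return max(scored, key=lambda pair: pair[1])


def toy_generate(problem: str, n: int) -> List[str]:
    # Hypothetical stand-in for sampling n solutions from a fine-tuned language model.
    pool = [
        "3 + 4 = 7, 7 * 2 = 14. The answer is 14.",
        "4 * 2 = 8, 3 + 8 = 11. The answer is 11.",
        "3 * 4 = 12, 12 - 2 = 10. The answer is 10.",
    ]
    return [pool[i % len(pool)] for i in range(n)]


def toy_verify(problem: str, solution: str) -> float:
    # Hypothetical oracle verifier: returns 1.0 if the final answer is 11, else 0.0.
    # A learned verifier would instead output an estimated probability of correctness.
    match = re.search(r"answer is (-?\d+)", solution)
    return 1.0 if match and int(match.group(1)) == 11 else 0.0


best, score = best_of_n("What is 3 + 4 * 2?", toy_generate, toy_verify, n=6)
print(best)   # 4 * 2 = 8, 3 + 8 = 11. The answer is 11.
print(score)  # 1.0
```

Only the two callables would change if you substituted a real sampler and a learned scorer. `best_of_n` itself stays the same.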