We've trained GPT-3 to be more aligned with what humans want: The new InstructGPT models are better at following human intent than a 100x larger model, while also improving safety and truthfulness. https://t.co/rKNpCDAMb2
— OpenAI (@OpenAI) January 27, 2022
We’ve used basically the same technique (which we call RLHF) in the past for text summarization (https://t.co/nrJjX62SsV). “All we’re doing” here is applying it to a much broader range of language tasks that people use GPT-3 for in the API
— Ryan Lowe (@ryan_t_lowe) January 27, 2022