Behnam Mohammadi, Creativity Has Left the Chat: The Price of Debiasing Language Models, arXiv:2406.05587 [cs.CL]
Abstract: Large Language Models (LLMs) have revolutionized natural language processing but can exhibit biases and may generate toxic content. While alignment techniques like Reinforcement Learning from Human Feedback (RLHF) reduce these issues, their impact on creativity, defined as syntactic and semantic diversity, remains unexplored. We investigate the unintended consequences of RLHF on the creativity of LLMs through three experiments focusing on the Llama-2 series. Our findings reveal that aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards "attractor states", indicating limited output diversity. Our findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation. The trade-off between consistency and creativity in aligned models should be carefully considered when selecting the appropriate model for a given application. We also discuss the importance of prompt engineering in harnessing the creative potential of base models.
From the article:
An intriguing property of the aligned model’s generation clusters in Experiment is that they exhibit behavior similar to attractor states in dynamical systems. We demonstrate this by intentionally perturbing the model’s generation trajectory, effectively nudging it away from its usual output distribution. Surprisingly, the aligned model gracefully finds its way back to its own attractor state and in-distribution response. The presence of these attractor states in the aligned model’s output space is a phenomenon related to the concept of mode collapse in reinforcement learning, where the model over-optimizes for certain outputs, limiting its exploration of alternative solutions. This behavior contrasts with the base model, which exhibits greater flexibility and adaptability in its outputs.
This is particularly interesting (note the highlighted text) in view of recent work, What’s the Magic Word? A Control Theory of Llm Prompting, where LLM behavior is investigated in light of control theory.
No comments:
Post a Comment