NEW SAVANNA: DeepSeek's approach only works in limited technical domains

Sunday, February 2, 2025

If you look at their excellent paper & code, the reward model is a logical function that was handcrafted & programmed by engineers.

DeepSeek's RL approach is impressive in the sense that it reduces the need for tedious supervised fine-tuning (SFT), but it isn't really general.
— Chomba Bupe (@ChombaBupe) February 1, 2025
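
To make "a logical function that was handcrafted & programmed by engineers" concrete, here is a minimal sketch of what a rule-based reward can look like. It is not DeepSeek's code: the function names, the `<think>`/`<answer>` tag layout, the exact-match comparison, and the equal weighting of the two terms are all assumptions chosen for illustration. The sketch only shows the general shape of such a reward, a format check plus a correctness check against a known answer.

```python
import re

# Hypothetical output layout: reasoning inside <think>, final answer inside <answer>.
# The tag names and the regex are assumptions for this sketch.
COMPLETION_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the expected tag layout, else 0.0."""
    return 1.0 if COMPLETION_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference string, else 0.0."""
    match = COMPLETION_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def rule_based_reward(completion: str, reference: str) -> float:
    """Sum of the two hand-written checks (equal weighting is an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, reference)


if __name__ == "__main__":
    sample = "<think>2 + 2 equals 4.</think><answer>4</answer>"
    print(rule_based_reward(sample, "4"))
```

Note that `accuracy_reward` needs a reference answer that can be checked mechanically. That is easy for arithmetic or for code with unit tests, and much harder for open-ended writing, which is the limitation the tweet points at.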
