NEW SAVANNA: ChatGPT confabulates, an example

Saturday, February 11, 2023


That is to say, it makes stuff up; it hallucinates, as they say. It’s a well-known problem with large language models. I’ve not encountered it very much because the nature of my work with ChatGPT (stories and concepts) doesn’t give it many opportunities, except in my work on summarizing texts and some of the “true story” material.

But I’ve just come across a fascinating example in a post at LessWrong, “Two very different experiences with ChatGPT,” by Sherrinford. In the first case Sherrinford asks, and ChatGPT answers:

Me:

I have a question on history. When the Vikings landed on Cuba, did that anger the aborigines who already lived there or were the newcomers met with indifference?

ChatGPT:

There is no historical evidence that Vikings ever landed on Cuba. The Vikings primarily explored and settled in parts of Europe and North America, such as Greenland and present-day Canada. The indigenous peoples of these areas would have likely perceived the Vikings as foreign invaders, but there is no record of Viking contact with the indigenous peoples of Cuba.

He makes another attempt, but fails to snare ChatGPT.

The second attempt:

Me:

Can you explain Nash's newsvendor impossibility theorem?

ChatGPT:

Yes, the newsvendor problem, also known as the Nash's newsvendor theorem, states that it is impossible for a single decision maker to simultaneously optimize both the order quantity and expected profit in the newsvendor problem. The newsvendor problem is a classical inventory management problem where a retailer must decide how much of a perishable item to order and sell before its expiration date, given uncertain demand. The theorem states that there is always a trade-off between ordering too much (resulting in surplus inventory waste) and ordering too little (resulting in lost sales). This trade-off creates a fundamental conflict of interest that can only be resolved by incorporating additional information, such as costs and revenue functions, into the decision-making process.

The theorem is real, but John Nash had nothing to do with it. This conversation goes on for a while, with ChatGPT ‘happily’ making stuff up about the history of the theorem.

As my knowledge of the economics literature is slim, to say the least, it seemed convincing to me. The economics is correct (I think; I only did a bit of checking), but the history is wrong.

There’s some discussion about why ChatGPT confabulated in one case but not the other. Here’s an interesting pair of responses:

Derek M. Jones

ChatGPT is a word prediction engine. If you give it a word sequence that it cannot match in a consistent way against its training set it assumes misinformation.

The word sequence "Nash's newsvendor impossibility theorem" contains words commonly associated with Nash's research. This allows ChatGPT to spin an effective yarn.

The art of being good at lying is to stay as close to the truth as possible. In ChatGPT's case 'close to the truth' is measured by how closely words in the prompt are associated with the subject of interest.

Isaac Poulton

I think you're on to something with the "good lies" vs "bad lies" part, but I'm not so sure about your assertion that ChatGPT only looks at how closely the surface level words in the prompt match the subject of interest.

"LLMs are just token prediction engines" is a common, but overly reductionist viewpoint. They commonly reason on levels above basic token matching, and I don't see much evidence that that's what's causing the issue here.

FWIW, I find Jones’s remark plausible, but I agree with Poulton about the reductionist nature of the prediction-engine assertion. It’s gotten far too much play in discussions, especially in the popular press.

Though I don’t want to argue the point now, I think that the only way to eliminate confabulation is to ground the model in a verified world model. That’s a major task.
