It’s 87° in Hoboken today, and I’m feeling lazy. Went out this morning and took 150+ photos. It was a foggy morning, which always makes for some interesting shots. I suppose I’ve just got to let the mind meander a bit in default mode.
Lazy Fridays
One thing I’ve noticed in the last few weeks is that I don’t feel much like working hard on Fridays. I’m not sure why. I’ve been without a day job for a long, long time, so my week isn’t disciplined into five workdays and two days off on the weekend. But the rest of the world does run on that schedule and, of course, when I was young, I lived my life by it.
Anyhow, though I’m in a creative phase and getting a lot done, I never seem to manage more than half a day of work on Fridays. Which is fine. I’m not worrying about it, just curious.
Gärdenfors and relational nets
I’ve just now entered into email correspondence with Peter Gärdenfors, a cognitive scientist in Sweden who’s been doing some very interesting and, I believe, important work in semantics and cognition. This is an opportune time since his work bears on my current project, which is my primer on relational networks over attractor basins. Yeah, I know, that’s a lot of jargon. It can’t be helped.
That project is moving along, perhaps not as rapidly as I’d hoped. But I like where it’s going.
Arithmetic, writing, and machine learning
I’ve had another thought about why these large language models (LLMs), such as GPT-3, are so bad at arithmetic. As I’ve argued, arithmetic calculation involves a start-and-stop style of thinking that seems to be difficult, perhaps impossible, for these engines to do. They’re trained to read text, that is, to read and predict the flow of text. If a prediction is correct, weights are adjusted in one way; if it is incorrect, they’re adjusted differently. Either way it’s a straight-ahead process.
Now, writing is, or can be, like that, and so with reading. That is, it is possible to write something by starting and then moving straight ahead without pause or revision until the piece is done. I have done it, though mostly I start and stop, rework, mess around, and so forth. But when it’s all done, it’s a string. And it can be read that way. Of course, there are texts where you may start and stop, reread, and so forth, but you don’t have to read that way.
But arithmetic isn’t like that. Once you get beyond the easiest problems, you have no choice but to start, stop, keep track of intermediate results, move ahead, and so forth. That’s the nature of the beast.
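To make that concrete, here is a minimal sketch in Python of ordinary schoolbook addition. It's only an illustration of the start-and-stop bookkeeping, not a claim about how people or LLMs actually do it; the function name long_addition and the shape of the steps record are my own inventions for the example.

```python
def long_addition(a: str, b: str):
    """Add two non-negative integers given as digit strings, schoolbook style.

    Returns the sum as a digit string plus a log of every intermediate step.
    """
    # Pad the shorter number with leading zeros so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0      # intermediate result that has to be held between columns
    digits = []    # answer digits, collected right to left
    steps = []     # (digit of a, digit of b, carry in, digit written, carry out)

    # Work from the rightmost column to the leftmost, the reverse of reading order.
    for i in range(width - 1, -1, -1):
        total = int(a[i]) + int(b[i]) + carry
        digit, carry_out = total % 10, total // 10
        steps.append((a[i], b[i], carry, digit, carry_out))
        digits.append(str(digit))
        carry = carry_out

    # A carry left over after the last column becomes a new leading digit.
    if carry:
        digits.append(str(carry))

    return "".join(reversed(digits)), steps


result, steps = long_addition("478", "356")
for x, y, carry_in, written, carry_out in steps:
    print(f"{x} + {y} + carry {carry_in} -> write {written}, carry {carry_out}")
print(result)
```

The answer, once you have it, is a string you can read straight through. Getting it meant working against reading order, one column at a time, holding a carry in mind between columns.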
So, the ‘learning’ style for creating LLMs is suited to the linear nature of writing and reading. But arithmetic is different. What I’m wondering is whether or not this is inherent in the architecture. If so, then there are things, important things, beyond the capability of such an architecture.
I note that OpenAI has come up with a scheme which helps LLMs with arithmetic, but those verifiers strike me as being a work-around that leaves the fundamental problem untouched. Do we really want to do this? I don’t see any practical reason for LLMs to be doing arithmetic, so why hobble them with such a work-around? Just to prove it can be done? Is that wise, to ignore the fundamental problem in favor of patches?
Addendum, 5.22.22: Remember, arithmetic isn't just/mere calculating. It's the foundation of Rank 3 thinking. It's where we got the idea of how a finite number of symbols can produce an infinite result; it's the center of the metaphor of the clockwork universe.
As for these very large language models
And so forth. It seems to me that we’re heading for a world where it’s going to take a huge collective effort to create really powerful and versatile artificial minds. Heck, we’re in that world now. I don’t believe, for example, that LLMs can compute their way around the fact that they lack embodiment. As Eric Jang has noted, reality has a shit-ton of detail (not his term). What if embodiment is the only way it can be gathered into machine form?
That’s one reason he’s signed up with a robotics company, Halodi. They’re going to have robots all over the place, interacting with the physical world. They can harvest the detail.
But still, that’s a lot of robots and a lot of detail. Can one company do it all? Whether or not they can, surely others are or will be playing the same game.
It seems to me that one result of all this effort is going to have to be a public utility of some kind, something everyone can access on some easy basis. Maybe several such utilities, no? And how is it all going to be coordinated? Do we have the legal infrastructure necessary for such a job? I doubt it.
More later.