Wednesday, August 26, 2020

AI as platform [Andreessen]: PowerPoint Squared and beyond

Update: September 2, 2020



Last December Kevin Kelly and Marc Andreessen had a conversation: Why You Should Be Optimistic About the Future (December 12, 2019).

In that conversation Andreessen argued that his firm (Andreessen Horowitz) sees AI as a platform or an architecture, not a feature. I think he is correct. But I think there are two different ways, two different directions, that can happen. Judging from his remarks, he is thinking of an AI that looks OUTWARD from the machine and toward the world. This ultimately leads to something like the Star Trek computer.

I think it is also possible to create an AI that looks INWARD toward the computer itself. This ultimately leads to AI as the operating system coordinating all the activity on the platform. That’s what this piece is about.

Note that I DO NOT think of these as conflicting visions. On the contrary, they are complementary. We must and will pursue both avenues. But they involve different technology. This post is about AI platforms that look inward toward the machine itself. I’ll cover outward looking AI platforms (the Star Trek computer) in a separate post.

But first let’s listen to Andreessen.

Andreessen’s remarks, AI as platform

Here’s a lightly edited transcript of those remarks (starting at roughly 07:23):
I think that the deeper answer is that there’s an underlying question that I think is an even bigger question about AI that reflects directly on this, which is: Is AI a feature or an architecture? Is AI a feature, we see this with pitches we get now. We get the pitch and it’s like here are the five things my product does, right, points one two three four five and the, oh yeah, number six is AI, right? It’s always number six because it’s the bullet that was added after they created the rest of the deck. Everything is gonna’ kind of have AI sprinkled on it. That’s possible.

We are more believers in a scenario where AI is a platform, an architecture. In the same sense that the mainframe was an architecture or the minicomputer is an architecture, the PC, the internet, the cloud has an architecture. We think AI is the next one of those. And if that’s the case, when there’s an architecture shift in our business, everything above the architecture gets rebuilt from scratch. Because the fundamental assumptions about what you’re building change. You’re no longer building a website or you’re no longer building a mobile app, you’re no longer building any of those things. You’re building an AI engine that is, in the ideal case, just giving you the answer to whatever the question is. And if that’s the case then basically all applications will change. Along with that all infrastructure will change. Basically the entire industry will turn over again the same way it did with the internet, and the same way it did with mobile and cloud and so if that’s the case then it’s going to be an absolutely explosive....

There are lots and lots of sort of business applications ... where you type data into a form and it stores the data and later on you run reports against the data and get charts. And that’s been the model of business software for 50 years in different versions. What if that’s just not needed anymore. What if in the future you’ll just give the AI access to all your email, all phone calls, all everything, all business records, all financials in the company and just let AI give you the answer to whatever the question was. You just don’t go through any of the other steps.

Google’s a good example. They’re pushing hard on this. The consumer version of this is search. Search has been, you know, it’s been the ten blue links for 25 years now. What Google’s pushing toward, they talk about this publicly, it’d just be the answer to your query, which is what they’re trying to do with their voice UI. That concept might really generalize out, right, and then everything gets rebuilt.

I think Andreessen is right, that AI will become a platform, or an architecture, rather than just one feature among others in an application.

His remarks clearly indicate that he is looking outward from the machine and toward the world. “You’ll just give the AI access to all your email, all phone calls, all everything, all business records, all financials in the company” – that’s looking toward the world, the company itself and the larger business environment. That AI is going to be running a model of the business. When he talks about Google wanting its engine simply to present the user with the answer to their query, that’s moving in the direction of the computer as a living universal reference source. When you turn that up to eleven it becomes the Star Trek computer.

Let’s put that aside. It’s going to happen. But, as I indicated at the top, I want to look in a different direction.

What are computers good at? What’s their native environment?

Why direct the AI inward, toward the computing environment? Several reasons. In the first place, we know that dealing with the external physical world is difficult for computers. Visual and auditory recognition are ill-defined, open-ended, and computationally intensive. Moreover, though I’m not familiar enough with the literature to cite it, I know there has been a lot of work on automatic and semi-automatic code generation, evolutionary computation, and so forth, which seems directly relevant to managing the computing environment.

AI systems are computational systems; the world of computation, not the external physical world, is their native environment. Why not take advantage of this?

PowerPoint Squared

Back in 2004 I dreamed up a natural language interface for PowerPoint. That was before machine learning had blossomed as it has in the last decade [1]. And so I imagined an AI engine with two capacities: 1) a basic hand-coded natural language functionality to support simple ‘conversations’ with a user, and 2) the ability to ‘learn’ through these conversations. I called it PowerPoint Assistant (PPA).

Described in that way it sounds like AI-as-a-feature, not as a platform, and one developed with aging technology at that. Bear with me. The purpose of PPA was to take an out-of-the-box PowerPoint and customize and extend it to meet the specific requirements of a user, in fact of a community of users. Thus, as I read over that original document [2], I find it easy to conceive of this assistant as the platform.

Here’s the abstract I wrote for a short paper setting forth the idea [2]:
This document sketches a natural language interface for end user software, such as PowerPoint. Such programs are basically worlds that exist entirely within a computer. Thus the interface is dealing with a world constructed with a finite number of primitive elements. You hand-code a basic language capability into the system, then give it the ability to ‘learn’ from its interactions with the user, and you have your basic PPA (PowerPoint Assistant).

Yes, I know, that reads like PPA is an add-on for good old PowerPoint, so AI as feature. But notice that I talk of programs as “worlds that exist entirely within a computer” and of the system learning “from its interactions with the user.” That’s moving into AI-as-platform territory.

What was PPA supposed to do? You could interact with PowerPoint using the standard GUI interface, or you could give it commands through PPA using voice input or typed statements. PPA could respond by doing something to the presentation being developed, by highlighting something on the screen, or by querying the user, either by displaying text on some area of the screen or through synthesized speech. (My document, [2], contains samples of user-machine interaction.)
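
To make this concrete, here is a minimal sketch, in Python, of what such an interaction loop might look like. The object vocabulary, command handling, and Presentation stand-in are all invented for illustration; they are not PowerPoint's actual object model, nor the design in my working paper. The point is only that the assistant's world is closed: a known set of objects and actions, plus a clarifying query when a command falls outside it.

# A toy sketch of the PPA interaction loop: parse a typed command against a
# small, closed vocabulary, act on the presentation, or ask a clarifying
# question. Hypothetical throughout; not PowerPoint's real API.

from dataclasses import dataclass, field

OBJECT_TYPES = {"slide", "textbox", "image", "title"}   # the finite object vocabulary
ACTIONS = {"add", "delete", "move", "highlight"}        # the finite action vocabulary

@dataclass
class Presentation:
    """Stand-in for the presentation being developed."""
    objects: list = field(default_factory=list)

    def add(self, obj_type: str) -> str:
        self.objects.append(obj_type)
        return f"Added a {obj_type} (the presentation now has {len(self.objects)} objects)."

def handle_command(text: str, pres: Presentation) -> str:
    """Crude parse: find the first known action word and the first known object word."""
    words = text.lower().split()
    action = next((w for w in words if w in ACTIONS), None)
    target = next((w for w in words if w in OBJECT_TYPES), None)
    if action is None or target is None:
        # Graceful failure: query the user instead of guessing.
        return "I didn't follow that. What should I do, and to which object?"
    if action == "add":
        return pres.add(target)
    return f"OK: {action} the {target}."   # other actions would dispatch to further calls

pres = Presentation()
print(handle_command("please add a slide", pres))
print(handle_command("make it pop", pres))   # no known action or object, so PPA asks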

This last is one of the distinctive features of PPA. At worst, it allows for an interaction leading to graceful rather than catastrophic failure. At best, it allows for fruitful interaction with the user in which the user guides the PPA in performing some task unfamiliar to the assistant. Once the PPA has succeeded in the task, PPA can then be instructed to remember the task and to associate it with a name provided by the user. This is the second distinctive feature of PPA. Taken together these two features allow the user to program PowerPoint without knowing the fussy details one needs in order to use scripting languages or full-scale programming languages.
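
Here, again in Python and again purely illustrative, is a sketch of that second feature: the assistant logs the primitive steps the user walks it through and then binds the sequence to a name the user supplies. The class and method names are my own, not anything from the working paper.

# A toy sketch of "remember this as X": record primitive commands during a
# guided interaction, then bind the sequence to a user-chosen name so it can
# be replayed later. Invented names; not the actual PPA design.

class RoutineMemory:
    def __init__(self):
        self.routines = {}          # name -> list of primitive commands
        self.current_session = []   # commands issued since the last save

    def record(self, command: str):
        """Log each primitive command the user walks the assistant through."""
        self.current_session.append(command)

    def remember_as(self, name: str):
        """Bind the just-completed sequence to a name supplied by the user."""
        self.routines[name] = list(self.current_session)
        self.current_session.clear()

    def replay(self, name: str):
        """Later, the name alone stands for the whole sequence."""
        return self.routines.get(name, [])

memory = RoutineMemory()
memory.record("add textbox")
memory.record("move textbox to lower left")
memory.remember_as("caption box")
print(memory.replay("caption box"))   # the user has, in effect, programmed PowerPoint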

The feasibility of PPA depends on the fact that much of the basic semantics are fully explicit in the code for PowerPoint itself. Given the nature of that code, that means there are a finite number of objects and events that must be accounted for. This is the world in which PPA must execute its tasks.

PowerPoint itself is thus the application domain for PPA. Not the user’s world but the machine’s world. The machine is a device in the user’s world that the user employs to certain ends. But PPA need know nothing about the larger world in which the user operates; it need know nothing about what the user is trying to accomplish in using PowerPoint. All PPA needs to know is how to position and manipulate objects in a PowerPoint presentation being developed by the user. The semantics of this world is the semantics implicit in the PowerPoint user interface.

Now, let us imagine a community of users working with PPA:
As it happens, Jasmine [my imaginary user] is one of five graphic artists in the marketing communications department of a pharmaceutical company. All of them use PowerPoint, and each has her own PPA. While each artist has her own style and working methods, they work on similar projects, and they often work together on projects. The work they do must conform to overall company standards.

It would thus be useful to have ways of maintaining these individual PPAs as a “community” of computing “agents” sharing a common “culture.” While each PPA must be maximally responsive and attuned to its primary user, it needs to have access to community standards. Further, routines and practices developed by one user might well be useful to other users. Thus the PPAs need ways of “sharing” information with one another and for presenting their users with useful tips and tools.
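
As a rough illustration of that sharing, one assistant might export a named routine bundled with the community standards, and another might import it. The data format and standards dictionary below are invented for the sake of the example; the point is only that routines and standards are small, explicit objects that can travel between assistants.

# A toy sketch of routines and standards shared across a "community" of
# assistants. All names and formats here are invented for illustration.

HOUSE_STYLE = {"font": "Corporate Sans", "max_bullets": 5}   # shared company standards

def export_routine(routines: dict, name: str) -> dict:
    """Package a named routine together with the community standards."""
    return {"name": name, "steps": routines[name], "standards": HOUSE_STYLE}

def import_routine(routines: dict, package: dict) -> None:
    """Another user's assistant adopts the routine while staying tuned to its own user."""
    routines[package["name"]] = package["steps"]

jasmine_routines = {"caption box": ["add textbox", "move textbox to lower left"]}
coworker_routines = {}
import_routine(coworker_routines, export_routine(jasmine_routines, "caption box"))
print(coworker_routines)   # the coworker's assistant now knows "caption box" too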

Generalizing to other applications

The PowerPoint Assistant is only an illustrative example of what will be possible. One way to generalize from this example is simply to think of creating such assistants for each of the programs in Microsoft’s Office suite. From that we can then generalize to the full range of end-user application software. Each program is its own universe and each of these universes can be supplied with an easily extensible natural language assistant.

One can also move out of the arena of general purpose computers to devices that use computers as an essential part of their technology complement. For example, I own two electronic musical instruments, a keyboard and a drum. Both come with a wide array of sounds that the user can trigger from the interface – the piano keyboard or the drum pad. One can create custom sounds by tweaking any of a half dozen or a dozen parameters for each sound and one can create custom patches (mappings from the sound library to the interface triggers). These are very versatile and powerful instruments.

But the interface you use to do this work is terrible – a small LED display for read-out and a bunch of push buttons for data and command entry. A good natural-language assistant would make it much easier to use these instruments.
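
To see why, consider what a custom sound and a patch amount to as data, and what a spoken request might bottom out in. The parameter names and the command below are my guesses about a generic instrument, not the actual interface of any particular keyboard or drum.

# A toy sketch of an instrument's sound library and patch, and of the kind of
# operation a natural-language assistant would perform on the user's behalf.
# Parameter names and values are invented for illustration.

sound_library = {
    "warm piano": {"attack": 0.02, "release": 1.2, "brightness": 0.4},
    "tight snare": {"attack": 0.00, "release": 0.3, "brightness": 0.8},
}

# A patch maps interface triggers (keys, pads) to sounds in the library.
patch = {"pad 1": "tight snare", "key C3": "warm piano"}

def assistant_set(sound: str, parameter: str, value: float) -> str:
    """What a request like 'make the piano a little brighter' might bottom out in."""
    if sound not in sound_library or parameter not in sound_library[sound]:
        return "Which sound and which parameter did you mean?"   # graceful failure
    sound_library[sound][parameter] = value
    return f"Set {parameter} of '{sound}' to {value}."

print(assistant_set("warm piano", "brightness", 0.6))
print(assistant_set("warm piano", "sparkle", 1.0))   # unknown parameter, so ask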

The world is full of similar devices. Then we have clinical instruments of all kinds, machine tools, and so forth. The potential for this technology is enormous.

Not all applications would be in the style of the PowerPoint Assistant. An assistant to help with software for the display and analysis of satellite imagery would have to be quite different. In each case, however, we can get useful guidance from the underlying scientific model of how the mind is embodied in the brain.

And then there’s the operating system and the net browser

Moving in a different direction, one can generalize from application software to operating systems, net browsers, and social networking apps. That’s where we really need an AI, to handle the back-end work of configuring all these systems and getting them to work together. Judging from some remarks by chip designer Jim Keller, small AIs are already in use on the most sophisticated CPU chips, helping to direct traffic [3]. What’s the generalization of that?

In such an environment, what happens to the distinction between operating system and application? MacOS and Windows already come with the core functions of the operating system – integrating hardware and software components – decked out with accessories such as notepads and calculators; the distinction is already a fuzzy one.

From the end-user’s point of view, then, we end up with a platform, whether laptop, desktop, smart phone, or workstation linked to a larger system, that is an AI system oriented toward integrating and marshaling computing resources of a certain type. But defining just what that type is, and distinguishing it from a different type, one that is outwardly oriented and so leads to the Star Trek computer, is difficult.

The inward orientation of a PowerPoint Assistant is obvious enough; a similar facility for a word processing system is easy enough to imagine. And so for photo and image processing, audio and video recording and editing, and so forth. But what of database programs and spreadsheets? The content of those programs looks outward. I can see using PPA-like functionality to make such programs more flexible, but the actual content of them would remain untouched by the AI engine, at least in THIS regime, the inward regime. The other regime, the OUTWARD regime, is another matter.

And the operating system? There we need the AI power not so much for the kind of natural language interface I’ve been discussing in the context of PPA, but for managing the operations of the software itself.

Finally, I believe that an AI interface will become essential as social networking sites and apps, such as Facebook, Twitter, Instagram, and so forth, become more important. As people invest more and more time in such sites, the sites become extensions of their minds in a particularly intimate way. As such the user needs to control the interface – it’s their mind after all – in a way that’s independent of vendor control over the backend. This is perhaps more important for a site like Facebook, which has more complicated functionality than Twitter or Instagram.

When Facebook unilaterally changes the interface – it is rolling out a major change as I draft this – it is interfering with its users’ minds. It is as if some helpful person came in and cleaned up your office; everything is neat and in its place. In consequence, you now can’t find anything. The office may have been messy, but it was a mess you understood. Now that the mess has been cleaned up, it is no longer your office. It’s a strange place.

At this point such unilateral changes are merely annoying. But as we move ever deeper into the web such changes will become more consequential. The end user needs to control the interface. Perhaps the AI needs to be in the browser and not in any of the sites one visits through the browser. Who is ready to develop a browser with a user-facing and user-controlled AI interface?

References

[1] Note that this was before such features as Siri and Alexa were available. For what it’s worth, I don’t use Siri on my Macintosh because I find it more annoying than useful.

[2] William Benzon, PowerPoint Assistant: Augmenting End-User Software through Natural Language Interaction, Working Paper, July 2015, 15 pp., https://www.academia.edu/14329022/PowerPoint_Assistant_Augmenting_End-User_Software_through_Natural_Language_Interaction.

[3] Lex Fridman, Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70, February 5, 2020, https://youtu.be/Nb2tebYAaOA.
