Not the Prime Minister. Just His Interpreter
As the limits of Large Language Models become more apparent, they're being paired with other tools instead of being allowed to go it alone
A short post this week, for a variety of reasons. One of which is that I am on jury duty for a few days. Could AI be applied in this antiquated system? Should it? (No surprise that some services already try to use machine learning to pick jurors.) Questions I might write about after the case is concluded. For now, though, it's going to reduce the time I have to spend on the blog. I will catch up later this month! Meantime, here’s a bit of news.
A New Entrant in the Race for "Multimodal" AI
ChatGPT and other "large language models" are sort of the Boris Johnson of software -- charming and sometimes knowledgeable, but willing to say anything as long as it sounds plausible. We can't expect these models to be accurate or consistent.
So people are now looking for ways LLMs can be teamed with technology that actually does care about reality. Instead of treating ChatGPT and its kin as authorities, these approaches treat language models as interpreters: go-betweens that translate between human understanding and that of machines.
A new example of this approach was unveiled last week: It's called Newton, a "Large Behavior Model" created by a new startup called Archetype. As Steven Levy reported a couple of days ago, Newton uses a language model to mediate between the human worldview and that of thousands of sensors planted all over, in and around anything we might care about.
The sensors -- cameras, microphones, radars, thermometers, accelerometers and others -- gather data about light, heat, movement and so on. But humans don't want to know how much light coming back from Location 122 differs from the light from Location 127. They want to know, "What's the deal with that stain?"
Enter the LLM. It does two things to bridge the gap between machine data and human viewpoints.
First, from its training on billions of words, it ferrets out knowledge about the world, to put names to the objects and activities that the sensors detect. That flat thing is probably a wall. That irregularly shaped thing sticking up is a plant. The difference in color at this spot suggests water damage.
Second, the model can generate sentences to describe what it perceives, and it can understand what the human says in response.
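To make those two steps concrete, here is a minimal sketch in Python. This is not Archetype's code -- the data shapes and the `call_llm` helper are my own assumptions -- but it shows the basic bridge: flatten the sensor readings into a prompt, then let the language model supply the names and the sentences.

```python
# Minimal sketch of an LLM as sensor interpreter. Not Archetype's code:
# the data shapes and call_llm helper are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class SensorReading:
    sensor_id: str
    kind: str       # "camera", "thermometer", "accelerometer", ...
    location: str
    value: str      # a pre-digested feature, e.g. a color difference

def readings_to_prompt(readings: list[SensorReading], question: str) -> str:
    # Step 1 of the bridge: flatten machine data into words the model can use.
    lines = [f"- {r.kind} {r.sensor_id} at {r.location}: {r.value}" for r in readings]
    return (
        "Sensor observations:\n" + "\n".join(lines)
        + f"\n\nIn plain language, answer the occupant's question: {question}"
    )

def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-model API call.
    return "That discoloration on the north wall looks like water damage."

readings = [
    SensorReading("122", "camera", "north wall", "reflectance lower than neighbors"),
    SensorReading("127", "camera", "north wall, 1m right", "baseline reflectance"),
]
print(call_llm(readings_to_prompt(readings, "What's the deal with that stain?")))
```

The division of labor matters: the sensors stay in charge of what's true about the world, and the model is only in charge of putting it into words.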
In broad outline, this is a lot like a couple of projects I wrote about last week, which position a language model between machines and humans. The AI system I use most -- Perplexity.ai -- works this way too. It's a search application that uses an LLM (Anthropic's Claude or OpenAI's GPT-4) to interpret my question for a search algorithm. Instead of basing its answer on the entire Internet, the LLM uses only the search results to tell me what I want to know. It still hallucinates, but it's at least on topic. And anyway, like any sane user, I always check (a step that's made easy by the way Perplexity lists all its sources).
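That pattern is usually called retrieval-augmented generation. Here's a rough sketch of how a Perplexity-style pipeline might fit together -- the `web_search` and `call_llm` helpers are hypothetical stand-ins, not Perplexity's actual API:

```python
# Rough sketch of a retrieval-augmented pipeline, Perplexity-style.
# web_search and call_llm are hypothetical stand-ins, not any real API.

def call_llm(prompt: str) -> str:
    # Placeholder for a chat-model API (Claude, GPT-4, etc.).
    return "(model output for: " + prompt.splitlines()[0] + ")"

def web_search(query: str) -> list[dict]:
    # Placeholder: a real system would query a search index here.
    return [{"url": "https://example.com/a", "snippet": "..."}]

def answer(question: str) -> str:
    # The LLM first interprets the question for the search algorithm...
    query = call_llm(f"Rewrite as a terse web-search query: {question}")
    results = web_search(query)
    # ...then answers using only the retrieved sources, which it must cite.
    sources = "\n".join(
        f"[{i + 1}] {r['url']}: {r['snippet']}" for i, r in enumerate(results)
    )
    prompt = (
        "Using ONLY the sources below, answer the question and cite sources.\n"
        f"{sources}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer("What is a Large Behavior Model?"))
```

Grounding the model in retrieved text is what keeps its hallucinations at least on topic, and the numbered source list is what makes the checking step easy.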
As the ChatGPT backlash rolls along, and you read about how Large Language Models are screwing up, don't assume they're going to disappear. It's much more likely they'll live on as parts of these sorts of "multimodal" systems.