How Are All These Robots Going to Know What to Do?
More on the work to make robots cope with the messy human world
You're probably going to come across some buzz soon about "multimodal models" and how they’ll help robots.
The idea is this: ChatGPT can create new emails, poems, term papers, marketing plans and so on; DALL-E creates new images; Sora creates new videos. So why not an AI that can create new actions? I ask a robot to go to the kitchen and see if there are any cans of soup, and it invents the right movements to do that in the specific messy kitchen that is mine.
The advantage here would be that the robot doesn't need all its possible movements provided in advance. That would equip it to handle life with people, where one kitchen differs from another, and where navigating a street at 2 PM is different from navigating the same street at 4 PM, after school has let out.
So a multimodal model would add a repertoire of possible movements to AI's current repertoires of language and visuals. It would tell the machine how to pick up a china cup with just enough pressure to keep a grip (and to change that pressure when the next cup is Styrofoam). That would be a great advance -- even better than simply combining a language AI with a robot (which I wrote about in the current edition of Scientific American).
But it's not as easy as it sounds. Visual and language AIs work because they're trained on colossal amounts of data. Ask Sora for a video of a monkey playing chess, and it draws on many, many images of monkeys and chess players and parks to create the new video.
We do not have a vast database of, say, robots picking up cups -- or robots doing anything. This is because most robots for the past 70 years have been simple machines working in factories, not mechanical maids and butlers. So where will the data come from to supply an AI that has to invent "pick up this particular cup"?
I don't know. But I do know that people in companies and labs are starting to hint that they've cracked it.
That's the point of this video from 1X, makers of the humanoid robot EVE.
As they proudly note, these robots are not being operated by an off-screen human and they aren't slowpokes whose video has been sped up. 1X says it will be making a big announcement soon. I’m staying tuned.
Are AI-created videos amazing or do they suck? The answer is yes.
America's favorite video goofball, MrBeast, felt a twinge when OpenAI's video-making AI, Sora, was unveiled on February 15. In a back-and-forth with OpenAI CEO Sam Altman, he asked Sora to make a video of a monkey playing chess in a park.
The result, here, is like a Rorschach test for your attitude toward AI. Do you see (a) a very lifelike monkey in a very real-seeming park? Or (b) a monkey that is not, in fact, playing chess -- and couldn't even if he tried, because the chessboard doesn't have the right number of squares?
Both are reasonable responses. I think you should entertain both.
A reporter lets ChatGPT run her life.
Because an LLM averages out billions of words to produce something generic, it doesn't go well.
How AI learns to say things that make people trust it.
Same way we do -- by trial and error.
Literary Note of the Week
What happens when intelligence pervades life the way electricity does now? When you just flip a switch or say a word and you get solutions and advice as easily as you turn on lights and music? What will people do?
In "The Beautiful and the Sublime," a superb sad-funny story Bruce Sterling published in 1986, what happens is that people turn into inconsequential twits. With nothing serious to exercise their minds, almost everyone in the story is not creative, but "creative." Our narrator is an artist (“aren’t they all?” grumbles an old man from the pre-AI era.)
People spend their days making pretty things out of their exquisitely subtle (to them) feelings. Sterling gives us twittification from the inside -- we follow his smug narrator as he contentedly gossips, flirts, gives and receives praise for little nothings, and generally chases his own tail of inconsequential thoughts. It's such a skillful story. I think about it often. And many is the day when I suspect Sterling's view of the AI future -- not dystopian hellscape but rather a happy, pointless cruise -- is on the mark. And maybe, in some ways, is already with us.
The best way to read the story is to buy a good collection in which it appears -- like this one. Support writers!
However, time and money are short, so here is a link that will let you read it online. After some back and forth about whether to do this, I went with what I would prefer if this were a 40-year-old work of my own. When I come upon bootleg PDFs of my book online, I no longer try to get them taken down. I'd rather you bought my 19-year-old work (it's still in print!), but I know there is no e-book (just a paperback), and reaching people matters more at this stage than generating another few pennies.
So, if you're impatient or impecunious or you live far from the reaches of used-book delivery, click the link, read the story. But don't be a dick about it: Don't print it out and circulate it. And buy the anthology when circumstances improve (it's very good). Also, if you read it that way, please buy a book as an offset. You don't want Large Language Models doing all the writing just yet.