Chatbots Aren't Enough
ChatGPT and its cousins can't cope with the real world. So AI and robot researchers are adding other tools. Plus: A cool robot wheelchair & other news
Large language models like ChatGPT are handy for search, because they let you frame questions precisely. When I want to know how to reset the clock on my model XYZJ2 rice cooker, I don’t need to wade through PDFs and e-commerce sites that were “hits” when I typed into Google’s search box. Instead I’ll get instructions in normal English prose about my exact question. It tells me just what I want to know.
But the models’ imperative to tell you what you want to hear is often stronger than their drive to tell the truth. In other words, they make shit up. That's rather seductive -- what they make up is usually an answer I like, which makes me inclined to accept it. Trained to be helpful to the humans, LLMs seem to play to our confirmation bias.
You can see how that might be a problem for a chatbot that gives legal advice.
Nope, not Legal, Whatever the ‘Bot Says
It certainly is in New York City, where the government's "MyCity Chatbot" has been telling people they can break the law. As Colin Lecher describes in The Markup, the Chatbot (which uses OpenAI's GPT via Microsoft Azure) assured people that it's OK for their store to stop accepting cash; that their funeral home need not disclose its prices; that they could refuse to rent apartments to poor people, and that they could take a cut of their workers' tips in their restaurants.
Whoops. Those acts are actually illegal in New York City.[1]
Obviously the way language-using AIs do their work (predicting which words are most likely to occur near one another) isn't accurate enough for legal advice. So, maybe LLMs need to be supplemented with some other kind of software?
That's the idea behind a "lawbot" being developed by Ruzica Piskac, a computer science professor, and Scott Shapiro, a law professor, both at Yale. In their system, a query like "Can my store at 123 Berreby Avenue stop accepting cash?" goes into an LLM. But it's also run through an “automated reasoning” system — algorithms for which my city’s laws, and my question, have been translated into the mathematically precise language of formal logic.
Unlike the LLM, which would base its answer on statistics about words, the automated reasoner must produce a reply that obeys the rules of reason. So if the law says “no tip-skimming from employees,” it can’t tell me tip-skimming is OK.
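The contrast can be sketched in miniature. In the toy Python below (the rule names and wording are my own illustration, not anything from the Yale system), each statute is encoded as an explicit constraint, and the checker answers only what those constraints entail — it cannot "predict" a pleasing but illegal answer the way a statistical model can:

```python
# Toy rule-based legal checker: a stand-in for formal-logic reasoning.
# Rules and action names are illustrative, not from the Yale lawbot.

# Each entry forbids an action under NYC law (per the article's examples).
FORBIDDEN_ACTIONS = {
    "refuse_cash": "Stores in NYC must accept cash.",
    "skim_tips": "Employers may not take a cut of workers' tips.",
    "hide_funeral_prices": "Funeral homes must disclose their prices.",
}

def legal_check(action: str) -> str:
    """Answer only what the encoded rules entail -- never guess."""
    if action in FORBIDDEN_ACTIONS:
        return f"Not legal: {FORBIDDEN_ACTIONS[action]}"
    return "No rule found; consult a human lawyer."

print(legal_check("skim_tips"))
```

A real system would use a theorem prover or SMT solver rather than a dictionary lookup, but the principle is the same: the answer is derived from the rules, so it can't contradict them.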
You're going to be seeing more stories of disillusion with "AI" -- meaning disappointment about the fact that Large Language Models don't cope so well with the real world. So in both AI and robotics, you'll be seeing more hybrid systems that combine LLM advantages -- ease of communication, deep wells of information -- with algorithms that do other things well (like math, staying consistent from one day to the next, being accurate, keeping the robot from crashing into a wall).
Getting Robots to Work Together Takes More Than One Kind of AI
Here's a robot example of the same principle -- using LLMs but combining them with less wacky software.
This work, by Ishika Singh and Jesse Thomason at USC (whose earlier work I wrote about in this month's Scientific American) and David Traum, is a hybrid approach to getting robots to work together.
The old-school way to do that is to plan in detail and in advance for every possible act that every robot can do. That's inflexible, and massively complicated, especially when more than a single robot is involved. Imagine cooking breakfast with a partner if you have to work out every action beforehand. ("OK, you get the butter. If you drop the butter, I will step 3 paces to the side and get the canola oil, while you retrieve the butter.")
On the other hand, if you ask an LLM what to do, it might tell you to go get more butter from the farmer down the lane, or order it on Amazon, or that you can make your own butter from the cream in your fridge, or some other plausible but useless palaver.
Singh and Thomason's "TwoStep" blends LLM answers with robot planning algorithms to guide two robots as they divvy up chores. Given a complex job to do, the LLM suggests tasks for one of the robots (which means less formal planning is needed beforehand). But once the second robot’s chore is assigned, the actual getting-done of the work is the job of a strict planner algorithm.
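The division of labor can be sketched like this — a hypothetical Python outline of the general LLM-plus-planner pattern, not the actual TwoStep code. The LLM proposes subtasks in loose natural language; the planner either produces a concrete, valid action sequence for each one or vetoes it:

```python
# Hypothetical sketch of an LLM + planner hybrid (not the real TwoStep code).
# The LLM proposes subtasks; a strict planner validates and grounds them.

def llm_suggest_subtasks(goal):
    """Stand-in for an LLM call: propose plausible subtasks for a robot."""
    return ["gather ingredients", "set the table", "churn your own butter"]

def plan(subtask):
    """Stand-in for a classical planner: return a feasible action
    sequence, or None if the subtask can't be grounded in this world."""
    known = {
        "gather ingredients": ["goto(fridge)", "pick(butter)", "goto(counter)"],
        "set the table": ["pick(plate)", "place(plate, table)"],
    }
    return known.get(subtask)  # "churn your own butter" -> None

def hybrid_execute(goal):
    actions = []
    for subtask in llm_suggest_subtasks(goal):
        seq = plan(subtask)
        if seq is None:
            continue  # planner vetoes plausible-but-impossible suggestions
        actions.extend(seq)
    return actions

print(hybrid_execute("make breakfast"))
```

The point of the pattern: the LLM shrinks the search space that had to be hand-engineered in advance, while the planner guarantees that whatever actually runs is executable.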
The March of the Humanoid Robot Announcements Continues
Agility is in Amazon warehouses. Figure is chatting while putting away the dishes, and is being tested in a BMW factory. NEO, by 1X, is putting away groceries and folding laundry. Now Apptronik is announcing that its Apollo humanoid is going to use Nvidia's new GR00T "foundation model." GR00T is another strategy for solving the "LLMs can't do it all" problem.
A foundation model is an AI that has a ton of data about the world and uses it to make predictions about what people want when they ask it for things. ChatGPT and its ilk are foundation models that know a lot of text.
But robots need more. So there's a lot of interest now around foundation models that can handle text, sound, images, video and other modes -- for instance, human actions. The GR00T system, which uses a dedicated Nvidia chip, can do all that. The plan (hope?) is that it will enable robots to learn by watching people do things, as well as by watching videos, receiving verbal instructions or looking at diagrams.
Department of Military Robots
The invaluable Defense One site reports that the Army is contemplating a new kind of platoon -- one dedicated to robots (including drones). Meanwhile, the Navy is using robot boats to deter human smuggling in the Caribbean.
Cool Robot of the Week
It has been 150 years since someone patented a wheelchair in the U.S. That 19th century technology has stood the test of time, but it has drawbacks: You need your hands and arms to operate it, and it can't go in narrow spaces where a person on two feet can easily walk.
This team of researchers[2] from the University of Illinois at Urbana-Champaign has come up with a no-hands, no-wheels chair that runs on a rotating ball. The user controls it by shifting her weight. Aside from liberating the hands, this reduces the width of the device, so users can go places and do things that typical wheelchairs prevent.
Using principles learned in making robots that can keep themselves oriented as they move around, the team created a chair that's light and adjusts to its person. Their "Personalized Unique Rolling Experience" controls the ball via a robot drive train. Sensors translate torso movements into speed and directional commands. It works as well as a joystick. And unlike a joystick it leaves both hands free.
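To give a feel for what "sensors translate torso movements into speed and directional commands" means in practice, here's a minimal illustrative mapping — my own sketch, not the UIUC team's code. Forward/back lean sets speed, side lean sets turn rate, and a deadzone keeps small posture shifts from moving the chair:

```python
# Illustrative lean-to-command mapping (a sketch, not the actual PURE code).
# Forward/back lean -> speed; side lean -> turn rate; a deadzone ignores
# small posture shifts so the chair holds still while the rider fidgets.

def lean_to_command(pitch_deg, roll_deg, deadzone=2.0,
                    max_speed=1.5, max_turn=1.0):
    """Return (speed in m/s, turn rate in rad/s) from torso lean angles."""
    def scale(angle):
        if abs(angle) < deadzone:
            return 0.0                      # inside deadzone: no motion
        return max(-1.0, min(1.0, angle / 15.0))  # saturate at ~15 degrees

    speed = scale(pitch_deg) * max_speed    # lean forward to go forward
    turn = scale(roll_deg) * max_turn       # lean sideways to turn
    return speed, turn

print(lean_to_command(10.0, 0.0))
```

A real balancing chair would also run a fast inner control loop to keep the ball-riding platform upright; this sketch only covers the outer "intent" layer.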
It's a good example of how "low-level" intelligence in robotics is starting to yield results for ordinary people. By "low level" I mean the kind of computation required to keep a robot dog, person or ball from falling over. It's not as glamorous as making a robot talk or do backflips, but keeping a robot from falling over is a big engineering challenge (it's the reason we don't have robot soccer players, in case you were wondering).
No word on how much this might cost a regular Jane or Joe. But if it could be made for a reasonable price it would probably make life better for a lot of people.
For tl;dr, this video, while a bit slick, covers the basics.
[1] After the story ran yesterday, the bot seems to have been fixed. It answered the cash and apartment-renting questions correctly when I posed them late Friday morning.
[2] Seung Yun Song, Nadja Marin, Chenzhang Xiao, Mahshid Mansouri, Joao Ramos, Yu Chen, Adam W. Bleakney, Jeannette R. Elliott, Patricia B. Malik, Elizabeth T. Hsiao-Wecksler, Deana C. McDonagh and William R. Norris