What AI does to "cognitive class" work, and workers
A recent study is full of interesting surprises. Here are some I couldn't fit into my NY Times story today.
So today I have a short piece in The New York Times on a recent experiment with ChatGPT use by elite workers at Boston Consulting Group. These are the sort of "cognitive class" members whose jobs "everybody knows" — in some woolly way — will be affected by generative AI.
What intrigued me about this study (which you can read here (pdf)) is the way the researchers (Fabrizio Dell'Acqua, Edward McFowland III, Ethan Mollick, Hila Lifshitz-Assaf, Katherine C. Kellogg, Saran Rajendran, Lisa Krayer, François Candelon and Karim R. Lakhani) set out to rigorously replace everybody’s vague generalizations with hard data. They
(a) tested ChatGPT use in a real setting — the multi-task workday of high-end associates at BCG, one of the world's biggest and best-regarded management consulting firms (BCG people helped design the experiments);
(b) tested it on a lot of people (758, to be exact);
(c) made sure that the consultants had incentives to take the experiment seriously (top scorers were rewarded with recognition that had some impact on their annual bonuses);
(d) ran real controlled experiments, with two sets of tasks, each performed by three subgroups: people who used ChatGPT with no particular instructions; people who did the same work with ChatGPT after a short training; and a control group that did the work without the Large Language Model's involvement;
(e) brought in multiple points of view — in addition to collecting statistical data about their volunteers (gender, familiarity with English, ethnicity, familiarity with AI, among other variables), the researchers also conducted in-depth debriefings to assess how their volunteers thought and felt about using AI. For future work, the study's authors have access to everyone's logs, so they can see exactly what prompts any given consultant used, and the answers s/he got.
All in all, a big step toward replacing vagueness with hard-won information about how people use and adapt to Large Language Models (the form of generative AI, like ChatGPT, Bard and Baidu's ERNIE, that uses language in human-like ways).
Bottom line: The AI gave a big boost to people who worked on a creative, brainstorming, blank-page set of tasks. But for those who worked on tasks that required comparing information from different sources, ChatGPT actually went astray fairly often, leading those who trusted it to turn in mistaken work. (The results of all the tasks were scored by people who, of course, did not know which work was AI-assisted and which was not.)
I won't (much) rehash the Times piece here. Instead, I want to get into the interesting stuff I had to leave out of the article, due to a strict limit on its length. (Now that y'all read everything on your phones, us media types are pressured to write shorter and shorter.)
Here are findings I didn't have room to write about in the Times:
1 Better for Individuals, But Not for the Group
ChatGPT didn't improve overall diversity of thought. While individual consultants on the brainstorming task scored better with ChatGPT than those who didn't use it, the ChatGPT group's overall variety of ideas was narrower than the human-only group's.
This shouldn't be a surprise, I think. A Large Language Model is like a waiter who, asked what's good on the menu, replies "a lot of people order the chicken." Trained to predict what will most resemble what it already "knows" in its data, it will say something that resembles what’s already been said. Its answers often sound, to me anyway, a little generic.
Even if this experimental result isn't surprising, though, it's still significant. It shows, in a real-life setting, a potential for Large Language Models to create an intellectual "tragedy of the commons." In this study, ChatGPT use reduced the group's store of genuinely surprising, novel ideas. But when it increases your individual success, what incentive do you have to help the organization avoid that narrowing of ingenuity?
2 Being Weirded Out by AI Is Unavoidable
Working (as opposed to playing) with generative AI is a weird experience. As co-author Ethan Mollick, a professor at the Wharton School, told me when we spoke, you can't get a feel for what ChatGPT will be like by reading about it. And God knows there is no instruction manual. (In fact, on the analytical task in the study, people who had received a brief training on ChatGPT did worse than people who'd just plunged in.) To understand what it is to work with this thing, you just have to engage with it, let the strangeness suffuse your brain.
Naturally, the consultants reacted with ambivalence to a tool that makes their lives easier (except when it doesn't) and makes them wonder what will be left for them to do as these things improve. (After all, as Mollick says, "this is the worst AI they will ever use," and it's already freakishly good at some things.)
3 AI Doesn’t Act Like a Person. Or Like Any Computer You’ve Previously Used.
ChatGPT is also, as I've mentioned, not good at other things. It wasn't always clear where to find the line between "better than my MBA students" (Mollick's phrase) and "mistaken, so don't trust it."
Most of us don't yet have a frame for understanding AI. So we default to the way we understand computers, or people: with an implicit hierarchy of skills. We have a hunch that if it can write an essay in the style of Chesterton, it should be able to do basic math. But LLMs don't have the same ladder. They can do more in some ways than any software you've seen. In other ways, they can't even best a calculator you bought at Target. That's disorienting.
4 AI Might Turn You Into a Centaur. Or a Cyborg. Those Aren’t the Same Thing.
Not everyone works with generative AI in the same way. The experiment found that the consultants broke into two types. One sort of user was a "centaur" — Garry Kasparov's term for a person who forms a hybrid with an AI. I've often quoted this metaphor, the image of a melded creature, blending human and AI. However, some of the consultants were so melded that the centaur metaphor didn't fit. The authors call this group "cyborgs."
"Cyborg" users were so intertwined with ChatGPT that a boundary between human and machines wasn't clear. After asking ChatGPT to summarize background information about a problem, for instance, a centaur would turn away, to use that information to do her work. A cyborg would keep the AI involved, asking it questions, telling it where it had made mistakes, perhaps asking it to assume a persona ("you're a skeptical technologist considering these marketing ideas" (my invention, not the paper's)). This made it hard to say which parts of the work were ChatGPT and which were human.
I'm temperamentally a centaur, but I can feel myself sliding toward the cyborg approach as I get used to LLMs. I've developed a sort of "feeling for the mechanism" that makes me more confident in my ability to merge it into my minute-to-minute grind. I hadn't realized this, though, until I read the paper's typology.
5 Deep Existential Questions About AI Don’t Matter to People Just Trying to Use It
Are current LLMs on the road to some kind of sentience? (I don't know; I doubt it.) Or are they just souped-up auto-correct, what the Yale computer scientist Theodore Kim called "the industrial production of a knowledge sausage, which crams together so much data that its ability to spit out a million possible outputs becomes relatively quotidian"? (Again, I don't know; maybe they are.) This study forcefully brought home the fact that the answers to these deep questions don't really matter in the workplaces, schools, homes and other venues where generative AI is being adopted. Is it a white cat or a black cat? To paraphrase Deng Xiaoping, as long as it catches mice, it's not going away.
One final note: Aside from my Times piece and the paper itself, you can read interesting comments on it from some of the authors: François Candelon, a Managing Director and Senior Partner at BCG and director of its Henderson Institute, gives his interpretation here. And Mollick wrote up the working paper on his own Substack, right here. More generally, Lakhani, a professor at Harvard Business School and co-director of its Laboratory for Innovation Science, contemplates AI's advent in his Substack, here.
One fun aspect of this article for me as a journalist was the divergence of interpretations among the authors. I wouldn't say they disagree, but they definitely have different takes on which findings are most striking, and about how much we can control as we all tumble towards cyborgdom and centaurhood.