Northwestern University study finds AI chatbots match experts in recognizing empathetic communication

Henry Bienen, President at Northwestern University

More than half of adults in the United States are now using large language models (LLMs) like ChatGPT, Gemini, and Copilot for various daily tasks. These uses range from creating grocery lists to sharing personal thoughts with AI chatbots. Research suggests that people may turn to these tools because their responses can make users feel understood.

A recent study from Northwestern University evaluated three LLMs alongside expert and non-expert human judges to see how well they could assess empathy in text conversations. The study, published in Nature Machine Intelligence, found that LLMs judged empathy almost as accurately as trained experts and were more consistent than laypeople.

“We believe evaluating AI models in this way could potentially teach humans something new about empathy — how we measure it and how we apply it,” said Matthew Groh, assistant professor at Kellogg School of Management and co-author of the study.

The research focused on empathy not just as a personality trait but as a communication skill—specifically, the ways people express understanding through language. “We assume that we all just understand empathy since we are humans, but communicating it is a skill,” Groh explained. “And just like any skill, you need to practice to get better at it. If someone hasn’t trained that muscle and learned the patterns behind empathic communication, then they won’t be able to truly recognize it in conversations. Our research shows that LLMs can learn the patterns and basically master the skill set.”

To conduct the study, researchers analyzed 200 real-world text message exchanges where one person shared a problem and another offered support. These included common issues such as work difficulties or family disputes, as well as sensitive topics like mental health struggles or discrimination.

Groh’s team asked three different LLMs (Gemini 2.5 Pro, GPT-4o, and Claude 3.7 Sonnet), three experts in empathic communication, and hundreds of non-experts to rate these conversations on factors such as encouraging elaboration and demonstrating understanding.

“Large language models’ judgments on whether someone was effective at communicating empathically mirror the judgment of our experts,” Groh said. “LLMs might not catch every nuance that an expert would recognize, but they are substantially better at it than a typical person.”

He added that LLMs excel because “they have seen many instances of attempts to respond in a way that makes another feel heard, allowing them to get quite good at identifying the grammar and idioms of empathic expression.”

However, excessive empathy from chatbots raises its own concerns: sycophancy, or insincere flattery, can lead AI systems to avoid difficult truths or to reinforce negative feelings without proper context.

“There’s such a thing as over-validation… That’s where LLMs still need to learn from expert humans on appropriate confrontation,” Groh noted.

Groh also distinguished between current commercial uses of LLMs—as companions designed for engagement—and their potential role as impartial judges offering transparency while maintaining privacy.

Looking ahead, Groh hopes this research will help improve training for professionals who rely on empathetic communication—including psychologists, teachers, doctors, and customer service workers—and promote greater accountability when using AI chatbots for companionship.

“We hope to see carefully designed LLMs being used to help train psychologists, teachers, doctors, customer service workers in being more effective communicators,” he said. “In addition… [we] see this research as demonstrating the potential for the LLMs-as-judge paradigm to create transparency and accountability into LLMs as companions.”

Groh concluded: “We live in a better world when people feel seen, heard and validated… It sounds crazy but there’s a potential to learn from AI how to be more human.”

Other authors on the paper include Aakriti Kumar, Nalin Poungpeth and Bruce Lambert from Northwestern; Diyi Yang from Stanford; and Erina Farrell from Pennsylvania State University.


