Here's how often AI ‘doctors’ get things wrong, according to human experts

Artificial intelligence-powered chatbots such as ChatGPT can accurately answer health-related questions about three-quarters of the time, according to new research from Penn State University, a finding that highlights both the promise and limitations of using AI for medical guidance.

A team led by Penn State associate professor and researcher Amulya Yadav recruited participants to submit more than 200 health-related symptom descriptions and medical questions to a variety of AI systems, including ChatGPT. Nine board-certified physicians then evaluated the responses for accuracy and reliability.

AI ‘doctors’ have 76% accuracy rate

By the numbers:

The study found that large language models (LLMs) achieved an average accuracy rate of approximately 76% when responding to health questions submitted by users. ChatGPT outperformed other large language models included in the analysis.

Overall, AI systems performed significantly better than traditional search engines such as Google and Bing when answering health-related questions.

"I was surprised they [LLMs] performed as well as they did. But certainly I think that, you know, we should all be very wary of trusting anything that comes out of a language model, because it is basically a random token generator," Yadav told The Associated Press.

The findings will be presented this week at the 2026 Association for Computing Machinery Conference on Fairness, Accountability and Transparency (FAccT) in Montreal.

Where AI performed best, struggled most

Dig deeper:

The study also identified several areas where AI struggled. Questions related to dermatology, mental health and internal medicine generally received lower accuracy scores.

Dermatology questions often require image analysis, an area where AI systems remain less capable than they are with text-based information. Mental health questions also pose challenges because they require nuanced judgment.

By contrast, AI performed best on more routine health inquiries, including common illnesses and general medical questions.

AI becomes resource to people without doctors

According to the World Health Organization, more than half of the world's population lacks access to adequate health care and full health coverage.

Yadav says that while the average person with access to a doctor shouldn't trade that access for "Dr. AI," the technology can be an important resource for people around the globe who don't have access to a doctor.

"That seems to be a wonderful thing, right? Especially to that half of the population where they don't have access to a doctor. So that's one way to spin it," Yadav said, but warned that "Large language models at this point of time are not as accurate as a human physician, and therefore we should not, or we should be very cautious. That's our message to the general public that we should be very cautious in using large language models to self-diagnose."

More people continue to turn to AI

Big picture view:

The findings come as more people turn to AI tools for information traditionally sought from healthcare providers or online searches.

In fact, for nearly half of Americans, AI plays some sort of role in their health care decisions, a recent Gallup poll found.

In most cases, that role supplements other medical advice or serves as a way to get quick answers. However, in some cases, AI is being used as a substitute for a trip to the doctor’s office altogether.

Over 70% of respondents told pollsters that they had used AI for some purpose in the past 30 days, though their reasons varied, the poll showed. For most, it came down to getting answers more quickly or wanting more information about their health care. More than half also said they wanted to do research either before or after going to the doctor.

Fourteen percent of those polled also said they had skipped a trip to the doctor's office because of information AI gave them. Gallup noted that this translated to an estimated 14 million people who forwent a visit with a medical professional because of AI advice or information.

Over 40% of people who had used AI in the preceding 30 days also said they were unwilling or unable to pay for a doctor's visit. Others said they needed help outside business hours or did not have time to make an appointment, while 21% said they had felt dismissed or ignored by a provider in the past.

"My concern is more about people who can and do have access to healthcare. Do we really want to go down this path of, you know, removing the need for human doctors just because LLMs can answer? Should we ask them to answer right there? This old adage – just because you have a hammer does not mean that everything has to be a nail," Yadav added.

The Source: This story was reported from Los Angeles. The Associated Press, Campus Insights and previous FOX Local reporting contributed.