Using large language models to get medical advice and make medical decisions is a risky practice, a new study has warned.
The study, conducted by researchers at Oxford University, gave 1,300 participants doctor-developed scenarios describing specific medical conditions.
The participants were then split into two groups: one asked LLMs such as OpenAI's ChatGPT for medical advice, while the other gathered information from traditional sources.
The results revealed major gaps in communication between the LLMs and their users.

Although the LLMs excelled at recalling medical knowledge and standard practice, helping users with their medical issues required a level of communication the models struggled to deliver.
"Despite all the hype, AI just isn't ready to take on the role of the physician," Dr. Rebecca Payne, lead medical practitioner on the study, explained in a press release.
"Patients need to be aware that asking a large language model about their symptoms can be dangerous, giving wrong diagnoses and failing to recognize when urgent help is needed."
Communication Breakdown
The study found that LLMs produced no better outcomes than traditional ways of evaluating medical conditions, such as searching for information on the internet or relying on one's own judgment.
The LLMs didn't always understand what participants were asking, and participants often didn't know how to give the model the right information.
Communication breakdowns between the individual and the machine made it less likely that the LLM would give the right advice.
'AI systems need rigorous testing'
Meanwhile, the LLMs often provided a mix of good and bad advice. Without a doctor's guidance, participants in the study frequently couldn't tell one from the other.
Senior author Adam Mahdi of the Oxford Internet Institute said the gap between LLMs and patients should be a "wake-up call" for developers and regulators.
"We cannot rely on standardized tests alone to determine if these systems are safe for public use," Mahdi said. "Just as we require clinical trials for new medications, AI systems need rigorous testing with diverse, real users to understand their true capabilities in high-stakes settings like health care."
A Common Problem
Consulting an LLM for medical advice is an increasingly common practice, especially in the United States, where health care is often prohibitively expensive.
According to one study published in September by an AI platform, more than a fifth of Americans admitted to following advice from a chatbot that was later proven to be inaccurate.
In another study, published in June 2025, researchers used developer tools to see whether they could program LLMs to provide incorrect information.
They found they could do so easily, and the chatbots confidently provided bad information 88 percent of the time.
"If these systems can be manipulated to covertly produce false or misleading advice, then they can create a powerful new avenue for disinformation that is harder to detect, harder to regulate and more persuasive than anything seen before," study author Natansh Modi of the University of South Africa warned in a statement.
Newsweek has reached out to the study authors for comment via email.