A Harvard study shows an AI model can outperform physicians in emergency room diagnoses

Skye Jacobs

In a nutshell: Artificial intelligence is starting to handle one of medicine's hardest tasks: making the right call when the data is incomplete. A new study in Science tested an OpenAI reasoning model on real clinical cases and found it could match and often outperform physicians when diagnosing patients and deciding on care. The work comes from researchers at Harvard Medical School and Beth Israel Deaconess Medical Center, who focused on how the system performs under real-world conditions rather than controlled benchmarks.

In one case, a patient came into the emergency department with a pulmonary embolism. The condition initially improved with treatment, then worsened. Doctors suspected the medication was failing. The AI model, using the same electronic health records available at the time, flagged a possible history of lupus – an autoimmune disease that can lead to heart inflammation. That turned out to be the correct explanation.

The researchers evaluated the model at multiple stages of care, from triage through hospital admission. At each step, they measured how well it could arrive at the right diagnosis using only the information available at that moment. Overall, the model outperformed two experienced physicians working under the same constraints.

"This is the big conclusion for me – it works with the messy real-world data of the emergency department," Dr. Adam Rodman, a clinical researcher at Beth Israel and one of the study authors, told NPR. "It works for making diagnoses in the real world."

The team also tested the model against clinical case reports from the New England Journal of Medicine and other standardized diagnostic challenges, which are designed to test complex diagnostic reasoning. The model again outperformed a large group of physicians used as a baseline.

"The model outperformed our very large physician baseline," said Raj Manrai, assistant professor of Biomedical Informatics at Harvard Medical School, who was also part of the study.

One important limitation is what the model didn't use. It relied entirely on text-based records. It did not process images, sounds, or nonverbal cues – inputs that are important in real clinical work. Even so, it handled uncertainty better than earlier systems, especially when building a differential diagnosis, where multiple possible conditions must be considered and narrowed down.

That marks a shift from previous generations of large language models, which often struggled when cases were ambiguous or incomplete.

Outside experts say the progress is real, but the path to deployment is unclear. "This paper is a beautiful summary of just how much things have improved," says Dr. David Reich, chief clinical officer for Mount Sinai Health System in New York, who was not involved in the work. "You have something which is quite accurate, possibly ready for prime time," he says. "Now the open question is how the heck do you introduce it into clinical workflows in ways that actually improve care?"

That gap between performance in testing and impact in practice is still unresolved. Emergency care is only one slice of the healthcare system, and the researchers note the model may not perform as well with long, complex hospital stays that involve more variables and evolving conditions.

The authors are also clear about what the study does not show. There is no evidence that AI should replace doctors. Instead, the study points to a tool that could support clinical decision-making, especially in fast-moving environments where time and information are limited.

"I think it does mean that we're witnessing a really profound change in technology that will reshape medicine," Manrai said.

What comes next is harder: testing these systems in live clinical settings. That will require carefully designed trials to measure whether they actually improve outcomes, not just accuracy.

"It's a very challenging process to design these trials," Reich said, "but this study is a perfect call to action."

Can't be right

But seriously though, with how confidently wrong AI can be, let's keep the human factor involved.
 
But seriously though, with how confidently wrong AI can be, let's keep the human factor involved.
To be fair, the LLMs that you're referring to are generic and taught to be that way to sell themselves for the general public.

Something hyperfocused on medical training data would be a lot more reliable, and could even be trained to give a confidence rating too (so a human can corroborate it), if they're not already.
 
To be fair, the LLMs that you're referring to are generic and taught to be that way to sell themselves for the general public.

Something hyperfocused on medical training data would be a lot more reliable, and could even be trained to give a confidence rating too (so a human can corroborate it), if they're not already.

- Ironically tech companies have sort of shot themselves in the foot by making "free" AI deeply untrustworthy, so now (for many reasons) public opinion has largely turned against AI even in scenarios where it can be tuned to be extremely useful.

It's sort of the opposite of that crime drama "enhance" effect on juries in criminal trials.
 
Great right up until it makes a catastrophic mistake. Then it becomes a heavy legal dumbbell.
The same is true of a catastrophic mistake by a human doctor, no?

But seriously though, with how confidently wrong AI can be, let's keep the human factor involved.
I've received some truly absurdly incorrect diagnoses from human doctors, yet they were invariably confident in them.

- Ironically tech companies have sort of shot themselves in the foot by making "free" AI deeply untrustworthy
And yet, outside the anti-AI technosphere disinformation bubble, those free AI services are receiving incredibly heavy usage on a daily increasing basis.
 
And yet, outside the anti-AI technosphere disinformation bubble, those free AI services are receiving incredibly heavy usage on a daily increasing basis.

- Absolutely, I use plenty of free AI stuff on a regular basis, just not for anything that actually matters.
 
"The hard part is figuring out what to do with that"


Well, just like everyone else, doctors gotta lose their jobs as well... with A.I. no one is the exception; the higher paid your job is, the higher the risk you are at.
 
AI model can outperform physicians in emergency room diagnoses, really? And it can't write a legal brief without hallucinating cases it cites in filings to the court? I will take my chances with a human doctor.
 
The same is true of a catastrophic mistake by a human doctor, no?


I've received some truly absurdly incorrect diagnoses from human doctors, yet they were invariably confident in them.


And yet, outside the anti-AI technosphere disinformation bubble, those free AI services are receiving incredibly heavy usage on a daily increasing basis.

To your first point, the liability question is simple. If AI is making the final diagnosis, and it's wrong, who do you assign liability to? The hospital? The doctor (there is no doctor). The company that made the AI?

I've read many of your responses on other stories. You're clearly smart. So you would know that the ability to clearly assign liability is one of the most important elements of the implementation of any nascent technology, and when it's unclear, it's more than a boondoggle (see issues related to the Telecommunications Act of 1996, particularly Section 230, and how it couldn't foresee the rise of social media).

Lawyers using AI that returned complete hallucinations (such as citing non-existent case law) have had entire cases thrown out. That's terrible, but at least almost all of those can be re-filed with the proper changes (so long as they're dismissed without prejudice). You don't have that ability if an AI ****s up a diagnosis and that's the end-all, be-all. Further, without the ability to clearly assign liability, you lose the one element of recourse should that unfortunate situation crop up.

So fine, let's use AI as another eye, another perspective, when it comes to the medical field. I don't particularly mind that. However, just like any perceived system that claims to increase efficiency, precision, and accuracy (yet still requires human intervention), you have to watch out for the insidious rise of complacency. And if you think that overworked and stressed-out medical professionals won't end up relying on and deferring to AI to lessen the workload, that's the height of naivety.
 
To your first point, the liability question is simple. If AI is making the final diagnosis, and it's wrong, who do you assign liability to? The hospital? The doctor (there is no doctor). The company that made the AI?
The question is simple, but the answer is as well. In the short term, a doctor will still remain involved, albeit aided by an AI diagnosis they're expected to confirm. Liability: theirs. Longer-term, if you pay a medical service for an AI-only diagnosis, the liability is theirs. If you use a free AI model for such, then the terms of use will certainly contain multiple "use at your own risk" disclosures.

Finally, I'll note the case law for indirect product liability is well-established. Those making medical-specific AI models will certainly be sued regularly on product defect grounds, a cost they'll ultimately have to absorb and pass on to other patients.

However, just like any perceived system that claims to increase efficiency, precision, and accuracy (yet still requires human intervention), you have to watch out for the insidious rise of complacency...
It's a very good point (and you're obviously versed on the subject if you understand the distinction between model precision and accuracy), but I fear we're already over the cliff here. It actually began a couple of decades ago in education, when students replaced "research" with "cut and paste from a Wiki article, or the first Google hit I find". Now, I'm told, nearly all students simply ask AI to do their homework entirely for them.
 