Don’t hang up the stethoscope just yet



All this talk of replacing doctors with AI is proving premature.


True to the disposition of technology enthusiasts the world over, news earlier in the year that the whizz-bang AI chatbot ChatGPT had passed the US medical licensing exam without any clinical help sparked a flurry of commentary portending the imminent demise of flesh-and-blood physicians.

Folks with shedloads of skin in the game, such as Sam Altman, head honcho of OpenAI, the outfit that developed ChatGPT, suggested the digital dynamo could one day serve as a “medical adviser” for those unfortunate enough to live in the US and be unable to afford proper healthcare.

As harebrained as this idea sounds, enough people took it seriously to prompt boffins at Stanford University to test how well this souped-up Dr Google could perform in real-life medical scenarios.

The short answer? Really, really badly.

You can read the early findings, which have yet to be peer-reviewed, here, but suffice it to say it’s not time to flog your stethoscope on Gumtree just yet.

According to reports on STAT News, the researchers found nearly 60% of ChatGPT’s answers to actual medical situations either disagreed with a human expert’s opinion or weren’t relevant enough to be helpful.

In other words, you could probably ask your local aromatherapist for advice and do just as well.

In their testing, the Stanford researchers asked the AI real-life medical questions, recorded the responses, and had 12 clinical experts evaluate those answers.

While a bit more than 90% of its answers were deemed “safe” enough not to be harmful (though not necessarily accurate), only 41% “agreed” with the experts’ answers, and 29% were too vague or irrelevant to be assessed.

Remarkably, AI enthusiasts still found a way to talk up these results, suggesting the current chatbot was a big improvement on an earlier version, which agreed with the human medical experts only 20% of the time.

“We’re evaluating these technologies the wrong way,” said Nigam Shah, a professor of medicine at Stanford who led the research. “What we should be asking and evaluating is the hybrid construct of the human plus this technology.”

Which is certainly a model with potential, but it’s still a very long way from providing cheap and reliable healthcare and advice to folks who aren’t able to see a real doctor.

Call us Luddites if you wish, but we reckon the chances of ChatGPT becoming ChatGP anytime soon are low.

Send your story tips and ideas to penny@medicalrepublic.com.au to have them assessed by an actual human.
