The conversational AI bot ChatGPT is having a moment, promising to transform the ways in which we produce written text, search the web, and educate ourselves.
The latest ChatGPT achievement? Almost passing the US Medical Licensing Exam (USMLE).
We’re talking about an exam known for its difficulty here, one that usually requires some 300 to 400 hours of dedicated preparation and covers everything from basic science concepts to bioethics.
The USMLE is actually three exams in one, and the competency with which ChatGPT is able to answer its questions shows that these AI bots could one day be useful for medical training and even for making certain types of diagnoses.
“ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement,” write the researchers in their published paper. “Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations.”
ChatGPT is a type of artificial intelligence known as a large language model or LLM. These LLMs are specifically geared towards written responses, and through vast amounts of sample text and some clever algorithms, they’re able to make predictions about which words should go together in a sentence, much like the big brother to your phone’s predictive text function.
That’s something of a simplification, but you get the idea: ChatGPT doesn’t actually ‘know’ anything, but by analyzing a huge amount of online material, it can construct plausible-sounding sentences on just about any topic.
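To make the prediction idea concrete, here’s a minimal sketch of next-word prediction using simple word-pair counts over a made-up corpus. This is an illustration only – real LLMs like ChatGPT use neural networks trained on vastly more text – but the underlying task of guessing which word comes next is the same.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny
# sample corpus, then pick the most frequent follower. The corpus and
# function names here are invented for illustration.
corpus = (
    "the patient has a fever . "
    "the patient has a cough . "
    "the doctor has a chart ."
).split()

followers = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "patient" (seen twice, vs "doctor" once)
print(predict_next("has"))  # "a"
```

Your phone’s predictive text works in roughly this statistical spirit; an LLM replaces the simple counts with a neural network that weighs the entire preceding context, which is why its sentences sound so much more fluent.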
‘Plausible-sounding’ is the key, though. Depending on the probabilities behind its phrasings, the AI can seem uncannily smart or come to the most ridiculous conclusions.
Researchers from the Ansible Health startup tested it using sample questions from the USMLE, having first checked that the answers weren’t available on Google – so they knew that ChatGPT would be generating new responses based on the data it had been trained on.
Put to the test, ChatGPT scored between 52.4 percent and 75 percent across the three exams (the pass mark is usually around 60 percent). In 88.9 percent of its responses, it produced at least one significant insight – described by the researchers as something that was “new, non-obvious, and clinically valid”.
“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” the study authors said in a press statement.
ChatGPT also proved to be impressively consistent in its answers and was even able to provide reasoning behind each response. It also beat the 50.3 percent accuracy rate of PubMedGPT, a bot trained specifically on medical literature.
It’s worth remembering that the information ChatGPT has been trained on will include inaccuracies: if you ask the bot itself, it will admit that more work is needed to improve the reliability of LLMs. It’s not going to replace medical professionals at any point in the foreseeable future.
However, the potential for parsing online knowledge is clearly huge, especially as these AI bots continue to get better in the years to come. Rather than replacing humans in the medical profession, they could become vital assistants to them.
“These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making,” write the researchers.
The research has been published in PLOS Digital Health.