At the same time, experts stress that it is “essential” that any response be carefully reviewed by a medical professional.
A study led by researchers at the University of California, USA, showed that chatbots can outperform doctors at conveying a sense of friendly, attentive care when answering patients’ questions in public online forums.
However, while the results suggest that artificial intelligence (AI) assistants can help draft answers to patients’ questions, experts stress that any use of the tool in clinical practice should be supervised by physicians. This caution reflects the algorithm’s track record of basic errors and its tendency to include incorrect or invented information in its responses. Also relevant is the fact that the experiment evaluated only the answer given by a single professional to each question, which limits the scope of the results.
The research compared the empathic responsiveness of credentialed healthcare professionals with that of the ChatGPT AI chatbot. To do so, a panel of experts examined which performed better when answering 195 medical questions drawn at random from ‘AskDocs’ (Ask a Doctor), a public sub-community hosted on the news aggregator and social forum site Reddit.
Each question’s pair of answers was examined blindly by three different judges. The evaluators chose “which answer was better” and rated both “the quality of the information provided” (very bad, bad, acceptable, good or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic). The ratings were mapped onto a scale of 1 to 5, averaged, and compared between the chatbot and the physicians, for a total of 585 evaluations.
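The scoring procedure described above (195 questions, each answer rated by three evaluators, hence 585 evaluations per outcome) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors’ actual analysis code, and the example ratings are hypothetical:

```python
# Illustrative sketch of the rating scheme: ordinal labels from the
# two 5-point Likert scales are mapped to 1-5 and averaged across
# the three evaluators who rated each answer.

QUALITY = {"very bad": 1, "bad": 2, "acceptable": 3, "good": 4, "very good": 5}
EMPATHY = {"not empathetic": 1, "slightly empathetic": 2,
           "moderately empathetic": 3, "empathetic": 4, "very empathetic": 5}

def mean_score(labels, scale):
    """Map each evaluator's ordinal label to 1-5 and average them."""
    return sum(scale[label] for label in labels) / len(labels)

# Hypothetical ratings for one question, three evaluators per answer:
doctor_quality = mean_score(["acceptable", "good", "acceptable"], QUALITY)
chatbot_quality = mean_score(["good", "very good", "very good"], QUALITY)
print(round(doctor_quality, 2), round(chatbot_quality, 2))  # 3.33 4.67
```

Averaging the mapped scores per answer, then comparing the chatbot’s and physicians’ means across all 195 questions, is what yields the aggregate comparison reported below.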
In 78.6% of cases, the evaluators preferred the chatbot’s responses, judging them to contain higher-quality information and more empathetic language. Compared with physician responses, about 4 times as many ChatGPT responses were rated at the highest levels of quality, and 9.8 times as many received the highest ratings for empathy.
The chatbot’s responses were also significantly longer, averaging 211 words compared with the physicians’ 52. The article describing the study was recently published in JAMA Internal Medicine.
Not a panacea
The researchers believe that, since online forums may not reflect typical physician-patient interactions, where there is a pre-existing relationship and more personalization, the implementation of this tool in clinical practice should be explored further. Randomized trials could assess whether the use of AI assistants improves responses, reduces physician burnout and improves patient outcomes.
Given the propensity of these tools to ‘hallucinate’ and fabricate facts, “it would be dangerous to rely on any factual information provided by such a chatbot response,” warns Anthony Cohn of the University of Leeds, UK. “It is essential that any response be carefully reviewed by a medical professional,” he stresses.
“As the authors explicitly acknowledge, they looked at a very small sample of medical questions submitted to a public online forum and compared the physicians’ responses with what ChatGPT replied. Neither the physician nor GPT had access to the patient’s medical history or additional context. It should not be assumed that their results apply to other questions, formulated differently or evaluated differently. This was not a randomized controlled trial,” said Professor Martyn Thomas of Gresham College in the UK.