Everyone’s Got ChatGPT Fever!

And, most importantly, if you put the symptoms related to your fever into ChatGPT, it will generate a reasonable differential diagnosis.

“So?”

This brief report in Annals describes a retrospective experiment in which 30 written case summaries, lifted from the electronic documentation system, were fed to both clinician teams and ChatGPT. The clinician teams (an internal medicine or emergency medicine resident, plus a supervising specialist) and ChatGPT were each asked to generate a “top 5” list of differential diagnoses, and then settle upon one “most likely” diagnosis. Each case was tested both on the recorded narrative alone and with laboratory results added.

The long and short of this brief report is that the lists of diagnoses generated by each contained the correct final diagnosis with similar frequency, about 80-90% of the time. The correct leading diagnosis was chosen from these lists about 60% of the time by both. Overlap between the clinicians’ and ChatGPT’s lists of diagnoses was, likewise, about 50-60%.

The common reaction: wow! ChatGPT is every bit as good as a team of clinicians. We ought to use ChatGPT to fill in gaps where clinician resources are scarce, or to augment clinicians in real time.

This may indeed be a valid reaction, and, looking at the healthcare funding environment, it is clear billions of dollars are being thrown at the optimistic interpretation of these types of studies. However, what is lacking from these studies is any sort of comparison with current practice. Prior to ChatGPT, clinicians did not operate in an information-resource vacuum, as they frequently do in these contrived situations. When faced with clinical ambiguity, clinicians (and patients) have long used general search engines, in addition to medical knowledge-specific resources (e.g., UpToDate), as augments. These ChatGPT studies, much like many decision-support studies, are generally quite light on testing clinical utility and implementation in real-world contexts.

Medical applications of large language models are certainly interesting, but it is always valuable to remember that LLMs are not “intelligent”; they are simply pattern-matching and text-generation tools. They may, or may not, provide reliable improvement over the information search strategies already available to clinicians.

ChatGPT and Generating a Differential Diagnosis Early in an Emergency Department Presentation