A simple AI coding upgrade boosts accuracy, cuts errors, saves time — and may even outperform doctors.
Researchers have found that a small adjustment in the way artificial intelligence (AI) assigns diagnostic codes can greatly boost accuracy—potentially surpassing physicians.
Published online in NEJM AI, the study highlights how this approach could ease doctors’ administrative workload, minimize billing mistakes, and enhance the overall quality of patient records.
Rethinking How AI Assigns Diagnostic Codes
“Our previous study showed that even the most advanced AI could produce the wrong codes, sometimes nonsensical ones, when left to guess,” says co-corresponding senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “This time, we gave the model a chance to reflect and review similar past cases. That small change made a big difference.”
Doctors in the United States spend hours every week assigning ICD (International Classification of Diseases) codes, the alphanumeric strings used to describe everything from sprained ankles to heart attacks. But large language models, like ChatGPT, often struggle to assign these codes correctly. To address this, the researchers tried a “lookup-before-coding” method that first prompts the AI to describe a diagnosis in plain language and then to choose the most fitting code from a list of real-world examples. The approach delivered greater accuracy, fewer mistakes, and performance on par with or better than that of physicians.
Testing the “Lookup-Before-Coding” Approach
The team utilized 500 Emergency Department patient visits at Mount Sinai Health System hospitals. For each case, they fed the physician’s note to nine different AI models, including small open-source systems. First, the models generated an initial ICD diagnostic description. Using a retrieval method, each description was matched to 10 similar ICD descriptions from a database of more than 1 million hospital records, along with how often those diagnoses occurred. In a second step, the model used this retrieved information to select the most accurate ICD description and code.
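To make the two-step pipeline concrete, here is a minimal sketch of how a lookup-before-coding system could be wired together. It is an illustration, not the study’s code: the llm_describe_diagnosis and llm_select_code functions are placeholders for real language-model prompts, the small ICD_DATABASE and its occurrence counts are hypothetical, and simple token overlap stands in for whatever retrieval method the team actually used over its database of more than 1 million hospital records.

```python
# Minimal, illustrative sketch of the "lookup-before-coding" idea described above.
# The "llm_" functions are placeholders for language-model prompts; the database
# and its counts are hypothetical.

import re
from collections import Counter

# Hypothetical mini-database of ICD descriptions with illustrative occurrence counts.
ICD_DATABASE = [
    {"code": "S93.401A", "description": "Sprain of unspecified ligament of right ankle, initial encounter", "count": 4210},
    {"code": "I21.9", "description": "Acute myocardial infarction, unspecified", "count": 1875},
    {"code": "J02.9", "description": "Acute pharyngitis, unspecified", "count": 9634},
]

def tokenize(text: str) -> Counter:
    """Lowercase word tokens, used for a crude similarity score."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def llm_describe_diagnosis(physician_note: str) -> str:
    """Step 1 (placeholder): a real system would prompt a language model to
    restate the note's primary diagnosis in plain language."""
    return physician_note

def retrieve_similar(description: str, k: int = 10) -> list[dict]:
    """Step 2a: match the description against the database and return the
    k most similar ICD entries, each with its occurrence count."""
    query = tokenize(description)
    def overlap(entry: dict) -> int:
        return sum((query & tokenize(entry["description"])).values())
    return sorted(ICD_DATABASE, key=overlap, reverse=True)[:k]

def llm_select_code(description: str, candidates: list[dict]) -> dict:
    """Step 2b (placeholder): a real system would show the model the retrieved
    descriptions and their frequencies and ask it to pick the best-fitting code.
    Here we simply return the top-ranked candidate."""
    return candidates[0]

def assign_icd_code(physician_note: str) -> dict:
    description = llm_describe_diagnosis(physician_note)
    candidates = retrieve_similar(description, k=10)
    return llm_select_code(description, candidates)

if __name__ == "__main__":
    note = "Twisting injury while playing soccer; exam consistent with ankle sprain."
    print(assign_icd_code(note))  # -> the ankle-sprain entry from ICD_DATABASE
```

The key design point is that the model never free-associates a code: it chooses only among retrieved, real ICD entries, which is what reins in the nonsensical codes the earlier study observed when models were left to guess.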
Emergency physicians and two independent AI systems then evaluated the coding results in a blinded fashion, without knowing whether the codes had been generated by AI or by clinicians.
AI Outperforms Physicians — and Even Small Models Shine
Across the board, models that used the retrieval step outperformed those that didn’t, and even did better than physician-assigned codes in many cases. Surprisingly, even small open-source models performed well when allowed to “look up” examples.
“This is about smarter support, not automation for automation’s sake,” says co-corresponding senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, and Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer for the Mount Sinai Health System. “If we can cut the time our physicians spend on coding, reduce billing errors, and improve the quality of our data, all with an affordable and transparent system, that’s a big win for patients and providers alike.”
Supporting — Not Replacing — Human Oversight
The authors emphasize that this retrieval-enhanced method is designed to support, not replace, human oversight. While it is not yet approved for billing and was tested specifically on primary diagnosis codes from emergency department visits in which patients were discharged home, it shows encouraging potential for clinical use. The researchers see immediate uses, such as suggesting codes in electronic records or flagging errors before billing.
The investigators are now integrating the method into Mount Sinai’s electronic health records system for pilot testing. They hope to expand it to other clinical settings and to include secondary and procedural codes in future versions.
Beyond Coding: A Broader Vision for AI in Healthcare
“The big picture here is AI’s potential to transform how we care for patients. When technology relieves the administrative burden of our physicians and other providers, they have more time for direct patient care. That’s good for clinicians, that’s good for patients, and it’s good for health systems of every size,” says David L. Reich, MD, Chief Clinical Officer of the Mount Sinai Health System and President of The Mount Sinai Hospital. “Using AI in this way improves our ability to provide attentive and compassionate care by spending more time with patients. This strengthens the foundation of hospitals and health systems everywhere.”
Source: Mount Sinai School of Medicine