Making Medical Coding Better and Faster With Artificial Intelligence

“If you embrace the idea that automation can work together with people, when you look at all the tasks that are occurring in revenue cycle, you realize that almost all of them can now become amenable to automation—to this approach of a shared human plus robot together, working together to solve a problem.”

Varun Ganapathi, PhD, Co-founder and Chief Technology Officer of AKASA

A new study from researchers at AI company AKASA found that the company’s novel machine learning approach (Read, Attend, and Code, or RAC) outperformed current state-of-the-art models at decoding medical notes for patient billing, working in a matter of seconds. RAC also surpassed professional human medical coders in both efficiency and accuracy.

AKASA has made its name by providing algorithms that improve healthcare revenue cycle management, including mundane tasks such as figuring out precisely where a particular claim should be filed for payment.

“One of the problems we solve is automatically helping fix the registration section of a claim,” says Varun Ganapathi, PhD, co-founder and chief technology officer at AKASA. That might involve reading all the insurance cards on file for a patient and checking whether claims are filed with information in the correct order. If none of the insurance cards are valid, AKASA will then contact patients directly. Getting to that point may involve machine learning, since here too the algorithm is trying to make sense of unstructured data.

“If you embrace the idea that automation can work together with people, when you look at all the tasks that are occurring in revenue cycle, you realize that almost all of them can now become amenable to automation, to this approach of a shared human plus robot together, working together to solve a problem,” Dr. Ganapathi says. “And I think that’s the path forward to really realize new productivity improvements in revenue cycle.”

The Increasing Complexity of Human Medical Coding

“The desire to code has a good side effect,” Dr. Ganapathi says. Namely, artificial intelligence can sort through those codes to feed a “learning health system” that leads to improved treatments.

But the downside of coding, when done by humans, is also evident in the widespread physician burnout often attributed to the extensive box-checking and button-clicking required of many electronic health record software systems, Dr. Ganapathi notes.

In addition, coding continues to increase in complexity because healthcare providers are trying to be more accurate in how they charge for treatments and to maximize their revenue cycle management.

“They’re trying to make it that if you do a more complex service, you actually can get paid more,” Dr. Ganapathi says. “But that means that you have to actually be able to code more accurately because you need to identify when you did something more complex, versus something simpler.”

Augmenting Human Coders Versus Machine Learning

Today, armies of human coders labor away, generating the codes that allow healthcare to properly document diagnoses and generate bills. Other armies of humans at healthcare payers then pore through those documents, in many cases seeking to clarify (but also to disallow) certain codes in the interest of controlling costs.

Both sides—two warring camps, really—already employ technology to augment their human coders. And both sides start with the doctors’ notes with those checked boxes and button clicks, weaving parallel narratives, ultimately subject to final negotiation between provider and payer.

While manual coding systems employ certain software-generated hints to augment the work of human coders, much as modern programming tools generate coding hints for programmers, Dr. Ganapathi notes that “these existing techniques aren’t doing the job because we still have lots and lots of people spending lots of time.”

Machine learning, the AKASA approach, starts from a very different place: “The goal here is essentially to be able to completely code from the doctor’s notes directly, using machine learning and AI,” Dr. Ganapathi says.

“In this industry, a lot of automation doesn’t have such a tight loop with a human in the middle,” he says. Algorithms trained through machine learning often struggle when they encounter “edge case” words and fail to make sense of the narrative.

“It will just take a guess, and that might not at all be correct,” Dr. Ganapathi says. In such cases, its confidence being low, the algorithm sends the suspect word to a human knowledgeable about medical vocabulary. If the human enters a correction, that correction improves the algorithm, an example of a human augmenting machine learning, instead of vice versa.
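
In code, that escalation step can be sketched as a simple confidence threshold. The following Python sketch is illustrative only; the prediction interface, the 0.90 cutoff, and the review queue are assumptions made for demonstration, not AKASA’s actual system.

    CONFIDENCE_THRESHOLD = 0.90  # hypothetical cutoff for auto-accepting a code

    def route_codes(predictions, review_queue):
        """Auto-accept confident ICD code predictions; escalate the rest."""
        accepted = []
        for code, confidence in predictions:
            if confidence >= CONFIDENCE_THRESHOLD:
                accepted.append(code)
            else:
                # Low-confidence "edge case" predictions go to a human coder;
                # the human's correction can later become a new training example.
                review_queue.append((code, confidence))
        return accepted

    queue = []
    print(route_codes([("428.0", 0.97), ("V45.81", 0.42)], queue))  # ['428.0']
    print(queue)  # [('V45.81', 0.42)] awaits human review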

Dr. Ganapathi likens the human-in-the-middle process to the way many self-driving vehicles are being iterated toward full autonomy: “If the self-driving car sees something really unusual on the road, it might actually switch to what they call teleoperation, where there’s someone like a pilot sitting in a data center somewhere, and they can sort of take over to handle that situation.”

When machine learning is applied to medical coding, split-second reaction times aren’t as necessary. The approach does, however, permit healthcare providers and payers to streamline coding tasks, reducing excessive costs and coding errors while deploying staff more strategically to the most complex cases that need extra attention.

Measuring the State of the Art of Machine Learning

In its study, AKASA found that RAC beat state-of-the-art machine learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) by 18.7 percent when autonomously coding complex medical notes, which are critical for billing.

“It’s important to note that for this particular problem, for medical coding, there’s some ‘interannotator agreement,’ where two coders might code differently for the same input,” Dr. Ganapathi says. “We’re probably getting quite close to being able to fully automatically code, at least for the types of data that we saw in this MIMIC-III data set.”

The MIMIC-III data set is a freely accessible medical database containing de-identified medical data for more than 40,000 patients who stayed at the Beth Israel Deaconess Medical Center between 2001 and 2012.

For this study, the authors extracted the discharge summaries and corresponding ICD-9 codes. With more than 47,000 samples for training, more than 1,600 samples for validation, and more than 3,300 samples for testing, the study authors had plenty of points of comparison with previous studies of earlier machine learning models, as well as certified professional coder performance.

The healthcare industry’s migration from ICD-9 to ICD-10 and ICD-11 will amplify the benefit of machine learning on coding, because these newer coding standards define significantly more codes than ICD-9 does. “As you add more codes, you have more code sparsity,” Dr. Ganapathi adds, referring to the tendency of codes to occur less frequently, since there are more of them to choose from. “One of the main improvements in our algorithm was the ability to deal with infrequently-used codes.”
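
To make the sparsity point concrete, here is a minimal Python sketch that tallies how often each code appears in a labeled corpus. The toy corpus and the “seen only once” cutoff are invented for illustration, not drawn from the study.

    from collections import Counter

    # Invented toy corpus: one list of ICD codes per discharge summary.
    samples = [["428.0", "401.9"], ["401.9"], ["V45.81", "428.0"], ["401.9"]]

    counts = Counter(code for codes in samples for code in codes)
    rare = [code for code, n in counts.items() if n == 1]  # the long tail

    print(counts.most_common())  # [('401.9', 3), ('428.0', 2), ('V45.81', 1)]
    print(rare)                  # ['V45.81'] -- an infrequently used code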

The authors presented these results at the Machine Learning for Healthcare Conference in August 2021.

Decoding the Advanced Glossary of AI-Assisted Coding

Part of understanding where coding via machine learning stands is understanding the AI-driven approaches and techniques being debated.

For instance, a Convolutional Neural Network (CNN) is a mathematical technique that takes a group of adjacent inputs, processes them to produce some number, and then slides over by one input and repeats the process, Dr. Ganapathi explains.

“I’m just giving you the most basic version of it,” he says. “It lets you encode what’s known as ‘translational invariance.’ It lets you not have to have so many different weights. With a normal neural network, you’d have a large number of weights. When you do a convolution, you’re telling the algorithm to cut down on the number of parameters, and that there’s some sort of symmetry here that it can take advantage of.”
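
A bare-bones 1D convolution in Python (using NumPy) illustrates that description: the same small kernel of weights slides across neighboring inputs, which is what yields the parameter savings and translational invariance he mentions. The signal and kernel values here are arbitrary examples.

    import numpy as np

    def conv1d(inputs, kernel):
        """Slide one shared weight vector across the input, one step at a time."""
        k = len(kernel)
        return np.array([
            np.dot(inputs[i:i + k], kernel)      # weighted sum of one window
            for i in range(len(inputs) - k + 1)  # slide over by one and repeat
        ])

    signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    kernel = np.array([0.5, 0.5])  # one small set of weights, reused everywhere
    print(conv1d(signal, kernel))  # [1.5 2.5 3.5 4.5]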

Long Short-Term Memory (LSTM) lets recurrent neural network models very carefully decide not to change what they are retaining in their short-term memory, in order to enhance what the neural network learns and retains, Dr. Ganapathi says.
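
A minimal PyTorch sketch shows the pieces involved; the dimensions are arbitrary examples, not values from the study. The cell state returned alongside the hidden state is the gated memory the model carefully decides whether to overwrite.

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
    tokens = torch.randn(1, 10, 16)         # one sequence of 10 embedded tokens
    outputs, (hidden, cell) = lstm(tokens)  # `cell` holds the protected memory
    print(outputs.shape)                    # torch.Size([1, 10, 32])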

In the past three years, both CNN and LSTM models have produced vast improvements on the most challenging benchmark, the MIMIC-III full-label inpatient clinical notes dataset, according to the study.

“This progress raises the fundamental question of how far automated machine learning systems are from human coders’ working performance,” the study states.

The study’s conclusion: as mentioned above, the AKASA RAC outperforms the best prior all-machine-learning benchmark by 18.7 percent, surpassing the human-level coding baseline. It points to a bright future for combined human-and-machine coding, working in a virtuous feedback loop.

How Scrambling the Doctor’s Narrative Makes Machine Learning Work Better

For years, physicians have resisted structured efforts to replace their narratives, the traditional kind of storytelling that techniques ranging from check boxes and button clicks to machine learning have aimed to match and replace.

Thus, more than a few physicians might be surprised at a machine-learning technique that actually scrambles the narrative in order to improve algorithms’ comprehension and generation of codes.

“Unlike computer vision tasks, data augmentation should be done carefully in the NLP [natural language processing] application due to the text’s structure,” the study states. “By taking into account the problem’s nature of permutation equivariance, we rely on a simple sentence permutation method. Provided that notes in the training set contain multiple sentences, they are shuffled in a random order to generate a new train sample with the same label.”

“It essentially forces the algorithm to generalize better when it sees new data,” Dr. Ganapathi elaborates. “This form of data augmentation is done in a lot of deep learning tasks.” He likens it to the way that computer vision algorithms can continue to recognize faces and objects even if their distance from the camera varies.
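
A short Python sketch of the sentence-permutation step the study describes follows. The naive period-based splitter stands in for a real clinical sentence segmenter, and the sample note and ICD codes are invented.

    import random

    def permute_note(note_text, labels):
        """Shuffle a note's sentences into a new sample with the same labels."""
        sentences = [s.strip() for s in note_text.split(".") if s.strip()]
        random.shuffle(sentences)                  # random order, same content
        return ". ".join(sentences) + ".", labels  # labels are unchanged

    note = "Patient admitted with chest pain. ECG abnormal. Started on aspirin."
    print(permute_note(note, ["786.50", "414.01"]))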

Overall, shuffling the sentences into random order makes the problem artificially more challenging, but as long as researchers make the algorithm deal with harder problems, the algorithm learns and makes better decisions.

Scott Mace

Scott Mace has been writing about technology for 40 years, and about technology's role in healthcare for the past 20 years. He served as a senior editor at InfoWorld in the 1980s and 1990s, and as technology editor for HealthLeaders Media from 2012 to 2017.

He studied medical writing at UC San Diego Extension and won a Neal Award for his healthcare journalism work in 2015. (Twitter: @scottmace)