Medicine, AI, and Bias: Will Bad Data Undermine Good Tech?

Imagine walking into the Library of Congress, with its millions of books, and having the goal of reading them all. Impossible, right? Even if you could read every word of every work, you wouldn’t be able to retain or comprehend everything — even if you spent a lifetime trying.

Now let’s say you somehow had a super-powered brain capable of reading and understanding all that information. You would still have a problem: You wouldn’t know what wasn’t covered in those books — what questions they’d failed to answer, whose experiences they’d left out.

Similarly, today’s clinicians have a staggering amount of data to sift through. PubMed alone contains more than 34 million citations. And that’s just the peer-reviewed stuff. Millions more data sets explore how factors like bloodwork, medical and family history, genetics, and socioeconomic traits impact patient outcomes.

Artificial intelligence (AI) lets us use more of this material than ever. Emerging models can quickly and accurately synthesize enormous amounts of data, predicting potential patient outcomes and helping doctors make calls about treatments or preventive care.

Predictive algorithms hold great promise. Some can diagnose breast cancer with a higher rate of accuracy than pathologists. Other AI tools are already in use in medical settings, allowing doctors to more quickly look up a patient’s medical history or improve their ability to analyze radiology images.

However, some experts in the field of artificial intelligence in medicine (AIM) suggest that while the benefits seem obvious, less-noticed biases can undermine these technologies. In fact, they caution that biases can lead to ineffective or even harmful decision-making in patient care.

New Tools, Same Biases?

While many people associate “bias” with personal, ethnic, or racial prejudice, broadly defined, bias is a tendency to lean in a certain direction, either in favor of or against a particular thing.

In a statistical sense, bias occurs when data does not fully or accurately represent the population it is intended to model. This can happen when the starting data is poor, or when data from one population is mistakenly applied to another.
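To make the statistical definition concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the studies discussed in this article. It assumes a population split evenly between two hypothetical subgroups with different event rates, and a study sample that over-represents one of them.

```python
# Hypothetical event rates for two subgroups (illustrative numbers only).
event_rate = {"group_a": 0.10, "group_b": 0.30}

# Share of each subgroup in the real population vs. in the study sample.
population_share = {"group_a": 0.50, "group_b": 0.50}
sample_share = {"group_a": 0.90, "group_b": 0.10}


def overall_rate(shares, rates):
    """Weighted average of subgroup event rates."""
    return sum(shares[group] * rates[group] for group in rates)


true_rate = overall_rate(population_share, event_rate)
estimated_rate = overall_rate(sample_share, event_rate)
bias = estimated_rate - true_rate

print(f"true rate:      {true_rate:.2f}")
print(f"estimated rate: {estimated_rate:.2f}")
print(f"bias:           {bias:+.2f}")
```

Under these assumed numbers, the sample suggests an event rate of about 12% when the population rate is 20%. Nothing was measured incorrectly; the gap comes entirely from who was, and was not, in the sample.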

Both types of bias — statistical and racial/ethnic — exist within medical literature. Some populations have been studied more, while others are under-represented. Which raises the question: If we build AI models from the existing information, are we just passing old problems on to new technology?

“Well, that is definitely a concern,” says David M. Kent, MD, CM, MS, director of the Predictive Analytics and Comparative Effectiveness Center at Tufts Medical Center.

In a new study, Kent and a team of researchers examined 104 clinical predictive models for cardiovascular disease, all designed to guide clinical decision-making in cardiovascular disease prevention. The researchers wanted to know whether the models, which had previously performed accurately, would do as well when tested on a new set of patients.

Their findings?

The models “did worse than people would expect,” Kent says. They were not always able to discern high-risk from low-risk patients. At times, the tools over- or underestimated the patient’s risk of disease. Alarmingly, most models had the potential to cause harm if used in a real clinical setting.

Why was there such a difference in the models’ performance from their original tests compared to now? Statistical bias.

“Predictive models don’t generalize as well as people think they generalize,” Kent says. When you move a model from one database to another, or when things change over time (from one decade to another) or space (one city to another), the model fails to capture those differences.

That creates statistical bias. As a result, the model no longer represents the new population of patients, and it may not work as well.
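The two failures described above, trouble telling high-risk from low-risk patients and over- or underestimating risk, are commonly called poor discrimination and poor calibration. The sketch below shows one simple way each could be measured. It is not the method used in Kent's study: the predicted risks, both toy cohorts, and the function names `c_statistic` and `observed_to_expected` are all hypothetical.

```python
def c_statistic(risks, outcomes):
    """Discrimination: chance that a patient who had the event was given a
    higher predicted risk than a patient who did not (ties count as half)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                score += 1.0
            elif case_risk == control_risk:
                score += 0.5
    return score / (len(cases) * len(controls))


def observed_to_expected(risks, outcomes):
    """Calibration: observed events divided by the events the model expected.
    Below 1 means risk was overestimated; above 1 means underestimated."""
    return sum(outcomes) / sum(risks)


# Risks a hypothetical model assigns to ten patients (made-up values).
predicted = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5]

# 1 = patient had the event, 0 = did not. Both cohorts are invented.
development_cohort = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1]
new_cohort = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

for name, outcomes in [("development", development_cohort), ("new", new_cohort)]:
    c = c_statistic(predicted, outcomes)
    oe = observed_to_expected(predicted, outcomes)
    print(f"{name:12s} c-statistic={c:.2f}  observed/expected={oe:.2f}")
```

In this toy data the same predictions look well calibrated in the development cohort, with an observed-to-expected ratio near 1. In the new cohort fewer events occur, so the ratio drops to about 0.67, and the c-statistic falls from roughly 0.79 to 0.63. The model itself did not change; the patients did.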

That doesn’t mean AI shouldn’t be used in healthcare, Kent says. But it does show why human oversight is so important. “The study does not show that these models are especially bad,” says Kent. “It highlights a general vulnerability of models trying to predict absolute risk. It shows that better auditing and updating of models is needed.”

Still, even human supervision has its limits, as researchers caution in a new paper arguing in favor of a standardized process. Without such a framework, we can only find the bias we think to look for, the researchers note. Again, we don’t know what we don’t know.

Bias in “The Black Box”

Race is a mixture of physical, behavioral, and cultural attributes. It is an essential variable in healthcare. However, race is a complicated concept and problems can arise when using race in predictive algorithms. While there are health differences among racial groups, it cannot be assumed that all people in a group will have the same health outcome.

David S. Jones, MD, PhD, professor of culture and medicine at Harvard University and coauthor of Hidden in Plain Sight: Reconsidering the Use of Race Correction in Clinical Algorithms, pointed out, “A lot of these tools [analog algorithms] seem to be directing healthcare resources toward white people.” Around the same time, similar biases in AI tools were being identified by researchers Ziad Obermeyer, MD, and Eric Topol, MD.

The lack of diversity in clinical studies that influence patient care has long been a concern. The newer worry, Jones says, is that using these studies to build predictive models not only passes on those biases, but also makes them more obscure and harder to detect.

Before the dawn of AI, analog algorithms were the only clinical option. These types of predictive models are hand-calculated instead of automatic.

“When using an analog model,” Jones says, “a person can easily look at the information and know exactly what patient information, like race, has been included or not included.”

Now, with machine learning tools, the algorithm may be proprietary, meaning the data is hidden from the user and can’t be changed. It’s a black box. That’s a problem because the user, a care provider, might not know what patient information was included, or how that information might affect the AI’s recommendations.

“If we are using race in medicine, it needs to be totally transparent so we can understand and make reasoned judgments about whether the use is appropriate,” Jones says. “The questions that need to be answered are: How, and where, to use race labels so they do good without doing harm.”

Should You Be Concerned About AI in Clinical Care?

Despite the flood of AI research, most clinical models have yet to be adopted in real-life care. However, if you are concerned about your provider’s use of technology or race, Jones suggests being proactive. You can ask the provider: “Are there ways in which your treatment of me is based on your understanding of my race or ethnicity?” This can open up dialogue about the provider’s decision-making process.

Meanwhile, the consensus among experts is that problems related to statistical and racial bias within AIM do exist and need to be addressed before the tools are put to widespread use.

“The real danger is having tons of money being poured into new companies that are creating prediction models who are under pressure for a good ROI,” Kent says. “That could create conflicts to disseminate models that may not be ready or sufficiently tested, which may make the quality of care worse instead of better.”

For now, AI researchers say more standardization and oversight need to be established, and that communication between institutions conducting research for patient care needs to be improved. But how all of this should be done is still up for debate.

Sources

David M. Kent, MD, CM, MS, director of the Predictive Analytics and Comparative Effectiveness Center at Tufts Medical Center

David S. Jones, MD, PhD, professor of culture and medicine at Harvard University

JAMA. (2017). Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. doi:10.1001/jama.2017.14585

Circulation: Cardiovascular Quality and Outcomes (2022). Generalizability of cardiovascular disease clinical prediction models: 158 independent external validations of 104 unique models. https://doi.org/10.1161/CIRCOUTCOMES.121.008487

ACM Digital Library (2021). MedKnowts: Unified documentation and information retrieval for electronic health records. https://doi.org/10.1145/3472749.3474814

Lancet (2021). Artificial intelligence, bias, and patients’ perspectives. https://doi.org/10.1016/S0140-6736(21)01152-1

The Lancet Digital Health (2020). Artificial intelligence in medical imaging: switching from radiographic pathological data to clinically meaningful endpoints. https://doi.org/10.1016/S2589-7500(20)30160-6

The New England Journal of Medicine (2020). Hidden in plain sight: reconsidering the use of race correction in clinical algorithms. doi:10.1056/NEJMms2004740
