Researchers at MIT ran an experiment to determine how biased AI recommendations influence emergency decisions, with participants responding to mental health crises by calling for either medical or police assistance.
It is common knowledge that people hold prejudices, some of them quite subtle. For example, the average person might assume that computers, being built from silicon, steel, glass, and plastic, are impartial. That presumption may hold for computer hardware, but it does not always hold for computer software, which is written by imperfect humans and may be fed data that is itself compromised.
Machine learning-based artificial intelligence (AI) systems, for instance, are increasingly used in medicine to diagnose particular diseases, assess X-rays, and support other aspects of healthcare decision-making. But as recent research has demonstrated, machine learning models can encode biases against minority groups, and the recommendations they generate may reflect those same biases.
AI models employed in medicine can be inaccurate and inconsistent, partly because the data used to train them are not always representative of real-world settings. Different X-ray machines, for example, can record images differently and therefore yield different results, and models trained mostly on data from white patients may be less accurate when applied to other groups. The Communications Medicine study is not concerned with such difficulties, however, but with problems caused by bias and with techniques to mitigate its adverse effects.
Experiment
An experiment involving 954 people (438 clinicians and 516 non-experts) was conducted to determine how AI bias affects decision-making. Participants were shown call summaries from a fictional crisis hotline, each describing a man experiencing a mental health crisis. The summaries included details such as whether the subject was Caucasian or African American, and noted his religion if he was Muslim. A typical call summary might describe an African American man found at home in a delirious state, adding that "he has not consumed any drugs or alcohol since he is a practising Muslim." Participants were instructed to call the police if they believed the patient was likely to turn violent; otherwise, they were to call for medical help.
The experiment's four other groups received recommendations from either biased or unbiased models, and the researchers conveyed those recommendations in either "prescriptive" or "descriptive" formats. In a scenario involving an African American or Muslim subject, for example, a biased model was more likely than an unbiased one to recommend calling the police. Participants were not told which type of model they were dealing with, or that the models could be biased at all. Prescriptive advice spells out exactly what a person should do, instructing them to call the police in one case or to seek medical help in another. Descriptive advice is less blunt: if the AI system judges the risk of violence associated with a particular call to be low, no flag is displayed; otherwise, a flag is raised.
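To make the two formats concrete, here is a minimal sketch of how a single model risk score might be rendered either prescriptively or descriptively. The probability-of-violence score, the 0.5 threshold, and the function names are illustrative assumptions, not details taken from the study.

```python
# Hypothetical rendering of one model risk score in the two advice formats.
# Threshold and wording are illustrative, not taken from the paper.

def prescriptive_advice(violence_risk: float, threshold: float = 0.5) -> str:
    """Prescriptive format: tell the participant exactly what to do."""
    if violence_risk >= threshold:
        return "Call the police."
    return "Call for medical help."


def descriptive_advice(violence_risk: float, threshold: float = 0.5) -> str:
    """Descriptive format: only raise a flag; leave the decision to the participant."""
    if violence_risk >= threshold:
        return "FLAG: the model estimates an elevated risk of violence."
    return ""  # low-risk calls show no flag at all


if __name__ == "__main__":
    for risk in (0.2, 0.8):
        flag = descriptive_advice(risk) or "(no flag shown)"
        print(f"risk={risk:.1f}  prescriptive: {prescriptive_advice(risk)}")
        print(f"risk={risk:.1f}  descriptive:  {flag}")
```

The only difference between the two functions is whether the output dictates an action or merely describes the model's assessment, which mirrors the distinction the study tested.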
Conclusion
The participants "were highly influenced by prescriptive recommendations from a biased AI system," the researchers wrote, summarising the experiment's main finding. However, they also found that "participants were able to retain their initial, unbiased decision-making by employing descriptive rather than prescriptive recommendations." In other words, the bias built into an AI model can be reduced by structuring its advice appropriately. So why do the results differ depending on how the recommendation is presented? There is very little room for doubt when someone is told to do something, such as call the police, says Adam. But when the situation is merely described, through the presence or absence of a flag, "it gives leeway for a participant's interpretation; it allows them to be more flexible and assess the issue for themselves," according to the study.
The researchers also note that the language models commonly used to generate such recommendations are easy to bias. Language models, a class of machine learning systems, are trained on large bodies of text, such as the entirety of Wikipedia and other web content. Yet when these models are "fine-tuned" for a task using a considerably smaller training set (only 2,000 phrases, as opposed to 8 million web pages), they can easily pick up biases from that smaller dataset.
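As a rough illustration of how small such a fine-tuning set is, the sketch below fine-tunes a pretrained classifier on a handful of labelled call summaries, assuming the Hugging Face transformers and datasets libraries. The model name (distilbert-base-uncased), the label scheme, and the toy examples are illustrative assumptions, not the models or data used in the study.

```python
# A minimal fine-tuning sketch using Hugging Face transformers and datasets.
# The model, labels, and tiny corpus below are illustrative stand-ins.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A toy labelled set of call summaries; a skewed set this small is enough
# to shift a pretrained model's recommendations.
examples = {
    "text": [
        "Caller reports a man in a delirious state at home.",
        "Caller reports a man shouting and threatening neighbours.",
    ],
    "label": [0, 1],  # 0 = recommend medical help, 1 = recommend police
}

model_name = "distilbert-base-uncased"  # illustrative choice of base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenise the text column so the Trainer can feed it to the model.
dataset = Dataset.from_dict(examples).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-crisis-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()  # the small, skewed dataset now dominates the model's behaviour
```

Because the fine-tuning set is vastly smaller than the pretraining corpus, any skew in how its examples are labelled is quickly reflected in the model's recommendations.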
Finally, the MIT researchers found that recommendations made by biased models can mislead decision-makers even when those decision-makers are themselves unbiased. Whether or not participants had medical training made little difference to their responses; as the authors concluded, "clinicians were as much affected by biased models as non-experts were."