Research shows decision-making AI could be made more accurate when judging humans

A new study from researchers at the University of Toronto and the Massachusetts Institute of Technology (MIT) is challenging conventional wisdom on human-computer interaction and on how to reduce bias in AI.

The paper, published this month in the journal Science Advances, presents empirical evidence on the relationship between the methods used to label the data that trains machine learning (ML) models and the performance of those models when applying norms.

MIT PhD student Aparna Balagopalan, a graduate of U of T’s master’s program in applied computing, is lead author, with co-authors Gillian Hadfield, director of U of T’s Schwartz Reisman Institute for Technology and Society (SRI), Schwartz Reisman Chair in Technology and Society, CIFAR AI Chair, and a professor of law and strategic management in the Faculty of Law; David Madras, a PhD student in the Machine Learning Group at the department of computer science in the Faculty of Arts & Science and the Vector Institute; research assistant David H. Yang, a graduate student in the applied computing program in the Faculty of Arts & Science; Marzyeh Ghassemi, a faculty affiliate at SRI and an assistant professor at MIT; and Dylan Hadfield-Menell, an assistant professor at MIT.

Much of the scholarship in this area presumes that calibrating AI behaviour to human conventions requires value-neutral, observational data from which AI can best reason toward sound normative conclusions. But the new research suggests that labels explicitly reflecting value judgments, rather than the facts used to reach those judgments, might yield ML models that assess rule adherence and rule violation in a manner that humans would deem acceptable.

To reach this conclusion, the authors conducted experiments to see how individuals behaved when asked to provide factual assessments as opposed to when asked to judge whether a rule had been followed.

From left to right: MIT PhD student Aparna Balagopalan, SRI Director Gillian Hadfield and SRI Faculty Affiliate Marzyeh Ghassemi (supplied photos)

For example, one group of participants was asked to label dogs that exhibited certain characteristics – namely, those that were large, not well groomed, or aggressive. Another group was not asked about the presence or absence of specific features. Instead, those participants were asked whether the dogs shown to them violated a building pet code predicated on the same characteristics.

The first group was asked to make a factual assessment – and the second, a normative one.

Hadfield says the researchers were surprised by the findings.

“When you ask people a normative question, they answer it differently than when you ask them a factual question,” she says.

Human participants in the experiments were more likely to recognize (and label) a factual feature than the violation of an explicit rule predicated on the factual feature.

The experiments showed that ML models trained on normative labels predict human normative judgments more accurately than models trained on factual labels. Because people flag a factual feature more readily than they flag a rule violation based on that feature, automated judgment systems trained on factual labels – which is how several existing systems are being built – are likely to overpredict rule violations.
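
The gap between the two kinds of labels can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the four items, the feature names and the helper `violation_from_facts` are invented and are not data or code from the study. The sketch applies a pet-code rule mechanically to factual labels and compares the result with direct normative judgments of the same items.

```python
# Toy illustration only: all data below is invented, not taken from the study.
from dataclasses import dataclass


@dataclass
class Item:
    factual: dict[str, bool]  # features a factual labeller reported as present
    normative: bool           # whether a normative labeller judged the rule violated


def violation_from_facts(factual: dict[str, bool]) -> bool:
    """Apply the rule mechanically: any flagged feature counts as a violation."""
    return any(factual.values())


# Hypothetical labels for four dog photos.
items = [
    Item({"large": True, "ungroomed": False, "aggressive": False}, normative=False),
    Item({"large": False, "ungroomed": True, "aggressive": False}, normative=False),
    Item({"large": True, "ungroomed": False, "aggressive": True}, normative=True),
    Item({"large": False, "ungroomed": False, "aggressive": False}, normative=False),
]

derived = [violation_from_facts(item.factual) for item in items]
judged = [item.normative for item in items]

derived_rate = sum(derived) / len(items)  # violation rate implied by factual labels
judged_rate = sum(judged) / len(items)    # violation rate from normative labels
over_flagged = sum(d and not j for d, j in zip(derived, judged))

print(f"derived from facts: {derived_rate:.2f}, judged directly: {judged_rate:.2f}")
print(f"items flagged by the rule but not by human judgment: {over_flagged}")
```

In this made-up sample, a model trained to reproduce the derived labels would learn to flag more items than human judges would, which is the kind of overprediction the researchers describe.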

The implications are significant: the research shows that reasoning about norms is qualitatively different from reasoning about facts, with real-world ramifications for anyone judged by an automated system.

“People could say, ‘I don’t want to be judged by a machine – I want to be judged by a human,’ given that we’ve got evidence to show that the machine will not judge them properly,” Hadfield says.

“Our research shows that this factor has a bigger effect [on an ML model’s performance] than things like model architecture, label noise and subsampling – factors that are often looked to for errors in prediction.”

Ensuring that the data used to train decision-making ML algorithms mirrors the results of human judgment – rather than simple factual observation – is no small feat. Proposed approaches include ensuring that the training data used to reproduce human assessments is collected in an appropriate context.

To this end, the authors of the paper recommend that the creators of trained models and of datasets supplement those products with clear descriptions of the approaches used to tag the data – taking special care to establish whether the tags relate to facts perceived or judgments applied.
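
One way a dataset creator might record this distinction is sketched below. It is a minimal illustration under stated assumptions, not a format proposed in the paper: the names `LabelKind`, `LabellingRecord` and `suitable_for_rule_judgments`, and the dataset name, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LabelKind(Enum):
    FACTUAL = "factual"      # labellers reported whether a feature was present
    NORMATIVE = "normative"  # labellers judged whether a rule was violated


@dataclass(frozen=True)
class LabellingRecord:
    """Hypothetical documentation entry shipped alongside a dataset or model."""
    dataset_name: str
    label_kind: LabelKind
    instructions_shown: str  # the exact question put to labellers

    def suitable_for_rule_judgments(self) -> bool:
        # Assumption for this sketch: only normative labels should be used
        # to train or evaluate a system that judges rule violations.
        return self.label_kind is LabelKind.NORMATIVE


record = LabellingRecord(
    dataset_name="pet-code-demo",  # hypothetical dataset name
    label_kind=LabelKind.FACTUAL,
    instructions_shown="Is the dog in this image large?",
)

if not record.suitable_for_rule_judgments():
    print(f"{record.dataset_name}: labels are {record.label_kind.value}; "
          "collect normative labels before judging rule violations")
```

Storing the exact instructions shown to labellers next to the label type lets a later user check for themselves whether the tags describe facts perceived or judgments applied.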

“We need to train on and evaluate normative labels. We have to pay the money for normative labels, and probably for specific applications. We should be a lot better at documenting that labelling practice. Otherwise, it’s not a fair judgment system,” Hadfield says.

“There’s a ton more research we need to be doing on this.”

The study was funded by the Schwartz Reisman Institute for Technology and Society and the Vector Institute, among others.