Beyond Calibration: Links to Algorithmic Fairness and Impact in Decision-Making
As machine learning permeates every stratum of modern society, it is increasingly used to automate decisions or draw conclusions. However, numerous recent examples have revealed blatant discrimination: image labeling by Flickr and Google Photos in 2015, recidivism risk assessment with COMPAS in 2016, Amazon’s job application screening in 2018, Twitter’s image cropping in 2020, the DALL-E image generator in 2021, and others. All displayed strong ethnic or gender biases. These cases call into question both the quality of the models and the methodology used to evaluate them, opening up new challenges and research directions. In this project, we focus on probabilistic classification, for example predicting the probability of a disease from a patient’s biomarkers. Much research has focused on controlling predicted probabilities on average by measuring miscalibration: the average difference between predicted probabilities and observed outcome frequencies. However, a classifier can be well calibrated on average while being overconfident in one subgroup and underconfident in another [6-7]. This work will explore the links between such local over- and underconfidence and fairness, and will investigate their impact on decision-making.
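To make the point about subgroups concrete, here is a minimal sketch, assuming binned expected calibration error (ECE) with equal-width bins as the miscalibration measure. This is one common choice, not necessarily the one used in this project. The toy data, the group names "A" and "B", and the helper `signed_gap` are all hypothetical. `signed_gap` (mean prediction minus observed positive rate) is used here as a simple proxy for the direction of local miscalibration: positive means the model over-estimates, negative means it under-estimates.

```python
from collections import defaultdict


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |mean prediction - observed positive rate|."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; a score of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins.values():
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_p - rate)
    return ece


def signed_gap(probs, labels):
    """Hypothetical direction measure: >0 means over-estimation, <0 means under-estimation."""
    return sum(probs) / len(probs) - sum(labels) / len(labels)


# Hypothetical toy data: every individual receives a score of 0.2 or 0.6.
# Group "A": at 0.2, 2 of 5 are positive; at 0.6, 4 of 5 are positive.
probs_a = [0.2] * 5 + [0.6] * 5
labels_a = [1, 1, 0, 0, 0] + [1, 1, 1, 1, 0]
# Group "B": at 0.2, 0 of 5 are positive; at 0.6, 2 of 5 are positive.
probs_b = [0.2] * 5 + [0.6] * 5
labels_b = [0, 0, 0, 0, 0] + [1, 1, 0, 0, 0]

if __name__ == "__main__":
    # Pooled over both groups, each score bin matches its observed rate exactly.
    print(f"overall ECE: {expected_calibration_error(probs_a + probs_b, labels_a + labels_b):.3f}")
    for name, p, y in [("A", probs_a, labels_a), ("B", probs_b, labels_b)]:
        print(f"group {name}: ECE={expected_calibration_error(p, y):.3f}, "
              f"signed gap={signed_gap(p, y):+.3f}")
```

Running it prints an overall ECE of 0.000, while each group has an ECE of 0.200, with a signed gap of -0.200 for group A and +0.200 for group B. The pooled score looks perfect because the two groups' errors cancel within each bin, which is exactly the situation that an average-only calibration check cannot detect.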