Fairness for Data Analysis
Automated decisions are taking a prominent place in our societies, in domains ranging from medical diagnosis to judicial decisions and college admissions. For this change to be for the better, these decisions must be fair and take into account the diversity of the population. How can we rely on an automated decision without a guarantee that the procedure does not introduce a bias or amplify an existing one? Unfortunately, standard artificial intelligence techniques may replicate or amplify biases already present in a dataset, and examples abound of naively applied artificial intelligence techniques leading to sexist or racist decisions.

At the heart of artificial intelligence and data analysis lies the problem of finding a good clustering of a dataset: a partition of the data such that similar data elements are placed in the same part. Our goal is to design a principled approach for computing fair clusterings. To offer practical solutions, this requires a context-specific mathematical formalization of the problem together with new, efficient algorithms tailored to artificial intelligence data.
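To make the notion concrete, the following is a minimal, illustrative sketch (not the project's method): it clusters a toy dataset with standard, fairness-unaware k-means and then measures how balanced each cluster is with respect to a binary sensitive attribute, one fairness notion studied in the fair-clustering literature (a balanced cluster roughly preserves the overall proportions of the protected groups). All data and parameters here are hypothetical.

```python
# Illustrative sketch only: standard k-means plus a per-cluster balance metric.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: 200 points in 2D, each tagged with a binary sensitive attribute.
X = rng.normal(size=(200, 2))
group = rng.integers(0, 2, size=200)  # e.g. 0 / 1 for a protected attribute

# Standard (fairness-unaware) clustering into k parts.
k = 4
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

def cluster_balance(labels, group, k):
    """Balance of each cluster: min(#group0/#group1, #group1/#group0).

    1.0 means both groups are equally represented in the cluster;
    0.0 means the cluster contains only one group.
    """
    balances = []
    for c in range(k):
        g = group[labels == c]
        n0, n1 = np.sum(g == 0), np.sum(g == 1)
        balances.append(0.0 if n0 == 0 or n1 == 0 else min(n0 / n1, n1 / n0))
    return balances

print("per-cluster balance:", cluster_balance(labels, group, k))
```

A fair clustering algorithm would aim to keep such balance values high in every cluster while still grouping similar points together; designing formulations and efficient algorithms that achieve this trade-off is precisely the kind of question the project addresses.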