# Tuesday, June 22

## Invited Speakers: Tuesday, June 22

• 10:00 - 11:00. Silvia Chiappa (DeepMind, United Kingdom)
Causality and Fairness
In this talk I will introduce causality tools for formalizing, measuring and addressing fairness issues. I will show how these tools can provide us with a sophisticated yet simple and intuitive framework for reasoning about fairness and lead to techniques for addressing fairness issues in complex scenarios.
• 11:00 - 12:00. Nicolas Schreuder (CREST - ENSAE Paris, France)
A minimax framework for quantifying risk-fairness trade-off in regression
A theoretical framework is proposed for the problem of learning a real-valued function which meets fairness requirements. This framework is built upon the notion of $\alpha$-relative (fairness) improvement of the regression function which we introduce using the theory of optimal transport. Setting $\alpha = 0$ corresponds to the regression problem under the Demographic Parity constraint, while $\alpha = 1$ corresponds to the classical regression problem without any constraints. For $\alpha \in (0, 1)$ the proposed framework allows us to interpolate continuously between these two extreme cases and to study partially fair predictors. Within this framework we precisely quantify the cost in risk induced by the introduction of the fairness constraint. We put forward a statistical minimax setup and derive a general problem-dependent lower bound on the risk of any estimator satisfying the $\alpha$-relative improvement constraint. We illustrate our framework on a model of linear regression with Gaussian design and systematic group-dependent bias, deriving matching (up to absolute constants) upper and lower bounds on the minimax risk under the introduced constraint. This talk is based on joint work with Evgenii Chzhen, see [arXiv:2007.14265].
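To make the interpolation idea concrete, here is a minimal one-dimensional sketch (not the speakers' code). On the real line, a Demographic Parity-fair predictor can be obtained by quantile averaging: each point's prediction is mapped to the group-weighted average of the same quantile across all groups, which is the 1-D Wasserstein-2 barycenter of the group-wise prediction distributions. A partially fair predictor is then a convex combination of the original and fair predictions. The function name and the raw interpolation parameter `t` are illustrative assumptions; the exact correspondence between `t` and the paper's $\alpha$ is part of its analysis and is not reproduced here.

```python
import numpy as np

def fair_interpolation(preds, groups, t):
    """Interpolate between the given predictions (t=1) and Demographic
    Parity-fair predictions (t=0). Illustrative 1-D sketch only.

    Fair predictions come from quantile averaging: each prediction is
    replaced by the group-weighted average of the same quantile across
    all groups (the 1-D Wasserstein-2 barycenter of the group laws).
    """
    preds = np.asarray(preds, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()

    fair = np.empty_like(preds)
    for g in labels:
        mask = groups == g
        # quantile rank of each prediction within its own group, in [0, 1]
        ranks = preds[mask].argsort().argsort() / max(mask.sum() - 1, 1)
        # replace by the weighted average of that quantile over all groups
        fair[mask] = sum(
            w * np.quantile(preds[groups == lab], ranks)
            for lab, w in zip(labels, weights)
        )
    return t * preds + (1 - t) * fair
```

For two equally sized groups whose predictions differ by a constant shift, `t=0` equalizes the group-wise distributions exactly, while intermediate `t` trades fairness against fidelity to the original predictions.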
• 15:00 - 16:00. Maryam Negahbani (Dartmouth College, United States)
Fair algorithms for clustering
Many important decisions today are made with the help of machine learning algorithms. These range from showing advertisements to customers, to awarding home loans, to predicting recidivism. It is important to ensure that such algorithms are not biased towards or against a specific race, gender, ethnicity, etc. Clustering is a learning method that is used widely, either stand-alone or as a subroutine, in all the mentioned applications. For a given dataset and a parameter k, the goal of clustering is to partition the data into k groups based on some similarity measure. Our notion of fair clustering is based on the Disparate Impact (DI) doctrine: any "protected class" must have approximately equal representation in the decisions taken (by an algorithm). In clustering, this translates to ensuring that each protected group is fairly represented in each cluster. This notion was first introduced in the seminal work of Chierichetti et al. (NIPS 2017). In this talk, I will present our paper on fair algorithms for clustering, published in NeurIPS 2019. In this paper, we introduce a post-processing algorithm that makes any p-norm clustering fair. This includes the popular k-means objective, along with k-median and k-center. Our framework can also be tuned to set bounds on how much each protected group should be represented in a cluster. Given any clustering (which might be extremely unfair), we prove that the points can be re-assigned to clusters in such a way as to ensure fairness without incurring too much clustering cost. We then prove that the re-assignment problem is NP-hard to solve exactly, and provide an algorithm that "approximates" the solution, in the sense that our re-assignment might violate fairness by a small amount: e.g., in any cluster and for any group, there might be a few more people than that cluster's quota for that group. This is joint work with S. K. Bera, D. Chakrabarty, and N. J. Flores.
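The re-assignment step described above can be sketched as a linear program. Below is an illustrative LP relaxation (not the speakers' implementation): given point-to-cluster costs and per-group lower/upper representation bounds, it finds the cheapest fractional assignment in which every cluster's group fractions stay within the bounds. The function name, the dict-based bounds, and the use of `scipy.optimize.linprog` are assumptions for the sketch; the paper's actual algorithm also rounds the fractional solution, which is omitted here.

```python
import numpy as np
from scipy.optimize import linprog

def fair_reassignment_lp(dist, groups, beta, alpha):
    """LP relaxation of fair re-assignment (illustrative sketch).

    dist[j, c]        : cost of assigning point j to cluster c
    groups[j]         : protected-group label of point j
    beta[g], alpha[g] : lower/upper bound on group g's fraction in any cluster
    Returns the scipy result; res.x.reshape(n, k) is the fractional assignment.
    """
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n, k = dist.shape
    labels = np.unique(groups)
    cost = dist.ravel()  # variable x[j*k + c] = fraction of point j in cluster c

    # each point is fully assigned: sum_c x[j, c] = 1
    A_eq = np.zeros((n, n * k))
    for j in range(n):
        A_eq[j, j * k:(j + 1) * k] = 1.0
    b_eq = np.ones(n)

    # fairness: beta_g * |cluster c| <= (mass of group g in c) <= alpha_g * |cluster c|
    rows = []
    for c in range(k):
        size = np.zeros(n * k)
        size[c::k] = 1.0  # total mass assigned to cluster c
        for g in labels:
            in_g = np.zeros(n * k)
            in_g[np.where(groups == g)[0] * k + c] = 1.0
            rows.append(in_g - alpha[g] * size)  # upper representation bound
            rows.append(beta[g] * size - in_g)   # lower representation bound
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    return linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                   bounds=(0.0, 1.0), method="highs")
```

Rounding a fractional solution back to an integral assignment is where the small fairness violation mentioned in the abstract comes from.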
• 16:00 - 17:00. Novi Quadrianto (University of Sussex, United Kingdom)
Learning with hidden subgroups
In many real-world classification tasks, each labeled class consists of multiple semantically distinct subclasses, or subgroups. For example, the "cat" class label can have finer-grained intra-class variations, such as "cat indoor" and "cat outdoor". This finer-grained subgrouping information is typically unavailable or unlabeled. The standard training process of machine learning models then faces two interconnected challenges: a) hidden stratification, and b) residual bias. In hidden stratification, classifiers often underperform on important hidden subgroups. In residual bias, systematic bias affects whether or not entire collections of data points appear in the training dataset, leaving the classifier unprepared to handle those subgroups in the eventual deployment setting. In this talk, I will present some of our recent work on learning invariant representations and active learning across subgroups for addressing hidden stratification and residual bias.
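Hidden stratification is easy to see once subgroup labels are available for evaluation: a model can have high overall accuracy while failing badly on one subgroup. A minimal sketch of that per-subgroup breakdown (the function name is an assumption, not the speaker's code):

```python
import numpy as np

def subgroup_report(y_true, y_pred, subgroups):
    """Overall and per-subgroup accuracy. A high overall score can hide
    a badly served subgroup -- the hidden-stratification failure mode."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    subgroups = np.asarray(subgroups)
    overall = float(np.mean(y_true == y_pred))
    per_group = {
        g: float(np.mean(y_pred[subgroups == g] == y_true[subgroups == g]))
        for g in np.unique(subgroups)
    }
    return overall, per_group
```

For instance, a "cat" classifier that is right on all indoor cats but wrong on all outdoor cats still reports strong overall accuracy when outdoor cats are rare in the evaluation set.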