Defining Algorithmic Bias


We don’t have a good definition of algorithmic bias. Some say it occurs whenever an algorithm’s training data differs from the true distribution. Others argue that algorithmic bias requires systematic intent from the algorithm’s creators to disadvantage certain groups. Both definitions are too narrow. Algorithmic bias is the use of any algorithm that results in a degraded experience for a group of people due to a factor unrelated to the algorithm’s purpose. Current definitions fail to encompass all algorithmic bias for two reasons: they assume algorithmic bias is only a data problem, or they require biased intent.

Algorithmic Bias is Not Simply Biased Data

Machine learning communities often dismiss algorithmic bias as a simple issue: the training data is not representative of the true population. For example, suppose we trained Alexa’s voice recognition using only voices from upper-middle-class white men. This could (and did) result in Alexa failing to recognize other speakers. Unrepresentative training data is a big problem, but it is not the only problem under the umbrella of algorithmic bias.

Defining algorithmic bias solely as “unrepresentative training data” fails to account for the many situations where the training data is representative but shaped by biased societal decisions. For example, COMPAS is risk assessment software that informs judges about a defendant’s predicted likelihood of future offenses. Judges all over the United States use this software to help determine sentence length and parole eligibility. However, COMPAS is biased against African-Americans: it is far more likely to rate African-American defendants as likely reoffenders, even when this is not borne out by subsequent data. With COMPAS, unrepresentative training data is not the only issue to tackle. A bigger problem is that COMPAS embodies the societal biases that judges and juries pass on to African-American defendants within the court system. The problem may be that COMPAS is too representative of our legal system: it learned that defendants of color are judged more harshly and therefore predicts that they are more likely to seriously re-offend. Algorithmic bias is not solely a problem with the data.
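To make this concrete, here is a minimal, hypothetical sketch (not the actual COMPAS model or its data): two groups have the same underlying reoffense rate, but the historical records used as labels flag one group as reoffending more often. A model trained on these perfectly “representative” records still reproduces the disparity.

```python
# Hypothetical sketch: identical behaviour across groups, biased labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)               # 0 = group A, 1 = group B
true_reoffense = rng.random(n) < 0.30       # identical base rate for both groups

# Biased labels: group B's behaviour is recorded as reoffense more often,
# e.g. via heavier policing or harsher charging decisions.
label_noise = (group == 1) & (rng.random(n) < 0.15)
recorded_label = true_reoffense | label_noise

X = group.reshape(-1, 1).astype(float)      # the model sees only group membership here
model = LogisticRegression().fit(X, recorded_label)

risk_a, risk_b = model.predict_proba([[0.0], [1.0]])[:, 1]
print(f"predicted risk, group A: {risk_a:.2f}")  # ~0.30
print(f"predicted risk, group B: {risk_b:.2f}")  # ~0.40, despite identical behaviour
```

The data here faithfully describes the system that produced it; the bias lives in that system, not in any sampling error.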

Algorithmic Bias is Often Unintentional

Academics such as Friedman and Nissenbaum (1996) argue that algorithmic bias pertains to “computer systems that systematically and unfairly discriminate against certain individuals or groups of individuals in favor of others”. While Friedman and Nissenbaum’s work was prescient in many ways, their definition fails for one reason: “systematically” implies an intentional, methodical process that creators engage in to disadvantage certain users. The definition was written at a time when machine learning was not as widely deployed as it is today. Friedman and Nissenbaum’s examples are systems like SABRE that programmers intentionally biased toward a specific outcome. Such intentional biasing is likely less common today than the unintentional biases embedded in nearly every intelligent system. Algorithmic bias is not often systematic, but it is often systemic.

A Working Definition

Algorithmic bias is the use of any algorithm that results in a degraded experience for a group of people due to a factor unrelated to the algorithm’s purpose. This broadens the two definitions above to include phenomena that fall under colloquial notions of algorithmic bias but not under the definitions proposed by others. Consider, for example, a passport photo service that would not let a user progress and kept telling them to “open their eyes” whenever they uploaded a picture of themselves. The problem cannot be explicitly attributed to biased data or programmer intent, so it would not fall under the definitions above. Yet this user (and likely many others) had a degraded experience because of facial features characteristic of their heritage. Those facial features should have no bearing on whether they receive a passport, so the factor is unrelated to the algorithm’s purpose. Both criteria are met, and the situation falls under the definition of algorithmic bias used in this article. Does this mean algorithmic bias is ubiquitous, present in nearly every system? It just might be.
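One practical consequence of this definition is that it can be audited: for any algorithm, compare how often the experience is degraded across groups defined by a factor unrelated to the algorithm’s purpose. The sketch below uses a hypothetical audit log from a passport photo checker; the group labels and the `rejection_rates` helper are illustrative, not taken from any real system.

```python
# Minimal audit sketch: compare degradation (rejection) rates across groups.
from collections import defaultdict

def rejection_rates(results):
    """results: iterable of (group, was_rejected) pairs."""
    totals, rejected = defaultdict(int), defaultdict(int)
    for group, was_rejected in results:
        totals[group] += 1
        rejected[group] += int(was_rejected)
    return {g: rejected[g] / totals[g] for g in totals}

# Hypothetical audit log from a passport photo checker.
audit_log = [
    ("group_a", False), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", True), ("group_b", False),
]
rates = rejection_rates(audit_log)
print({g: round(r, 2) for g, r in rates.items()})  # {'group_a': 0.33, 'group_b': 0.67}

# A large gap between groups, on a factor irrelevant to issuing a passport,
# meets both criteria of the definition above.
```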

Aaron Springer

Human-Centered AI Researcher, Quantitative UX Researcher at Google
