May 8, 2020

Biased AI is More Than a Technical Problem

Building a Process-oriented Policy Approach to AI Governance

Anne Hobson

Public Policy Manager, AR/VR, Facebook

Walter Stover

Python Developer

Artificial Intelligence (AI) systems have grown more prominent in both their use and their unintended effects. Just last month, the LAPD announced that it would end its use of a predictive policing system known as PredPol, which had sustained criticism for reinforcing policing practices that disproportionately affect minorities. Such incidents of machine learning algorithms producing unintentionally biased outcomes have prompted calls for ‘ethical AI’. However, this approach focuses on technical fixes to AI, and ignores two crucial components of undesired outcomes: the subjectivity of data fed into and out of AI systems, and the interaction between actors who must interpret that data. When considering regulation of artificial intelligence, policymakers, companies, and other organizations using AI should therefore focus less on the algorithms and more on data and how it flows between actors, to reduce the risk of misdiagnosing AI systems. To be sure, applying an ethical AI framework is better than discounting ethics altogether, but an approach that focuses on the interaction between human and data processes is a better foundation for AI policy.

The fundamental mistake underlying the ethical AI framework is that it treats biased outcomes as a purely technical problem. If this were true, then fixing the algorithm would be an effective solution, because the outcome would be defined purely by the tools applied. In the case of landing a man on the moon, for instance, we can tweak the telemetry of the rocket with well-defined physical principles until the man is on the moon. In the case of biased social outcomes, the problem is not well-defined. Who decides what an appropriate level of policing is for minorities? What sentence lengths are appropriate for which groups of individuals? What is an acceptable level of bias? An AI is simply a tool that transforms input data into output data, but it is people who give meaning to data at both steps, in the context of their understanding of these questions and of what the appropriate measures of such outcomes are.

The Austrian school of economics is well-suited to helping us grapple with these kinds of less well-defined problems. Austrian economists leveled a similar critique against mainstream economics, which treated economic outcomes as a technical problem to be solved with specific technical decisions. The Austrians stressed a principle of methodological individualism, which holds that socioeconomic outcomes are ultimately the products of individual decisions, and cannot be acted on directly by technocratic policymakers. Methodological individualism involves the recognition that individuals drive outcomes in two primary ways: through subjective interpretation of their environment, and through interaction with each other and that same environment. We can sum up the application of these two aspects to AI systems in two questions: who gets the data, and where does the data go?

It matters who gets the data because the necessity of subjective interpretation will lead different people to reach separate conclusions about the same data. As an example, a set of data on financial variables such as defaults and debt repayment frequency, combined with personal characteristics such as race and geographic location, may lead one person to label African Americans as larger credit risks. Other individuals reading the same data, however, may arrive at a different conclusion: the patterns in this data stem from structural racism that has suppressed the income of African American households compared to other households, and do not indicate that they are inherently riskier. The first interpretation would result in biased outcomes from an AI system used to generate predictions of credit risk based on that data, whereas the second interpretation might actually result in beneficial outcomes; for instance, an agency might offer more lenient terms to these individuals.

The second question of where data goes depends on the interaction of individuals with each other and their environment, which drives the flow of data and also determines how that data is acted upon. In her book Weapons of Math Destruction, Cathy O’Neil offers a perfect example of this when analyzing what went wrong with the LAPD’s use of PredPol, which took in data on past crimes and used it to predict the geographic location of new crimes. Police forces took this data and increased their presence in hot spots of predicted crime. The result was a positive feedback loop: more crime data originated in those areas (because of increased interaction between police officers and residents of those neighborhoods in the form of increased arrests), which generated more predictions of crime there, which in turn led to over-policing of minority groups. Ultimately, the data went to a police department that unintentionally increased arrests of minority groups.
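The feedback loop described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of PredPol's actual algorithm: the function name, the two-neighborhood setup, and every number are hypothetical. Both neighborhoods are assumed to have the same true crime rate, and recorded crime is assumed to scale with police presence rather than with underlying crime.

```python
def simulate_feedback(rounds=10, true_rate=100.0, hotspot_share=0.7,
                      initial=(55.0, 45.0)):
    """Toy model (hypothetical): two neighborhoods with identical true crime.

    Each round, the neighborhood with more recorded crime is treated as the
    predicted hot spot and receives `hotspot_share` of all patrols.
    Returns the cumulative recorded crime per neighborhood after each round.
    """
    recorded = list(initial)
    history = [tuple(recorded)]
    for _ in range(rounds):
        # "Prediction" step: past records alone decide where the hot spot is.
        hot = 0 if recorded[0] >= recorded[1] else 1
        patrol = [1.0 - hotspot_share, 1.0 - hotspot_share]
        patrol[hot] = hotspot_share
        # Assumption: new records depend on patrol presence, because the
        # true crime rate is the same in both neighborhoods.
        for i in range(2):
            recorded[i] += true_rate * patrol[i]
        history.append(tuple(recorded))
    return history


def share_of_first(counts):
    """Fraction of all recorded crime attributed to the first neighborhood."""
    return counts[0] / (counts[0] + counts[1])


if __name__ == "__main__":
    for step, counts in enumerate(simulate_feedback()):
        print(step, round(share_of_first(counts), 3))
```

In this sketch, a small initial gap in recorded crime widens round after round even though nothing about the neighborhoods themselves differs. Setting `hotspot_share` to 0.5, meaning patrols ignore the prediction, lets the gap shrink instead, which suggests that the outcome is driven by how the output data is acted upon rather than by the prediction step alone.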

Together, the subjectivity of data and the importance of interaction get at a core insight of Austrian economics that directly follows from the principle of methodological individualism: context matters. If how data is interpreted and used differs from person to person, then the flows of data matter: who gets the data first, and how they use it, can transform the data before it is sent on. Thinking along these lines shifts us away from focusing on building better, more ethical AI, and towards trying to better understand the dynamics of data within a system: who is selecting which data to feed into an AI, what data the AI then generates, and most importantly, how that data is then acted upon and by whom. If we don’t take these matters into consideration, we risk myopically focusing on fixes to the AI that will not change outcomes. In the case of PredPol, for example, the AI could have been completely transparent, but the outcome would have been the same because of how police officers were acting on the output data according to their institutional context.

Some experts are already calling for more process-oriented AI governance approaches, including the EU’s High-Level Expert Group on AI and professional services network KPMG. Carolyn Herzog, general counsel and chair of an ethics working group, comes close to the approach we are advocating in stressing that “…data is the lifeblood of AI,” and that we must pay attention to “…issues of how that data is being collected, how it is being used, and how it is being safeguarded.” However, at present, this data-oriented approach is not represented clearly in U.S. policy. Recent AI policy movements, including ethical principles released by the Department of Defense and the Office of Management and Budget’s AI Guidelines, are a good first step but still emphasize the technology more than the data flows, and are limited to the government’s use of AI. Principle 9 of the guidelines, for instance, notes the importance of having controls to ensure “…confidentiality, integrity, and availability of the information stored, processed, and transmitted by AI systems,” but does not extend this to explicitly consider how the data is used after being transmitted.

Moreover, these proposals do not coherently lay out the relationship between data and AI outcomes because they do not place enough emphasis on where data goes and how it is used in context after being transmitted from the AI system. Returning to our earlier point, interactions matter. Take PredPol as an example. Even if we know how data was being collected, stored, and used by PredPol and by the police department, these two pieces in isolation are not enough to understand the emergent outcome that results from the interaction between these two organizations. The critical driver is the feedback loop that emerges because data flows back and forth between PredPol and the police department. Current policy proposals risk overlooking this class of emergent AI outcomes by narrowly focusing on the AI and data practices of just one organization, rather than explicitly drawing our attention to how data circulates in the wider data ecosystem.

What’s needed is a process-oriented, systemic policy approach focused not just on AI, but also on how data is interpreted and used in context by individuals and organizations on the ground, and on how these parties interact with each other. The NTIA would be a good convener for drafting this framework given its success in leading a multi-stakeholder process to build a framework for enhancing cybersecurity. The NTIA can use the AI Now Institute’s algorithmic impact assessment as a blueprint. By building a voluntary framework for AI outcomes, the NTIA can serve a dual purpose. First, it can help ease worries over how to stay compliant with best practices. Second, it can help organizations safeguard against unwanted outcomes of AI systems, and more effectively identify and correct problems that do arise, instead of depending on outside forensic data analysis after the fact. The NTIA can also help establish a common language for AI systems between public and private entities, one that gives organizations concrete steps they can take to avoid these outcomes.