AI Policy Guide

What Is AI?

AI is difficult to conceptualize and define. How can we define artificial intelligence when we do not fully understand what ordinary intelligence is? While Congress simplified this question by hard coding an AI definition into law, this definition must be rooted in a practical understanding of this often hazy concept.

This part of the AI Policy Guide provides an overview of what characterizes artificial intelligence, including an explanation of the legal definition codified by Congress, the core input technologies of AI, and a high-level overview of how AI often works.

Artificial intelligence is characterized by the following:
  • The intellectual forefathers of AI framed AI as the goal of manufacturing systems that resemble the human mind. In this normative sense, AI is a goal or aspiration that guides system design. In a descriptive sense, AI is commonly referred to as a technology, a catch-all for the many technologies and designs that make AI possible. 
  • AI systems generally aim to automate intellectual tasks normally performed by humans. 
  • AI uses technologies such as machine learning. 
  • Most AI systems are best conceived of as advanced inference engines. These inferences are used to produce predictions, inform decisions, and take automated actions. 
  • AI is the result of a triad of essential inputs: software (algorithms), hardware (microchips), and data. 
  • The core advantages of AI systems are advanced automation, analytical speed, and greater scale of action. 
  • All AI systems currently in use are focused on specific applications. The pursuit of a more generalized AI is the goal of a sliver of ongoing AI research. 
  • An algorithm is simply a logical sequence of steps needed to perform a task. In computer science, algorithms are written in code. 
  • Machine learning algorithms are trained with data stored in a databank or collected in real time.


AI Basics

This section provides a basic, Level 1 understanding of AI, including AI's benefits, system flexibility, and the way AI works. 

“A fundamental problem in artificial intelligence is that nobody really knows what intelligence is.”

Shane Legg and Marcus Hutter


There is no one accepted definition of AI; there are, in fact, hundreds. For policy experts, Congress thankfully simplified definitional selection by hard coding an AI definition into law through the National Artificial Intelligence Initiative Act of 2020. Legally, AI is defined as follows: 

“Machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations or decisions influencing real or virtual environments. Artificial intelligence systems use machine and human-based inputs to—(A) perceive real and virtual environments; (B) abstract such perceptions into models through analysis in an automated manner; and (C) use model inference to formulate options for information or action.”

This definition is quite wordy, but a few core concepts stand out. 

Intelligence. First, note that this definition does not explain the goal of this technology. The reason: the goal is in the name. As observed earlier, AI is normatively a goal enabled by a set of technologies. The bounds and aims of this goal are naturally murky because there is little consensus on what constitutes “intelligence.” Although a small slice of the field is seeking to produce human-level intelligent systems, most engineers are simply trying to automate complex tasks. Some in the field believe serious research should ignore or downplay efforts to mimic human intelligence or describe AI systems. Mimicking human intelligence nevertheless has been the goal of AI’s founding fathers, and most watershed moments in AI history, such as AlphaGo’s mastery of Go, involve outmatching human intelligence. Although defining intelligence is murky, there is no question that many AI engineers (for better or for worse) will keep some notion of human intelligence as their ultimate goal. Readers should take this approach with a grain of salt. Focusing too intently on efforts to mimic human thought can distract from the real progress and pitfalls of most systems being engineered today that do not aim to match humans. These systems are designed for a variety of tasks and applications, both big and small. Tweet Hunter’s tweet generator, for instance, has a narrow use case. It is not trying to create human intelligence; it is trying to generate human-quality tweets. Regardless of their aims or applications, modern AI systems are united by a general effort to “automate intellectual tasks normally performed by humans,” an effort naturally shaped by the application at hand and the personal views of its engineers. 

Inference. A second highlight from this definition is that machine-based systems “make predictions, recommendations or decisions.” In the field, this is referred to as inference. Inference is at the core of most if not all AI systems, and the goal of AI systems can be generalized as the goal of making good inferences. When one asks Alexa to play a song, it infers a song title from the sound of one’s words, converted into code so that the title can be compared against the coded titles in its database, and then it picks the most likely match. Similarly, 

  • identifying a picture means inferring the correct match between the input picture and a given label, and 
  • operating a car requires thousands of near instant inferences about which actions to take in the near future, that is, predictions based on the position of the vehicle and surrounding objects. 

When these inferences trigger machine action (such as playing a song or steering a car), AI achieves the goal of automation. 
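
The song-request example can be sketched in a few lines of Python. This is a toy illustration, not a description of how Alexa actually works: the catalog, the `similarity` and `infer_song` functions, and the use of simple string similarity in place of real speech recognition are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical catalog of titles the system can play (an assumption for this sketch).
CATALOG = ["Yesterday", "Yellow Submarine", "Hey Jude", "Let It Be"]


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for how closely two strings match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def infer_song(heard: str, catalog: list[str]) -> tuple[str, float]:
    """Infer the most likely title by scoring the request against every title."""
    scores = {title: similarity(heard, title) for title in catalog}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A slightly garbled request still resolves to the closest catalog entry.
title, score = infer_song("yelow submarin", CATALOG)
print(f"Inferred title: {title} (score {score:.2f})")
```

The system never "knows" which song was requested; it scores every candidate and acts on the best guess, which is the sense in which inference drives automation.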

The AI Triad. A third highlight is the phrase “machine-based system.” AI scholar Ben Buchanan explains that “machine learning systems use computing power to execute algorithms that learn from data.” This is the AI triad: algorithms, data, and microchips. These are the core input technologies that together enable AI. An essential theme of this introduction to AI is that each of these technologies is equally necessary because they are interdependent. Understanding this interdependence is key to designing AI policy. 


Benefits of AI 

Before diving into how AI works, one must form an idea of what AI systems offer:

  1. Automation. AI can automate new types of tasks that previously required human input. Before AI, automation was reserved for the consistent, predictable, and repetitive. AI expands automation into “fuzzy” tasks that deal with complex problems and uncertainty. With AI, automation can extend to imprecise tasks, including image recognition, speech translation, and writing. 
  2. Speed. AI can resolve complex problems nearly instantly. Driverless cars face no cognitive lag when responding to hazards. In other cases, speed can be a hazard of its own. An extreme example lies in military systems that, once granted autonomy over target engagement, could act before a human commander authorizes engagement. 
  3. Scale. AI can effectively perform certain tasks better than an army of humans hired for that purpose, for instance, identifying the individual preferences of millions of music listeners or TV viewers.

Confronting Barriers to Automation

This new class of automation has raised many questions of ethics, safety, and the role of government, all of which have limited autonomous system deployment. Driverless car engineers often puzzle over how driverless vehicles should confront classic trolley problem scenarios. AI safety experts often worry about determining acceptable levels of failure before deploying these systems. Meanwhile, new forms of automation can be blocked by historical laws and regulations written under the assumption of human, not machine, control.

System Flexibility

Today, nearly all AI systems could be categorized as artificial narrow intelligence, designed to perform a specific, limited function. These AI systems can perform one or a few tasks with high quality but cannot perform tasks outside their discrete training. 

AI applications range from single purpose systems, such as OpenAI’s DALL·E image generator, to complex, albeit still limited, systems, such as driverless cars. Even within these narrow domains, AI can still suffer from inflexibility. The more a system can deal with the unexpected corner cases in its domain, the higher its quality. Imagine a driverless car that is highly accurate, but only when road conditions are good. A driverless car could perform perfectly in most conditions, yet when it meets the rare and unexpected situation, say a tornado, it may not know the best course of action to protect the driver. 

Although today’s AI systems are all narrow in scope, efforts are underway to develop so-called artificial general intelligence (AGI), with “the ability to achieve a variety of goals, and carry out a variety of tasks, in a variety of different contexts and environments.” This category represents the science fiction vision that many readers hold of AI. Note that generality does not imply quality. Just as a lion and a human vary wildly in intelligence, it is possible for AGI systems to perform general tasks at varying levels of proficiency. Also note that AGI does not imply human-like AI; AGI can be as advanced as humans without necessarily mimicking our cognition. A chess-playing AI, for instance, might win by mere exhaustive calculation of every combination of possible moves. Contrast this thought process with the strategic reasoning of human cognition. AGI also does not mean superintelligence, an AI system that is smarter than humans in almost every domain. These variations on advanced AI systems do not yet exist, and they represent only a fraction of AI R&D. To reiterate, most AI in use and development today does not hold these aims. Still, AGI investment is growing; a 2020 survey identified 72 active AGI R&D projects spread across 37 countries. Policymakers should take these concepts seriously even if they consider true AGI far off or impossible. Even an AI that can convincingly mimic AGI or superintelligence ought to be a matter of policy concern.


How AI Works: Prerequisites

The following sections discuss the various elements of the AI triad and the way AI works. First, several basic terms and concepts are as follows: 

  • Algorithm. “A logical sequence of steps to solve a problem or accomplish a task.” Although this term sounds like technical jargon, algorithms are everywhere. For instance, Grandma’s pot roast recipe is a type of algorithm, a list of steps that, if followed, can produce the delicious Sunday dinner. In computer science, this term is more specific, referring to the list of instructions, or code, that a computer follows. The essence is still the same; the computer follows lines of code to perform its tasks just as one might follow a recipe. The term is often used interchangeably with computer program and software.

    Although this study defines algorithm in its most general sense, in the context of AI, algorithm is often used as shorthand to refer more specifically to machine learning algorithms, the processes that a computer follows to create artificially intelligent software. 
  • Model. Unlike the machine learning algorithm, the model is the software configuration that, once fed new data, can produce inferences, make predictions, and make decisions. The model is the inference algorithm, which is iteratively refined through machine learning, and thus continuously updates its configuration after processing new data. When one runs an AI system, one is running the model. 
  • Machine learning. Most AI systems today are the result of a process called machine learning. Machine learning is a method for iteratively refining the process a model uses to form inferences by feeding it stored or real-time data. This learning process is called training and is a necessary step to build artificially intelligent systems. In the Algorithms section, the way this process works is explained in greater detail.  
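
The distinction among these three terms can be illustrated with a toy example. In the sketch below, `train` plays the role of the machine learning algorithm, the single number `weight` is the model's configuration, and `predict` is the model making an inference. The data, the function names, the learning rate, and the choice of a one-parameter linear model refined by gradient descent are all assumptions chosen for brevity; real systems adjust vastly more parameters.

```python
# Toy training data: inputs paired with the outputs we want the model to learn.
# The underlying pattern (output = 2 * input) is an assumption for this sketch.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def predict(weight: float, x: float) -> float:
    """The model: apply its current configuration to a new input."""
    return weight * x


def train(data, steps: int = 200, learning_rate: float = 0.01) -> float:
    """The machine learning algorithm: iteratively refine the model's weight."""
    weight = 0.0  # start with an uninformed configuration
    for _ in range(steps):
        for x, target in data:
            error = predict(weight, x) - target
            # Nudge the weight in the direction that shrinks the squared error.
            weight -= learning_rate * error * x
    return weight


weight = train(DATA)
print(f"Learned weight: {weight:.3f}")
print(f"Inference for a new input of 5: {predict(weight, 5.0):.2f}")
```

Once training finishes, only `predict` and the learned `weight` are needed to run the system, which mirrors the point that running an AI system means running the model.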

Just want the basics? Stop here.

That's a basic, high-level overview of AI! You can continue reading for a more advanced understanding, or skip to the next topic in the guide.

AI in Detail

This section provides a more advanced, Level 2 understanding of AI, including assessing quality and accuracy, and understanding benchmarks.

In addition to understanding what AI is and how it works, many policymakers must know how to assess it. Unfortunately, there is no one performance metric for AI models, and measurement criteria used are highly specific to each application. This study offers a starting point, describing several common metrics and the way to approach these figures with a critical eye.


Accuracy Assessments

A natural starting point for quality assessment is accuracy, which measures how often a system’s inferences and actions match expectations. Accuracy is broadly useful, understandable, and often sufficient. Note, however, that perfect accuracy will rarely be possible. When deploying AI applications, engineers must actively decide upon an acceptable rate of failure, a choice based on their own reasoning, application requirements, and perhaps regulatory prescriptions. Alexa, for instance, answers incorrectly around 20 percent of the time. In Amazon’s estimation, this rate of failure is acceptable. This estimation illustrates that accuracy need not be perfect when the stakes are low.

Contrast this example with safety modules in a driverless car. In this case, many argue the acceptable level of accuracy must be higher given the danger. Safety still must balance practical considerations. Projections show that deploying a driverless car that is only 10 percent safer than one with human drivers could still save many lives; perhaps a seemingly high rate of failure might be acceptable if it still minimizes comparative risk. Other AI benefits must also be weighed against accuracy. Perhaps driverless cars could more efficiently clear traffic in the presence of ambulances, potentially saving lives. Perhaps such a benefit would justify a lower rate of overall accuracy. 


Accuracy Is Not Everything 

Accuracy, although an important metric, cannot fully assess system quality in all cases. For instance, if a deadly virus appears only once in a sample of 100 patients, a disease-spotting AI coded to always predict a negative result would still be 99 percent accurate. Although highly accurate, this system would fail its basic purpose, and the sick would go untreated. For policymakers, a critical eye is needed to ensure the numbers provide proper nuance. To gain a better sense of the quality of a system, one may need additional evaluation metrics.
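
The arithmetic behind this example is easy to verify. The sketch below, using made-up patient data and a hypothetical function name, scores a "model" that always predicts a negative result against a sample of 100 patients in which exactly one is sick.

```python
# 100 hypothetical patients: True means the patient has the virus.
actual = [True] + [False] * 99


def always_negative(_patient_id: int) -> bool:
    """A degenerate 'model' that predicts a negative result for everyone."""
    return False


predicted = [always_negative(i) for i in range(len(actual))]

# Accuracy: share of predictions that match reality.
correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)

# Number of sick patients the model actually flagged.
caught = sum(p and a for p, a in zip(predicted, actual))

print(f"Accuracy: {accuracy:.0%}")
print(f"Sick patients detected: {caught} of {sum(actual)}")
```

The headline accuracy is high only because healthy patients vastly outnumber sick ones; the count of detected cases exposes the failure that accuracy hides.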

It is important to emphasize that any metric used to evaluate a system will carry tradeoffs. As an illustration, there is often a tradeoff between minimizing false positives and minimizing false negatives. Choosing which to prioritize in evaluation depends on context and system goals. 

Returning to the disease-detecting AI example, suppose one is doing United States Agency for International Development aid work, the chief concern is treating disease, and there is no cost to treating a healthy patient. In this case, one might prioritize minimizing false negatives to ensure that those with the disease get treatment. One might then measure quality using recall, a metric that states the percentage of truly positive cases that the model correctly identifies as positive. The higher the recall, the fewer sick patients the model misses; if recall is high, the likelihood of a false negative is low, and the model is effective for one’s purposes. 

Now imagine the reverse: suppose one is an official at the Centers for Disease Control and Prevention, and the chief concern is correctly analyzing disease transmission. In this scenario, perhaps one would want to minimize false positives by measuring with precision, a metric that states what share of the system’s positive results are indeed positive. If precision is high, then one can be more confident that positive results are correct and can better track transmission. 

If one finds both false positives and false negatives undesirable, perhaps one wants a model that minimizes both. In this case, one would try to maximize the F1 score, which combines precision and recall into a single number (their harmonic mean) and is high only when both false negatives and false positives are low. 
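
These three metrics are simple ratios of counts. The sketch below uses their standard definitions; the function names and the screening numbers are invented for illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Share of positive predictions that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of truly positive cases that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical screening results: 8 true positives, 2 false positives,
# and 4 false negatives (sick patients the model missed).
tp, fp, fn = 8, 2, 4
print(f"Precision: {precision(tp, fp):.2f}")
print(f"Recall:    {recall(tp, fn):.2f}")
print(f"F1 score:  {f1_score(tp, fp, fn):.2f}")
```

In this made-up case, precision is higher than recall: most of the model's positive calls are right, but it misses a third of the sick patients. An aid worker focused on treatment would care most about the recall figure, while an official tracking transmission would look first at precision.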

These example metrics are widely used to assess AI that seeks to classify data. However, classification is only one slice of evaluation, and these metrics are not necessarily ideal for all applications. Consider, for instance, how one might assess the quality of art generation software. Such a task is naturally fuzzy and, in many cases, might depend on the priorities or tastes of individuals; this is not something that can be easily captured in statistical metrics. A 2019 study found that for generative adversarial networks (a type of AI model that can serve as an AI art generator), there were at least 29 different evaluation metrics that could be used to assess the overall quality of these systems. AI evaluation metrics, like AI itself, are meaningless without application. 

The Wide Diversity of Evaluation Metrics

As mentioned, there are many metrics beyond the illustrative examples listed here. Interested policymakers can look into further evaluation metrics, including area under the curve (AUC), receiver operating characteristic curve (ROC), mean squared error (MSE), mean absolute error (MAE), and confusion matrices, among other useful metrics. Policymakers should consider their purposes, the needs of certain applications, and the metrics that are best suited for those needs.
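
For models that predict numbers rather than categories, error-based metrics are a common starting point. The following sketch computes MSE and MAE using their standard definitions; the function names and the temperature values are made up for illustration.

```python
def mean_squared_error(actual, predicted) -> float:
    """Average of squared differences; penalizes large misses heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted) -> float:
    """Average size of the miss, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily temperatures versus a model's forecasts.
actual = [20.0, 22.0, 19.0, 25.0]
predicted = [21.0, 22.0, 17.0, 21.0]

print(f"MSE: {mean_squared_error(actual, predicted):.2f}")
print(f"MAE: {mean_absolute_error(actual, predicted):.2f}")
```

The single four-degree miss dominates the MSE far more than the MAE, which is one reason the choice between the two depends on how costly large errors are in a given application.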


Although evaluation metrics can usefully describe an individual model’s effectiveness, they are not suited for comparing models or tracking progress toward certain goals. As such, AI researchers have adopted a variety of benchmarks, common datasets paired with evaluation metrics that can allow researchers to compare and track results of models and determine state-of-the-art performance on a specific goal or task. These benchmarks are often tailored to specific tasks. For instance, ImageNet is a popular benchmark for assessing image detection and classification.
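
The structure of a benchmark can be sketched as a fixed dataset plus a fixed metric applied identically to every model. Everything below is a toy assumption: the five-item "benchmark," the two rule-based "models," and the function names are invented and bear no relation to ImageNet's actual format.

```python
# A toy benchmark: fixed (input, correct label) pairs shared by all models.
BENCHMARK = [
    ("meow", "cat"),
    ("woof", "dog"),
    ("purr", "cat"),
    ("bark", "dog"),
    ("hiss", "cat"),
]


def model_a(sound: str) -> str:
    """Hypothetical model that recognizes two dog sounds and calls the rest cats."""
    return "dog" if sound in {"woof", "bark"} else "cat"


def model_b(sound: str) -> str:
    """Hypothetical model that guesses 'dog' for everything."""
    return "dog"


def benchmark_accuracy(model, benchmark) -> float:
    """The shared metric: fraction of benchmark items labeled correctly."""
    hits = sum(model(x) == label for x, label in benchmark)
    return hits / len(benchmark)


# Because data and metric are held constant, the scores are comparable.
leaderboard = {
    "model_a": benchmark_accuracy(model_a, BENCHMARK),
    "model_b": benchmark_accuracy(model_b, BENCHMARK),
}
for name, score in sorted(leaderboard.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.0%}")
```

A top score here says only that a model handles these five items; it says nothing about sounds that are absent from the list.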

Although useful for tracking improvements in AI systems and the state of the art, these benchmarks can be limited in their descriptive abilities. Researchers have noted that while benchmarks are often seen as describing general AI abilities, what they actually represent is more limited in scope, measuring only a system’s ability at the tightly constrained benchmarking task. The implication is that even if an AI system is able to accurately identify most images in ImageNet’s database, that result does not necessarily mean those abilities will translate to real-time, real-world image recognition. The complexity and noise of real-world analysis can be a far cry from the limited frame of benchmarking tests. Further, it has been noted that benchmarks often fail to test necessary characteristics, such as a model’s resistance to adversarial attacks, its bias, and its capacity for causal reasoning.

Next up: Data

The next part of the guide explains the importance of data within any AI system.

About the Author

Matthew Mittelsteadt is a technologist and research fellow at the Mercatus Center whose work focuses on artificial intelligence policy. Prior to joining Mercatus, Matthew was a fellow at the Institute of Security, Policy, and Law where he researched AI judicial policy and AI arms control verification mechanisms. Matthew holds an MS in Cybersecurity from New York University, an MPA from Syracuse University, and a BA in both Economics and Russian Studies from St. Olaf College.
