This section provides a basic Level 1 explanation of AI algorithms, machine learning, and associated technology and policy.
Varieties of Machine Learning

Machine learning is a method for iteratively refining the process a model uses to form inferences by feeding it additional stored or real-time data. As a model takes in more data, the inferences should become more accurate, thus giving the impression that the machine is learning. Once inferences reach performance goals, the machine can be put to practical use, inferring on new data. Notably, models are not fixed; learning often continues after an AI model is put to practical use.

This section focuses on the dominant algorithmic technique for developing AI models: machine learning. Although other tools are used to create AI models, machine learning is the basis for most, if not all, modern systems. This technique is so dominant, in fact, that the term is largely synonymous with artificial intelligence.

To create an AI system, engineers must select a machine learning algorithm. The type of algorithm must be tailored to the task at hand. Although there is no one-size-fits-all strategy, most algorithms fall into one of the following categories:
- Supervised Learning. This approach follows a guess-and-check methodology. Data are fed into the model; the model forms a trial prediction (a guess) about those data; and, critically, that result is checked against engineer-provided labels, an answer key of sorts. If the model’s prediction differs from the correct label, the model then tweaks its processes to improve inference. Successive iterations thus improve performance over time. This method is useful for well-defined objectives and for situations needing human terms and understanding. For example, supervised learning can teach algorithms to label images of fruit with their correct English name. Although useful for helping models understand data from a human perspective, this method’s challenge is that models cannot learn what they are not trained to do. Their abilities are driven, restricted, and biased by the data chosen during the training process.
- Unsupervised Learning. Unsupervised learning algorithms are used when desired outcomes are unclear. Unlike supervised learning, which learns to perform discrete and human-defined tasks, unsupervised learning takes in unlabeled data, sifts through them, learns what hidden patterns and features they contain, and then clusters this information according to found categories. This approach is useful in data analysis where humans are prone to missing important data features and overlooking unobvious correlations. Unsupervised learning benefits include looking at data through a detailed lens, doing so without many human biases and blind spots, and analyzing data with greater speed. Operating without human-provided lenses, however, can be a challenge. Although an unsupervised algorithm can categorize data, it might not understand how to define its discoveries in human terms or match them to human objectives.
- Semi-supervised Learning. Semi-supervised learning is a hybrid of supervised and unsupervised learning that layers a small amount of labeled data on top of a larger amount of unlabeled data. This approach provides a light touch of supervision that can be helpful when some guidance is needed to direct the algorithm toward useful conclusions. It can be useful, for instance, when categorizing written text. The unsupervised half might first cluster the symbols by their shapes. Then, to label these groupings, the AI can learn their names using a human-provided answer key. The result is an AI model that can recognize the alphabet.
- Reinforcement Learning. Reinforcement learning is driven by process rather than data analysis. These algorithms use trial and error, rather than big data, to figure out the process behind a given task. To learn, an AI agent is placed in an environment and tasked with either maximizing some value or achieving some goal. A driverless car might be tasked with minimizing travel distance between two points or maximizing fuel efficiency. The algorithm then learns through repetition and a reward signal: through repeated trials, it tries a process, receives a reward signal if that process furthered its goal, and then adjusts its code accordingly to improve future trials. This gamified approach is useful when a general goal is known, such as maximizing distance traveled, but the precise means of achieving that goal are unknown. The challenge is that AI can sometimes cheat by following strategies misaligned with human goals. For example, if the goal were to maximize the fuel efficiency of navigating a group of naval vessels to a location, an AI might choose to destroy the slowest ships to increase total naval speed. Here the AI technically finds a more efficient process yet diverges from human intention.
In summary, supervised learning produces models that yield mappings between data, unsupervised learning produces models that yield classes of data, and reinforcement learning produces models that yield actions to take on the basis of data.
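To make the contrast concrete, the clustering behavior of unsupervised learning can be sketched in a few lines of code. The following is a minimal, illustrative two-cluster k-means on one-dimensional data; the dataset and the two-cluster setup are invented for this example and are not from the text:

```python
# A minimal sketch of unsupervised clustering: a two-cluster k-means
# on one-dimensional data. All numbers here are invented for illustration.

def kmeans_1d(points, iterations=10):
    """Cluster 1-D points into two groups around learned centers."""
    # Initialize the two cluster centers at the data extremes.
    c1, c2 = min(points), max(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Update step: each center moves to the mean of its group.
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Unlabeled measurements that happen to form two natural groups.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
low, high = kmeans_1d(data)
print(low)   # -> [0.8, 1.0, 1.2]
print(high)  # -> [8.7, 9.0, 9.5]
```

Note that, as the text describes, the algorithm finds the two groups without being told what they mean; a human must still interpret and name the clusters.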
Learning and Inference
The following are high-level illustrations of how machine learning and model inference work. In the Level 2 section, each of these is presented in a more detailed yet still understandable manner.
Learning. At a high level, how do AI systems learn? To illustrate this process, examine how a supervised learning algorithm builds its intelligence. Fundamentally, the process starts with two elements: data and the model one wants to train. To kick off the process, the as-yet unintelligent model takes in one piece of data from the dataset. Although it has not yet been refined in any way, the model then attempts an initial prediction based on those data. It does so to assess how well it performs so that improvements can be made.
Once this initial prediction is made, the model then needs a benchmark to score how well it performed. There are many types of benchmarks, but in the case of supervised learning, one uses an answer key of sorts. Specifically, each data point will be given a human-provided label that represents the intended correct result. Suppose that one’s model is an image recognition system. If the training data included an image of an apple, it would be labeled with the correct term: “apple.” If the model incorrectly produced the prediction “pear,” the label would signal to the model that a mistake was made.
When the label and prediction differ, this incongruity signals to the model that it must change. Guided by a mathematical process, the model then gently tweaks certain internal settings and knobs called parameters, which are the values that shape its analytical processes. These tweaks ought to improve the model’s predictive abilities in future trials. Note that although guided by mathematics, these tweaks do not guarantee improvement.
Finally, the algorithm repeats this process on the next piece of data. With each iteration, the model tweaks its parameters with the hope that collectively, these small changes allow the model to converge on a state where it can consistently and accurately make high-quality predictions. Recall that proper training can require millions of data points and, by extension, countless rounds of training to converge on somewhat-reliable inferences.
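The predict-check-tweak loop described above can be sketched in code. The following is a minimal perceptron-style learner; the toy dataset (learning a logical AND), the learning rate, and the update rule details are illustrative assumptions, not drawn from the text:

```python
# A minimal sketch of the supervised guess-and-check loop: predict,
# compare against the label, and nudge the parameters on a miss.
# The dataset and learning rate below are invented for illustration.

def train(samples, labels, epochs=20, lr=0.1):
    """Repeat: predict, check against the answer key, tweak parameters."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            # The model's trial prediction (its "guess").
            summed = weights[0] * x[0] + weights[1] * x[1] + bias
            guess = 1 if summed >= 0 else 0
            # Check against the human-provided label; adjust on a mistake.
            error = label - guess
            weights[0] += lr * error * x[0]
            weights[1] += lr * error * x[1]
            bias += lr * error
    return weights, bias

# Toy task: output 1 only when both inputs are 1 (a logical AND).
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [0, 0, 0, 1]
w, b = train(xs, ys)
preds = [1 if w[0] * a + w[1] * c + b >= 0 else 0 for a, c in xs]
print(preds)  # -> [0, 0, 0, 1]
```

Each pass through the data makes at most small adjustments; only the accumulation of many such tweaks converges on reliable predictions, which is why real training can require millions of data points.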
Once the machine learning process is complete, the fully trained model can then be deployed and perform inference on real-world data that it has not seen before.
Inference. Once training is complete, how do these models perform inference on never-before-seen data? As is often the case, there are many tools that can be used. As an illustration, however, examine the most popular: the artificial neural network (figure 6.3). This work uses neural networks to illustrate AI inference because they are behind most modern AI innovations, including driverless cars, AI art, and AI-powered drug discovery. Just as machine learning has become synonymous with AI, many often treat neural networks as synonymous with machine learning. Here, however, the conflation is less apt: unlike in the machine learning and AI case, other approaches remain widely used and popular. Examples include regression models, which map the relationship between data variables; decision trees, which establish branching patterns of logic that input data can follow to reach a conclusion; and clustering algorithms, which sort data into clusters based on various metrics of data similarity.
As the name implies, a neural network is an attempt to simulate the cognitive processes of the brain in digital form. These networks are composed of smaller units called artificial neurons. During the training process, each neuron is tuned to find a unique and highly specific pattern in the input data, one that correlates strongly with accurate predictions. For instance, a neuron in a network designed to identify a face might be tuned to look for the visual patterns that represent a mouth, a pattern well correlated with faces. These patterns are the basis of the network’s decisions.
Key Challenges of Algorithms
An overview of model bias, explainability, and the difficulties of auditing AI.
As mentioned earlier, AI systems are not free from human biases. Although data are usually the root of many biased outcomes, model design is an often-overlooked contributing factor. The frame of the problem that engineers are trying to solve with AI, for instance, naturally shapes how the model is coded.
For example, trying to design an AI system to predict creditworthiness naturally involves a decision on what creditworthiness means and what goal this decision will further. The model’s code will reflect this choice. If a firm simply wants to categorize data, perhaps a supervised learning algorithm can be used to bucket individuals. If the firm seeks to maximize profit, perhaps a reinforcement learning algorithm could challenge the system to develop a process that maximizes returns. These differences in goals and model design decisions will naturally change outcomes and create qualitatively different AI systems. How a model is trained can also affect results. A model intended for multiple tasks has been found to show different outcomes when trained on each task separately, rather than all at once. Other such variations in design process can be expected to yield varying results.
Mitigating this form of bias can be challenging and, like data bias, lacks a silver bullet solution. Best practices are still developing, but suggestions tend to focus on process, emphasizing team diversity, stakeholder engagement, and interdisciplinary design teams.
Deep learning promotes large algorithms with opaque decision processes. Generally, as AI models balloon in size and complexity, explaining their decision-making processes grows difficult. Models whose decisions cannot be easily explained are referred to as black box AI. Large neural networks, with their convoluted decision paths, tend to fall into this category. As a result, interest has grown in explainable AI, a field that involves either designing inherently interpretable machine learning models whose decisions can be explained or building tools that can explain AI systems.
Some classes of inherently interpretable models exist today. For instance, decision trees, models that autonomously create “if–then” trees to categorize data, can be visually mapped for users.
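The interpretability of a decision tree comes from its form: the learned model is just a chain of if-then rules a person can read directly. The following hand-written sketch stands in for a tree a learning algorithm might produce; the rules, thresholds, and the loan-screening framing are invented for illustration:

```python
# A minimal sketch of why decision trees are considered interpretable:
# the model is a readable chain of "if-then" rules. The specific rules
# and thresholds below are invented examples, not a trained model.

def classify_borrower(income_bracket, has_savings):
    """A hand-written stand-in for a tree a learning algorithm might build."""
    if income_bracket >= 5:
        return "prime"
    if has_savings:
        return "prime" if income_bracket >= 3 else "subprime"
    return "subprime"

print(classify_borrower(income_bracket=7, has_savings=True))   # prime
print(classify_borrower(income_bracket=2, has_savings=False))  # subprime
```

Because every decision path is an explicit rule, an auditor can trace exactly why any applicant was classified one way or the other, something a large neural network does not readily permit.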
This section provides a deeper, Level 2 understanding of algorithms. In particular, it discusses how an individual neuron might take in data and spot patterns within those data to produce good predictions. The general principles are illustrated using the common supervised learning process and the perceptron, a simple yet powerful artificial neuron model.
The diagram is of an artificial neuron. On the left, the blue circles represent the input data for analysis. On the right, the black arrow represents the final prediction that the model will output for the user. The core magic of this model, however, is in the center. There, one finds several elements that, while perhaps complex looking at first, are relatively simple in operation.
An example follows.
Start at the far left with the blue data inputs. For this example, suppose one operates a bank and is trying to train an algorithm to categorize loan applicants as either prime or subprime borrowers. Now suppose the applicants must submit four categories of data:
- Whether they hold a savings account, represented by a 1 (yes) or a 0 (no)
- Their number of dependents
- Their number of monthly bank deposits
- Their income bracket, represented by 1–7, with 7 being the highest
For this illustration, suppose that the loan application for the neuron to analyze is as follows:
- Savings account: 1
- Number of dependents: 0
- Number of monthly deposits: 2
- Income bracket: 7
Data Adjustment and Activation
Detecting patterns in data is actually a process of transforming input data into an output that represents a meaningful pattern. This is done in two steps. First, the neuron manipulates the input data to amplify the most important information and sums the data together. Next, it passes this sum to an activation function. Strictly speaking, the activation function represents the rules that transform the input data into the output decision. In many cases, however, it can more or less be thought of as an algorithmic trigger that needs to be tripped for the neuron to activate. The activation function compares the manipulated data to certain criteria, which dictate the final output that the neuron will produce. In our simple prime-or-subprime case, this criterion is a threshold number: If the sum is higher than this threshold, the neuron sends a result indicating that this is a prime borrower. If not, it indicates subprime. Although in this case this result is the neuron’s final decision, note that in complex neural networks this result might be just one of many patterns identified in service of the final decision.
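The threshold trigger just described is often called a step activation function. A minimal sketch follows; the threshold value of 1 matches the prime-or-subprime example in this section, while the sample sums fed to it are invented:

```python
# A minimal sketch of the threshold ("step") activation described above.
# The neuron fires only if the summed, adjusted inputs clear the threshold.
# The example input values below are invented for illustration.

def step_activation(weighted_sum, threshold=1.0):
    """Return 1 ("prime") if the trigger is tripped, else 0 ("subprime")."""
    return 1 if weighted_sum >= threshold else 0

print(step_activation(1.4))  # -> 1: the trigger is tripped
print(step_activation(0.6))  # -> 0: below the threshold, no activation
```

In larger networks, smoother activation functions are typically used in place of this hard step, but the trigger intuition carries over.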
Elements of an Artificial Neuron
Next, examine the tools that this neuron uses to adjust the data and calculate the final result. Surprisingly, this can be quite simple. In many cases, the math involved uses only simple arithmetic.
Once the data enter the neuron, they encounter the green squares in figure 6.5; each represents a weight. Using weights, the neuron can amplify a certain element of the input data through multiplication. For instance, the income bracket data in this example are likely strongly correlated with prime borrowers; therefore, this feature of the data should be amplified in the final decision. To do so, one multiplies that value by a weight to make it bigger, giving it more significance.
Weights are a useful tool because they allow the truly important elements of the data to have an outsized effect on the result. Crucially, weights are a parameter that can also be tuned. The more important the value, the bigger a weight multiplier it will receive. Conversely, unimportant data can be eliminated by multiplying them by 0. Finding the correct weightings of data values can be seen as one of the core elements of a neuron’s intelligence.
After the data have been weighted, they are added to a bias value. The bias acts as the threshold, mentioned previously, that the weighted data must surpass for the neuron to activate. Put another way, the bias puts a thumb on the scale of the result by adjusting what causes the neuron to trigger. For instance, if prime borrowers should be rare, one might subtract a bias value, making it harder for the summed weighted data to trip the activation function.
After the data have been adjusted, they are then fed to the activation function. In the example neuron’s case, if the final value adds up to 1 or greater, the neuron communicates a prime result; if not, it indicates subprime.
Calculation of the Result
This section puts the elements together to see how they transform the data. As mentioned earlier, to produce a result, the neuron simply takes the input data (the loan application), multiplies each category by its weight, and adds these results together with the bias value. In this case, start by weighting the data. The data values are in blue, their weights in green, the bias in purple, and their sum in red:
Each data category is multiplied by a weight consistent with the importance of that data element in making final predictions. Run the data through this equation:
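The specific weight and bias values from the figure are not reproduced here, so the following sketch uses invented values to show the shape of the calculation on the loan application listed earlier:

```python
# A worked version of the weighted-sum calculation described above.
# The weights and bias are invented for illustration; the text does
# not specify the trained values.

inputs = [1, 0, 2, 7]            # savings, dependents, deposits, income bracket
weights = [0.2, -0.3, 0.1, 0.2]  # assumed importance of each category
bias = -0.8                      # assumed threshold adjustment

# Multiply each data value by its weight, then sum together with the bias.
weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
print(round(weighted_sum, 2))  # -> 1.0

# The activation function: a sum of 1 or greater means "prime".
decision = "prime" if weighted_sum >= 1 else "subprime"
print(decision)  # -> prime
```

With these assumed values, the high income bracket dominates the sum, the bias pulls it back down, and the result just clears the activation threshold, so this applicant would be classified as prime.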