At the close of 2019, Jeff Pastor, a Councilman for the City of Cincinnati, Ohio, called for a sweeping review of the city’s racial disparities in policing. Pastor pressed the City for an evaluation in the wake of a much-publicized report by The Stanford Open Policing Project, which found significant racial disparities in traffic stops in the city, including the finding that “Cincinnati police make 120% more total stops per resident in predominantly black neighborhoods than white ones.”

The call for a review reinvigorated debate among police accountability activists and public policy circles alike. A chief concern, and one that continues to evolve, is what is to become of potentially biased police data as predictive policing and other data-driven strategies become more commonplace. If police interactions are skewed, that data, once fed into a predictive model, can produce a biased output, directing police toward biased behaviors even without the officer’s knowledge or consent. As the now-clichéd saying goes, “garbage in, garbage out.”

Fixes Aren’t Exactly Simple

Academics, activists, and hobbyists have already made great strides in bringing the issue of algorithmic fairness to the forefront of mainstream attention. The spread of facial recognition has been halted in various locales until reasonable safeguards can be developed. This came after racial disparities were publicized, stemming in part from A.I. being trained on homogeneous data sets that did not include individuals with darker skin tones.

But creating more inclusive data sets is not always an applicable fix, especially when it comes to predictive policing based on historical police data. Creating an ethical framework is difficult. With predictive policing, it is not data scientists who create the data sets. The issue is not solely that a data set is too homogeneous; rather, the historical data that is available is potentially tainted before it is ever fed into a predictive model. A statistical model could adhere to the highest standards of scientific scrutiny, but that might not matter when individuals on the ground can introduce biased data.

Built-In Expectations

This raises the question of how much “preemptive defense,” so to speak, data scientists should be playing. Some months ago, the Students of Color in Public Policy symposium was held at the UC Berkeley Goldman School of Public Policy, with Race in Artificial Intelligence as one of the headlining sessions. The panel consisted of policy experts investigating the potential harms of algorithmic unfairness and the ongoing efforts by activists to make government systems more transparent.

The panel discussed a 2017 Stanford study of racial disparities in Oakland police officers’ use of language. Researchers in that study used a computational linguistic model to identify speech patterns. The relevant point was that the model could detect, with considerable accuracy, whether an officer was speaking to a black or a white resident, based exclusively on a transcript of the conversation.

One of the data scientists on the panel suggested that such a study was cause for optimism. Her argument was that if we could identify the rate of racial bias, as this study seemingly did, then we could use an adjustment function to underweight the scores of the demographics being discriminated against. There are downsides, but prevailing techniques such as statistical parity, “unawareness” techniques, and other “fairness” strategies show promise in combating bias. Putting them into practice, however, remains problematic.
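As a rough illustration, and purely as my own sketch rather than anything presented at the panel, two of the ideas named above might look like the following in code: measuring a statistical parity gap, and computing Kamiran and Calders-style reweighting factors that downweight over-represented combinations of group and outcome. The DataFrame and its column names are assumptions.

```python
# A rough sketch of two fairness techniques mentioned above, using hypothetical
# column names ("group", "outcome"): a statistical parity gap and
# Kamiran & Calders-style reweighting factors.
import pandas as pd

def parity_gap(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Largest difference in positive-outcome rates between any two groups."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return float(rates.max() - rates.min())

def reweighting_factors(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Weight each record by P(group) * P(outcome) / P(group, outcome), so
    combinations that appear more often than independence would predict
    are downweighted."""
    p_group = df[group_col].value_counts(normalize=True)
    p_outcome = df[outcome_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, outcome_col]).size() / len(df)
    return df.apply(
        lambda r: p_group[r[group_col]] * p_outcome[r[outcome_col]]
        / p_joint[(r[group_col], r[outcome_col])],
        axis=1,
    )
```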

Does Introducing Weights Excuse Racism?

Imagine police are using “hotspot” software that takes time, location, and historical reports of crime as variables to forecast the risk of a future offense. One possible strategy would be to first identify the propensity for biased policing. Suppose it was determined that a particular grid cell of a city had a high relative density of racial minorities and was overpoliced. One might, therefore, underweight the observations associated with that grid cell, treating them as “less trustworthy.”

In a scenario where many predictor variables are used, similar to contemporary risk terrain forecasts (as opposed to the “What? Where? When?” hotspot model), differing weights could be assigned to the variables.
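As a purely hypothetical sketch of the first version of this idea, the snippet below downweights historical reports from grid cells believed to be overpoliced before fitting a simple hotspot-style classifier. The grid cell IDs, the 0.6 “trust” factor, and the synthetic data are all assumptions, and weighting individual predictor variables, as in a risk terrain model, would require a different mechanism.

```python
# Hypothetical sketch: downweight historical reports from grid cells believed
# to be overpoliced before fitting a simple hotspot-style model. The cell IDs,
# the 0.6 "trust" factor, and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
X = np.column_stack([
    rng.integers(0, 24, n),   # hour of day
    rng.integers(0, 7, n),    # day of week
    rng.integers(0, 50, n),   # grid cell id
])
y = rng.integers(0, 2, n)     # 1 = a crime was reported (historical label)

OVERPOLICED_CELLS = [12, 13]  # cells judged to be overpoliced (assumed)
TRUST = 0.6                   # how much to trust observations from those cells

sample_weight = np.where(np.isin(X[:, 2], OVERPOLICED_CELLS), TRUST, 1.0)

model = LogisticRegression(max_iter=1000)
model.fit(X, y, sample_weight=sample_weight)
```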

While there are no current examples of pre-offense predictive policing built to forecast the behavior of members of a particular demographic, an idea like adjusting variable weights has been welcomed as a way to address bias in recidivism scores.

Over the years, a growing number of jurisdictions across the United States have incorporated “risk assessment scores” to help determine the likelihood that a defendant will re-offend. These scores are used primarily in parole or bail hearings. A 2016 ProPublica investigation found that:

In forecasting who would re-offend, the [risk assessment] algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways. The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.
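The disparity ProPublica describes is a gap in error rates, not in overall accuracy. As a minimal sketch of that kind of audit, assuming hypothetical column names (“race”, “high_risk”, “reoffended”), one might compute false positive and false negative rates per group as follows:

```python
# Sketch of an error-rate audit in the spirit of the ProPublica analysis.
# Column names ("race", "high_risk", "reoffended") are hypothetical.
import pandas as pd

def error_rates_by_group(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for group, g in df.groupby("race"):
        did_not_reoffend = g[g["reoffended"] == 0]
        did_reoffend = g[g["reoffended"] == 1]
        rows.append({
            "race": group,
            # flagged high risk among those who did not re-offend
            "false_positive_rate": (did_not_reoffend["high_risk"] == 1).mean(),
            # labeled low risk among those who did re-offend
            "false_negative_rate": (did_reoffend["high_risk"] == 0).mean(),
        })
    return pd.DataFrame(rows)
```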

Looking at this, there is a huge temptation, and one that could be inferred from the testimony of the aforementioned panelist, to reason: “If Demographic X is being discriminated against at a known Bias Rate Y, and we have a predictive model that includes demographic information, we can adjust the weight of the variables for Demographic X to compensate for the known Bias Rate Y.”
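Translated into code, the temptation amounts to little more than a scaling step. The sketch below is hypothetical, with an assumed bias rate and invented column names, and exists only to show how simple, and how blunt, such a compensation would be:

```python
# Hypothetical sketch of "compensating for a known Bias Rate Y": risk scores
# for Demographic X are simply scaled down. The 0.20 rate and the column
# names are assumptions for illustration only.
import pandas as pd

BIAS_RATE_Y = 0.20  # assumed: scores for Demographic X run ~20% too high

def compensate(df: pd.DataFrame, group_col: str = "demographic",
               score_col: str = "risk_score", target: str = "X") -> pd.Series:
    adjusted = df[score_col].astype(float)
    mask = df[group_col] == target
    adjusted[mask] = adjusted[mask] / (1.0 + BIAS_RATE_Y)
    return adjusted
```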

Despite the best of intentions, there is a serious risk that such a reform implicitly accepts a degree of racism. In both cases, whether underweighting a city grid cell or underweighting the scores of a particular demographic, a degree of bias is accepted and incorporated into the model.

This is liable to create a false sense that the issue of bias has been “solved,” which could lead to less emphasis being placed on bias training. Law enforcement or the state could come to expect the model to correct for perceived problems, and individuals involved in the justice system may feel less responsible if they think that, in the end, the model will correct for their actions.

Moreover, this introduces new ethical problems. In the case of pre-offense predictive policing, what if sentiments in law enforcement change faster than the predictive algorithm can keep up with? Predictive models can be poor at accounting for drastic changes. Suppose that in a particular instance law enforcement is more biased than the assigned weight assumes. Will the tendency be to discount that degree of bias because it does not conform to the norm? Suppose instead that law enforcement is less biased than the weight assumes. Does the built-in compensation for bias then become an unethical privilege for a particular demographic?

The rate of bias could also simply be wrong, or inapplicable. These algorithms use historical data, going back as far as ten years in some instances, and it is unlikely that such data generalizes with any accuracy. Is the rate of bias static across time and place? Do police in Oakland, California have the same amount of bias, and against the same populations, as police in El Paso, Texas? It is doubtful. Do police in Seattle in 2020 have the same amount of bias as they did in 2010, when the demographics of the city were different?

Does this then require each jurisdiction that uses algorithms in matters of justice to also quantify how biased its data is? If so, how frequently should that number be audited? If one answer is to throw out some historical data, doing so undermines the strength of the predictive algorithms, which require as many data points as possible to maintain accuracy.

Should We, or Should We Not, Include Demographic Information?

In response to accusations of perpetuating racial bias, PredPol, one of the largest vendors of predictive policing software, has made it a point to emphasize that its software does not use demographic information such as race (though that does little to address the complaint that racial disparities can arise for other reasons). This might be for the better, but the jury is still out.
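In code, that “unawareness” approach amounts to little more than dropping the protected columns before training, which is part of why critics remain unsatisfied: proxies such as location or arrest history stay in the data. A minimal sketch, with hypothetical column names:

```python
# Sketch of "fairness through unawareness": drop protected attributes before
# training. Column names are hypothetical. Proxies such as location or arrest
# history remain, which is why excluding race alone may not remove racial
# disparities from the output.
import pandas as pd

PROTECTED = ["race", "ethnicity"]

def drop_protected(df: pd.DataFrame) -> pd.DataFrame:
    return df.drop(columns=[c for c in PROTECTED if c in df.columns])
```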

Identifying racial disparities with data has proven useful. But, as many activists and scholars have argued, those working in the tradition of Foucault’s writing on governmentality chief among them, demographic data also risks putting individuals into a historical box and classifying them in bureaucratic categories. Census data, for example, was used in the United States to carry out Japanese internment; in Nazi Germany, it was used to track down the Jewish population. Even when not employed for explicitly malevolent purposes, demographic data can still reinforce a separation. It can cement the existence of an “other.”

The question then becomes: what if bias persists even without demographic data? If we were to eliminate that data, would we be irreparably harming our ability to identify true bias?

Conclusion

The use of algorithms in law enforcement does not appear to be a passing fad. As predictive policing, both pre-offense and in risk assessment, continues to grow, the data science and social science fields must remain vigilant. It cannot be forgotten that there are humans on the other side of the model. Data should be interrogated, best practices followed, and an open dialogue maintained.