Opening the Black Box of Sentencing Software

By Yina Moe-Lange  


Opening up the proprietary black box software now used in courts is an important task for Machine Learning (ML) researchers. Last year, ProPublica published the findings of its investigation into commercial sentencing software in the US. The software was found to be severely biased against African-Americans, and its predictions were often far from what actually happened. As the ProPublica article states, “we know the computer algorithm got it exactly backward,” and “the algorithm was somewhat more accurate than a coin flip.”


There has been a large, complex debate about the use of algorithms to predict recidivism in the US court system. One such proprietary algorithm is called COMPAS. Like similar tools, it is privately owned and effectively a black box, which is cause for concern because outsiders cannot see how it actually reaches its decisions. COMPAS predictions are accurate only 60-70% of the time, and data entry errors can make matters worse by leading to incorrect decisions.


A group at Duke University is looking to change this. They are working on a “white box” algorithm that is transparent and easier to interpret. The method they use is called the Supersparse Linear Integer Model (SLIM). SLIM is simple and adaptable, so the same approach can be used to predict the likelihood of arrest for different types of crime.


This table shows the SLIM scoring system for the prediction of an arrest for a drug-related offense. Taken from “Interpretable Classification Models for Recidivism Prediction” by Zeng, Ustun and Rudin, Fig. 6.
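
The point values from that figure are not reproduced here. As a rough sketch of how a scoring system of this kind is applied, the Python below adds up small integer points for each feature that applies to a person and compares the total with a threshold. The feature names, point values, and threshold are all invented for illustration and are not taken from the paper.

```python
# Hypothetical SLIM-style scoring system. The features, point values,
# and threshold are invented for illustration and are NOT the values
# published by Zeng, Ustun and Rudin.
POINTS = {
    "prior_arrests_ge_5": 2,
    "age_at_release_18_to_24": 1,
    "prior_drug_arrest": 1,
    "age_at_release_ge_40": -1,
}
THRESHOLD = 1  # predict an arrest when the total score exceeds this value


def score(person: dict) -> int:
    """Add up the integer points for every feature that applies."""
    return sum(pts for name, pts in POINTS.items() if person.get(name, False))


def predict_arrest(person: dict) -> bool:
    """Return True when the total score is above the threshold."""
    return score(person) > THRESHOLD


example = {"prior_arrests_ge_5": True, "age_at_release_18_to_24": True}
print(score(example), predict_arrest(example))
```

Because the model is just a short list of integers, anyone can check a prediction by hand, which is the sense in which such a model is a “white box.”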


The Duke University team has also worked on a second algorithm, CORELS. This algorithm compares data about new offenders with data about past offenders who have similar characteristics, and then sorts offenders into different “buckets” of risk. Tested on a dataset of defendants from Florida, CORELS differentiated between low- and high-risk offenders as well as or better than the black box models.
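
One way to picture the bucketing is as an ordered list of if-then rules, which is the general form CORELS models are usually described as taking. The sketch below assumes that form. The specific conditions, field names, and labels are hypothetical and are not the rules learned from the Florida data.

```python
# Hypothetical rule list. The conditions and risk labels are invented
# for illustration; they are NOT the rules learned by CORELS.
RULES = [
    (lambda d: d["age"] <= 20 and d["priors"] >= 1, "high"),
    (lambda d: d["priors"] > 3, "high"),
    (lambda d: 21 <= d["age"] <= 23 and d["priors"] >= 2, "high"),
]
DEFAULT_BUCKET = "low"  # used when no rule matches


def risk_bucket(defendant: dict) -> str:
    """Return the label of the first rule that matches, in order."""
    for condition, label in RULES:
        if condition(defendant):
            return label
    return DEFAULT_BUCKET


print(risk_bucket({"age": 19, "priors": 1}))
print(risk_bucket({"age": 40, "priors": 0}))
```

The rules are checked from top to bottom and the first match decides the bucket, so the full reasoning behind any single prediction can be read straight off the list.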


This table shows the SLIM scoring system for the prediction of an arrest for a general violence offense. Taken from “Interpretable Classification Models for Recidivism Prediction” by Zeng, Ustun and Rudin, Fig. 7.


Separately, the National Bureau of Economic Research (NBER) has published a working paper on the role of ML in judges’ decision making. Its policy simulations suggest that “crime can be reduced by up to 24.8% with no change in jailing rates, or jail populations can be reduced by 42.0% with no increase in crime rates.”


What is important about these new algorithms is that they use little demographic data. For instance, the NBER study uses only age. Otherwise, the algorithm makes decisions based on records of the current case and any past criminal record. Neither of the Duke University models uses race or socioeconomic status. Additionally, because models like SLIM are based on open source software, anyone can examine the code and the publicly available data used.


Using potentially biased black box software is not good for society, and implementing white box algorithms can improve the sentencing system, even reducing the systemic bias inherent in a human-run judicial system. The main conclusion drawn from analysis of these systems is that it is important to understand the connection between predictions and decisions. Deciphering the black box of ML models has become much harder with the advent of technologies such as Deep Learning. The explosion in the complexity and application of ML has made it more urgent to shed light on the black box.