AI and the Potential for Perpetuating Bias: Why the Judiciary Should Proceed with Caution

By: Nicole Harvey

Eleventh Circuit Judge Kevin C. Newsom is making waves again for his controversial view that artificial intelligence (“AI”) could, and potentially should, be used to help judges interpret terms central to legal disputes.  Judge Newsom first voiced that opinion in May of this year in a concurrence in Snell v. United Specialty Insurance Co., and he revived the discussion in his recent concurrence in United States v. Deleon.

The underlying lawsuit in Snell arose when an insurance company refused to defend a policyholder in a civil suit alleging that he negligently installed an in-ground trampoline in a client’s backyard.  The insurance company argued that the activity did not fall within the scope of “landscaping” for purposes of the policy and, therefore, that the company was not obligated to defend Mr. Snell in the resulting suit.  However, the policy at issue did not define “landscaping,” so it was up to the court to determine the word’s ordinary meaning and decide the case accordingly.

Judge Newsom, a self-proclaimed textualist, spent hours contemplating the ordinary meaning of “landscaping.”  He consulted multiple dictionaries to inform his opinion but was dissatisfied with the guidance they offered and turned to ChatGPT and Google’s Bard (Gemini’s predecessor) instead.  Intervening developments eliminated the need for the court to determine the ordinary meaning of the word, and the case was ultimately decided on other grounds.  However, Judge Newsom was so taken by his experience with large language models (“LLMs”) that he decided to lay out a five-point argument for their potential utility in determining the ordinary meaning of terms, which is a pivotal component of many legal disputes: (1) LLMs are trained on an enormous amount of data meant to run the gamut from “the highest-minded to the lowest,” putting LLMs in a strong position to determine how all categories of individuals use a word or phrase in their everyday lives; (2) LLMs understand context (e.g., an LLM can distinguish between a bat, the flying mammal, and a bat, the wooden stick used to hit a baseball); (3) they are an inexpensive research tool and are highly accessible to lawyers, judges, and litigants; (4) the research methods LLMs employ to determine the ordinary meaning of terms are significantly more transparent and less discretionary than the methods used by dictionary editors; and (5) LLMs are superior to alternative methods for determining ordinary meaning, such as conducting wide-ranging surveys.

Judge Newsom’s reasoning that LLMs may be superior to traditional dictionaries, because they determine the ordinary meaning of words using larger and more representative data sets than dictionary editors rely on and with potentially less “discretion” than judges exercise, is reminiscent of the popular argument supporting the use of risk-assessment tools (“RATs”) in bail proceedings.  According to the Bail Project, RATs are “statistical models intended to help judges make detention or release decisions by attempting to predict the likelihood that a person accused of a crime will either fail to appear at a future court appearance and/or engage in criminal activity if they are released pretrial.”  In the early 2000s, RATs were heralded by reform groups as an equalizer that would help judges counteract their own biases and determine risk based on a large pool of data rather than the judge’s personal experiences alone.  However, recent studies examining RATs’ effectiveness in curbing racial disparities have some of those same groups changing course.

Concerns first arose in 2016 when a ProPublica investigation revealed that a RAT used in Florida courts overestimated the risk of recidivism among black defendants.  Since then, the concern has grown so large in the criminal justice reform community that twenty-seven prominent researchers from many of the nation’s top universities published an open letter asserting that RATs do not accurately measure pretrial risks and urging jurisdictions that utilize them to reconsider.  In their analysis, they pointed to the fact that “[d]ecades of research have shown that, for the same conduct, African American and Latinx people are more likely to be arrested, prosecuted, convicted, and sentenced to harsher punishments than their white counterparts.  Risk assessments that incorporate this distorted data will produce distorted results.”

Developers of AI, and even ChatGPT itself, admit that LLMs could perpetuate cultural and racial biases for the same reason that RATs appear to: the data sets they are trained on are inherently biased.  In fact, ChatGPT’s definition of “landscaping” is a telling example of how such bias might produce a harmful outcome.  Urban landscaping differs from suburban landscaping in form and function.  One such difference lies in hardscaping, the practice of incorporating non-plant elements into landscape design.  The in-ground trampoline at issue in Snell would be considered a hardscaping element.  In cities, fences, benches, gazebos, and play structures are common hardscaping features.  In a suburban setting, stone paths, decks, and patios account for the majority of hardscaping elements.

Significantly more landscaping takes place in suburbs than in cities because suburbs have more landscapable space and fewer areas of concentrated poverty.  Therefore, the collection of data LLMs would pull from to formulate a definition of “landscaping” is likely skewed towards suburban landscaping, and the resulting definition would reflect that skew.  Sure enough, when asked to define landscaping, ChatGPT included hardscaping in its answer and provided pathways, patios, decks, and walls—all of which are more commonly found in suburban landscaping—as the only examples.

Had the Eleventh Circuit adopted this definition in Snell, it might have excluded substantial elements of urban landscaping from coverage under insurance policies akin to the one at issue.  As a result, landscapers with similar policies would likely be reluctant to undertake urban landscaping projects.  In turn, urban access to landscaping would be further diminished, particularly in areas of concentrated poverty where minorities account for an overwhelming portion of the population.

The idea that adding more data should translate to greater representation is logical on the surface.  However, as we have seen with RATs, this is not necessarily the case.  Learning from the lessons of RATs, the judiciary should proceed with caution before crediting claims about LLMs’ capacity to reduce bias.


Student Bio:  Nicole Harvey is a third-year evening student at Suffolk University Law School and a staff member for the Journal of High Technology Law.  Nicole is also an Intelligence Analyst at the Office of the Massachusetts Attorney General.  She received a Bachelor of Science degree in Politics, Philosophy, and Economics, with a concentration in Law & Justice, and a minor in Law & Public Policy from Northeastern University in 2019.

Disclaimer: The views expressed in this blog are the views of the author alone and do not represent the views of JHTL, Suffolk University Law School, or the Office of the Massachusetts Attorney General.
