Microscopic Hair Comparison and the Sociology of Science

Nearly a year ago, an FBI press release generated a great deal of media attention. The release reports initial results from a joint project of the Department of Justice (DOJ), the Federal Bureau of Investigation (FBI), the Innocence Project (IP), and the National Association of Criminal Defense Lawyers (NACDL) created to review cases in which a forensic technique known as microscopic hair comparison was used. This alliance of at least somewhat strange bedfellows—the IP and NACDL are often critical of the forensic science practiced by the FBI and used in court by the DOJ—was convened after the Washington Post reported on three cases of wrongful conviction in the District of Columbia, each of which relied heavily on microscopic hair comparison. The groups then agreed to undertake a comprehensive, joint review of the FBI’s deployment of microscopic hair comparison.

A single strand of hair rests on a smartphone. Chandler Abraham, Flickr CC https://flic.kr/p/7pdd4T

The press release reports the results of this panel’s initial analysis of almost 500 cases. Most startlingly, it reports that FBI examiners gave inaccurate testimony in 96% of those cases. The DOJ is now working to notify all the defendants affected. The NACDL is trying to ensure that those defendants have counsel. For its part, the FBI has agreed to provide free DNA testing, and the DOJ has agreed not to invoke statutes of limitations. However, these commitments will not necessarily extend to the majority of cases, which originated in state, not federal, courts.

That staggering 96% stat is likely behind some of the rather sensational headlines that accompanied blog posts about the report, including “The FBI faked an entire field of forensic science” and “CSI is a Lie.” One might read such reports and wonder how it is even possible to be wrong at a rate so much higher than chance. As it turns out, knowing a little more, well, context, makes the story a little more understandable—if no less damning.

The popular view still has it that jury trials are the way criminal prosecutions are decided, but well over 90% of all convictions come from plea bargains that are never subject to public scrutiny. Instead, behind closed doors, the prosecution typically confronts the accused with what is claimed to be overwhelming and incontrovertible evidence (not just forensic evidence—DNA matches, blood samples, bitemarks, fingerprints, bullet casings, etc.—but also incriminating statements by the defendant and others, eyewitness and informant testimony, and so on), then “bargains” with the accused to get him or her to plead guilty to a lesser charge. A guilty plea garnered in this fashion avoids a costly and time-consuming jury trial, but it also means that the evidence-claims rarely make it into broad daylight. High-profile cases may get jury trials, such as O.J. Simpson’s, or media coverage, such as that surrounding the Grand Jury decision to not pursue an indictment in Ferguson, Missouri after Michael Brown’s death, but such saturated media coverage only serves to reinforce the impression that juries review evidence. Moreover, the extraordinary popularity of Law and Order-type television series, from Perry Mason to the full panoply of CSI spinoffs to Bones, reinforces the completely false impression that juries typically review evidence in criminal cases. In one of the only empirical studies done on this topic, the ironic finding is that those convicted are more convinced of the invincibility of scientific evidence against them than are the prosecutors. That is, even if the evidence is flawed (or non-existent), the claim to that evidence is a powerful weapon in the hands of prosecutors in their leveraging of a guilty plea.

The branding of CSI is taken to its limit at this CSI-themed cafe. Yumi Kimura, Flickr CC

And, of course, the race and class dimensions of both criminal trials and plea bargains are well known. For example, the ACLU published a 2013 report using massive national data to document the extent to which the War on Drugs targeted African Americans, who were arrested on drug charges at four times the rate of Whites. Although Whites in the relevant age groups (17-29) consume the same amount of marijuana as African Americans or more, in major urban areas the latter are seven times more likely to be arrested for possession. With the dominance of the plea bargain as a weapon in the hands of prosecutors, the accused typically plead guilty to a lesser charge rather than mount a challenge against a looming mandatory-minimum prison sentence. No other factor explains the huge race differential in the nation’s incarceration rates.

Microscopic hair comparison is basically what it sounds like: a forensic analyst compares hairs relevant to a crime under a microscope. One or more of the hairs is of unknown origin, and one or more are known to come from a specific person. For example, the analyst might be asked to compare pubic hairs from a rape kit (that appear not to come from the victim) to pubic hairs plucked (by court order, if necessary) from a suspect. Human hair varies in a number of characteristics, such as color, treatment, pigment aggregation, and shaft form. Although all hairs from a single anatomical site on a single individual are certainly not identical, they tend to be “consistent” in some of these characteristics. Thus, the analyst seeks to determine whether the suspect’s sample hairs are consistent in these characteristics with the unknown hairs.

These characteristics, however, are certainly not uniquely possessed. Millions of people may have hairs of a certain color or thickness. Even combining a number of characteristics does not reduce the potential pool of donors of the hair to a single person. Faced with a finding of “consistency,” then, we must ask what “weight” (as forensic statisticians would call it) or “probative value” (as lawyers would call it) should we assign to this finding? Answering this question requires information about the rarity of the various characteristics being considered. As a simple example, we already have intuitive, experience-based information that a finding of consistency of the color red should be assigned greater weight than a finding of consistency of the color black. There are fewer natural redheads in the world than there are people with naturally black hair.
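The “weight” the forensic statisticians have in mind is often expressed as a likelihood ratio: how much more probable a finding of “consistency” is if the hairs share a source than if they come from different people. The sketch below illustrates the red-hair-versus-black-hair intuition in that form. All of the numbers are hypothetical placeholders, not real population data; indeed, the NRC’s point, discussed next, is that no such data exist for hair characteristics.

```python
# A minimal sketch of the "weight of evidence" idea, assuming made-up
# frequencies. No real population statistics for hair characteristics
# exist, which is precisely the problem discussed in this essay.

def likelihood_ratio(p_consistent_if_same_source: float,
                     population_frequency: float) -> float:
    """How much more probable a 'consistent' finding is if the hairs
    share a source than if they were drawn at random from the
    population. Larger values mean greater probative weight."""
    return p_consistent_if_same_source / population_frequency

# Hypothetical: hairs from the same person are judged 'consistent'
# 90% of the time.
p_same_source = 0.9

# Hypothetical population frequencies of the matched characteristic:
freq_black_hair = 0.75   # common characteristic -> little weight
freq_red_hair = 0.02     # rare characteristic  -> much more weight

print(likelihood_ratio(p_same_source, freq_black_hair))
print(likelihood_ratio(p_same_source, freq_red_hair))
```

Under these invented numbers, a black-hair match barely moves the needle (a ratio near 1), while a red-hair match carries roughly forty-five times as much weight, which is the intuition behind the red/black example above. Without measured frequencies to put in the denominator, no such ratio can honestly be reported.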

Here is where the story takes its first odd turn: As a 2009 review of forensic science by the National Research Council (NRC) put it, “No scientifically accepted statistics exist about the frequency with which particular characteristics of hair are distributed in the population.” Without such information, the weight of microscopic hair comparison cannot be estimated. Did FBI hair analysts write reports and give testimony stating that the weight of microscopic hair comparison cannot be calculated? No. In most cases, they devised verbal characterizations of the weight of the evidence, including verbal characterizations of probability (“I would say there is a high degree of probability that the hair comes from the defendant”), invoking professional experience (“based on my experience in the laboratory and having done 16,000 hair examinations, my opinion is that those hairs came from the defendant”), or characterizing rarity by reference to professional experience (“In 12 years as a hair analyst, I have looked at hairs from around 10,000 people, and only on two occasions have I seen hairs from two different people that I could not distinguish”). As late as 2004, forensic hair analysts, including FBI hair analysts, defended the practice of testifying “that the likelihood of finding someone else with indistinguishable hair is remote, a rare event.” They attributed criticism of this practice to “confusion and misunderstanding. . . and an incomplete knowledge about forensic hair comparisons by the non-scientific members of the legal system and non-forensic scientists.”

Here, of course, is how the whole episode illustrates a systemic failure of the criminal justice system that transcends the specifics of microscopic hair comparison. FBI analysts without sufficient data to estimate the weight of their evidence resorted to vague but overstated verbal formulations of certainty. These analysts trained local analysts all over the U.S. to do the same thing. Some prosecutors, in their summations, interpreted these overstated verbal formulations to sound like powerful statements of guilt. Defense attorneys often failed to understand or expose the limitations of the testimony. And, perhaps most importantly, the courts often failed to act as effective gatekeepers to ensure that expert witnesses could support their claims.

Although legal standards for allowing expert testimony differ from state to state, almost all such standards impose at least some “gatekeeping” responsibility on the judge to ensure that potentially misleading evidence is not put before a jury. Many states require “general acceptance in the field” in which the evidence belongs (the “Frye standard”), and still more require evidence of reliability (the “Daubert/Kumho standard”). Still, while microscopic hair comparison might have problems under either standard, most courts have allowed expert testimony by FBI examiners and the examiners they trained. Further, both standards tend to distract courts from the key issue of whether the weight of the evidence is being properly stated, regardless of the technique’s general acceptance or reliability.

By 2012, however, the FBI had become convinced that it was improper to try to characterize the weight of hair comparison evidence without studies or data, and that reports or testimony that did so were improper.

In other words, the FBI’s belief about what constituted proper scientific interpretation of microscopic hair comparison evidence changed sometime between 2000 and 2012. The FBI found itself in a consensus position with its erstwhile adversaries, the Innocence Project and NACDL.

This, then, is what accounts for the seemingly startling 96% figure. By changing its mind about what counted as “accurate,” the FBI caused thousands of scientific reports to seemingly magically transform from “accurate” (in its view) to “inaccurate.” Essentially all microscopic hair comparison evidence was considered inaccurate, because no one has enough information to properly estimate the weight of microscopic hair comparison evidence.

Although it is not known to have contributed to as many wrongful convictions as microscopic hair comparison, for fingerprint identification, too, forensic analysts do not report the rarity of the characteristics under consideration. Vince Alongi, Flickr CC

In a way, this story is perhaps a particularly stark example of what is by now a rather garden-variety finding in sociology of science: that scientific knowledge changes through the social consensus of its practitioners. By deliberately forming an organized consensus group and charging themselves with issuing a report, the four institutions (DOJ, FBI, IP, NACDL) whose cooperation, as Norman Reimer, Executive Director of NACDL, put it, was “once an almost inconceivable concept,” were able to transform thousands of scientific results from “accurate” to “inaccurate.” Now the view that the ordinary way of testifying was inappropriate was no longer a product of “confusion and misunderstanding” and “incomplete knowledge,” but, rather, the correct “scientific” view.

The science didn’t “shift.” In fact, no scientific research was performed at all—that, after all, is the point. What happened was that relevant actors became convinced that it was not scientifically acceptable to report about a hair comparison without making a reliable estimate of the potential donor population. In essence, the relevant social actors agreed it was necessary to think about hair evidence in a probabilistic fashion. This formation of agreement could reasonably be called both a scientific act and a social one.

Vince Alongi, Flickr CC.

Sociologists might also want to answer what is perhaps a more interesting question: why the FBI would have entered into this consensus in the first place. The FBI is not exactly known for its receptiveness to criticism of its forensic practices. And yet, the 2015 hair comparison press release is notable for its self-flagellating tone. The FBI’s own press release quotes IP Co-Director Peter Neufeld saying, “These findings confirm that FBI microscopic hair analysts committed widespread, systematic error, grossly exaggerating the significance of their data under oath with the consequence of unfairly bolstering the prosecutions’ case,” and “this epic miscarriage of justice calls for a rigorous review to determine how this started almost four decades ago and why it took so long to come to light.” It quotes Reimer saying, “it seems certain that there will be many whose liberty was deprived and lives destroyed by prosecutorial reliance on this flawed, albeit highly persuasive evidence.” Compare that tone with the more characteristically stonewalling tone of the FBI’s 2005 press release announcing its discontinuation of comparative bullet lead analysis (CBLA): “While the FBI Laboratory still firmly supports the scientific foundation of bullet lead analysis, given the costs of maintaining the equipment, the resources necessary to do the examination, and its relative probative value, the FBI Laboratory has decided that it will no longer conduct this exam.”

What explains the FBI’s willingness to throw the “science” of microscopic hair comparison under the bus? A number of possible explanations present themselves. Certainly, we should not discount the agency and hard work of the actors who created this consensus group with the FBI and insisted on moving it forward: the IP and NACDL, the Post (which continued to press the story), and, of course, progressive forces within DOJ and the FBI. Nor should we discount genuine motivation to do right. The results of this report may cause enormous amounts of work for law enforcement and attorneys as they try to sort through thousands of closed cases to determine whether erroneous forensic evidence made a difference in those cases. Changes in leadership at the level of President of the United States, Attorney General, or FBI Director could have had an impact. The bipartisan turn against overpunitiveness in the U.S. is another possible reason for turning against microscopic hair comparison.

There is another obvious explanation: microscopic hair comparison is almost obsolete. As the NRC report noted: “The availability of DNA analysis has lessened the reliance on hair examination. In a very high proportion of cases involving hair evidence, DNA can be extracted, even years after the crime has been committed. Although the DNA extraction may consist of only mitochondrial DNA (mtDNA) (nuclear DNA, preferable for forensic analysis, is not always retrievable from hair; mtDNA usually is), such analyses are likely to be much more specific than those conducted on the physical features of hair. For this reason, cases that might have relied heavily on hair examinations have been subjected more recently to additional analyses using DNA. Because of the inherent limitations of hair comparisons and the availability of higher-quality and higher-accuracy analyses based on mtDNA, traditional hair examinations may be presented less often as evidence in the future, although microscopic comparison of physical features will continue to be useful for determining which hairs are sufficiently similar to merit comparisons with DNA analysis and for excluding suspects and assisting in criminal investigations.”

We certainly don’t mean to valorize DNA analysis or claim that it is devoid of social issues. As social scientists (ourselves included) have pointed out, there are a host of concerns raised by DNA profiling, including contamination, planting, errors of interpretation, categorization of databases by racial groups, familial searching, phenotypic profiling, as well as privacy, surveillance, discrimination, and civil liberties concerns raised by the expansion of genetic databases.

Most important, even if microscopic hair comparison continues to be used for the sorts of coarse screening purposes described by the NRC (such as distinguishing hairs from fibers and human from animal hair), with the spread of DNA analysis, hair analysts are now less likely to be called upon to give evidence of identity than they were prior to 2000. That is, even if hair comparison is used as an investigation tool, testimony about identity—the kind of testimony that is the target of the joint report—is much less likely to find its way into trials. Under these circumstances, it is difficult not to suspect that the FBI gave up hair comparison once it no longer needed it for criminal prosecutions and trials.

Indeed, we would argue that the CBLA story also supports this hypothesis. CBLA was not obsolete in the same sense as microscopic hair comparison; it has not been replaced by a superior technology. But CBLA was an exotic forensic technique used by only one laboratory (the FBI) in the U.S. and in only a small number of criminal cases. Discarding it would affect those cases but do little to change the overall landscape of crime investigation. Since data were not available to estimate the weight of the evidence for CBLA either, discontinuing the technique was easier than undertaking the difficult work of putting CBLA on a firm foundation of data and statistical inference.

Following an opinion handed down in the D.C. Court of Appeals in early 2016, the U.S. Attorney’s Office declined to comment to The Washington Post as to how often the conclusions in bullet fragment analysis may have been overstated as absolute rather than accurate “to a reasonable degree of scientific certainty.” West Midlands Police, Flickr CC

The fall of a contested forensic discipline may seem like progress to criminal justice reformers who have been trying to improve American forensic science for years. But which disciplines fall may have as much to do with their perceived utility in the crime investigations of the future as with the inherent weaknesses of the disciplines themselves.

The 2009 NRC report was very critical of the state of forensic science, and it recommended Congress establish a National Institute of Forensic Science. However, after the 2010 elections shifted control of the House of Representatives to the Republicans, it became clear that any request for funding for such an institute would be blocked. So President Obama and his Attorney General, Eric Holder, decided to pursue a compromise or interim solution: the appointment of a National Commission on Forensic Science. This Commission held its first meeting in early 2014 and was composed mainly of criminal justice career professionals, including judges, attorneys, and laboratory scientists. Among the appointed members, only one was a social scientist (another, ex officio, member was a physical anthropologist).

At the outset, the Commission faces two huge hurdles that could thwart its ability to effect meaningful reform. First, it will be challenging for those who work inside the institutional and organizational framework of criminal justice to effect reform from within. Second, as noted, long before issues of bias, accuracy, and validity in forensic science reach the laboratories or are subject to challenge, more than 90% of those incarcerated got there through plea bargains rather than trials. The forensics never even came before a jury.

Recommended Readings

Spencer S. Hsu. 2012. “Convicted Defendants Left Uninformed of Forensic Flaws Found by Justice Department,” Washington Post (April 16). The first in the Post’s series about wrongful convictions that prompted the historic review of microscopic hair comparison.

Helena Machado and Barbara Prainsack. 2012. Tracing Technologies: Prisoners’ Views in the Era of CSI. Farnham, U.K.: Ashgate. An innovative comparative study of Austrian and Portuguese prisoners that finds a high degree of belief in the power of DNA profiling and other forensic technologies.

National Research Council. 2009. Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: National Research Council. The landmark report finding much forensic science poorly validated and critiquing the judiciary’s failure to ensure validation.

Daniel Nohrstedt and Christopher M. Weible. 2010. “The Logic of Policy Change after Crisis: Proximity and Subsystem Interaction,” Risk, Hazards & Crisis in Public Policy 1(2):1-32. A helpful work from the policy process tradition that argues “most policies …cannot be changed from within.”

Jed S. Rakoff. 2014. “Why Innocent People Plead Guilty,” New York Review of Books (November 14). A critique of the U.S. criminal justice system’s reliance on plea bargaining by a federal judge and member of the National Commission on Forensic Science.

Clive A. Stafford Smith and Patrick D. Goodman. 1996. “Forensic Hair Comparison Analysis: Nineteenth Century Science or Twentieth Century Snake Oil?” Columbia Human Rights Law Review 27:227-291. One of the earliest critiques of the validity of microscopic hair comparison.