Whenever I have to fill out a form with a “race” question, I’m bemused (and perhaps amused) by the options. As someone of Indian origin with a diverse family tree, I often pick the “Asian” option. However, within the US, “Asian” is generally understood to mean someone from East Asian countries, predominantly China or Japan, ignoring the Indian subcontinent and other countries that collectively make up the majority of Asia’s population. In contrast, in the UK, Asian typically refers to people from South Asia, especially those from India or Pakistan. Some forms take the issue further, adding an “ethnicity” question that divides the world into Hispanic and non-Hispanic populations. This confusion leads to the question of what, exactly, race is. Does it have a biological basis, or is it an arbitrarily defined social construct?
Many branches of science, including DNA forensics, struggle with the concepts of race, ethnicity and ancestry. There are no universally accepted definitions of these terms, and confusion about their use is prevalent in science as well as outside it. This confusion is well recognized in fields such as clinical genetics, where categorizations by race, ethnicity or ancestry are used to report health data. A study that surveyed clinical genetics professionals and researchers found that a majority of respondents considered race, ethnicity and ancestry at least somewhat important for genetic testing and communicating results to patients (1). However, there was no consensus around how these parameters should be used and what information they could convey in clinical genetics. The authors noted this lack of consensus could contribute to misleading or inconclusive results when interpreting clinical genetic data, calling out the need for standardizing race, ethnicity and ancestry in clinical data collection and interpretation.
Forensic DNA Databases
Forensic DNA analysis faces the same ambiguity when it comes to addressing race, ethnicity and ancestry. “Unfortunately, ancestry, race and ethnicity are often used interchangeably, even though each concept varies significantly across historical and contemporary national borders and time,” says Sree Kanthaswamy, PhD. Dr. Kanthaswamy is Professor of Genetics and Forensic Science in the School of Mathematical and Natural Sciences at Arizona State University. Together with doctoral student Robert F. Oldt, Dr. Kanthaswamy has published a detailed study of the relevance of race in forensic DNA databases that use the Combined DNA Index System (CODIS) short tandem repeat (STR) panel (2).
These databases are used in criminal investigations as part of random match probability (RMP) calculations—the probability that a random, unrelated person from the general population has a DNA profile matching that of the suspect. Alternatively, analysts may calculate the likelihood ratio (LR), using Bayesian analysis. The problem, Dr. Kanthaswamy says, is that “RMP estimations, which are calculated from racially subdivided genetic databases, assume that ‘races’ are meaningful biological categories.” It is widely assumed that, to compute the most conservative RMP (favoring the defendant), the defendant’s self-declared racial reference DNA database should be used. When a defendant’s race is unknown, in the US, RMPs are computed across reference race-based CODIS STR allele frequency databases consisting of African American, Asian, Caucasian, Hispanic, and Native American categories. Racial information is not formally included in any steps preceding forensic DNA analysis, yet it is used for the RMP calculation and interpretation.
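To make the RMP idea concrete, here is a minimal sketch of the textbook "product rule": each locus's genotype frequency is estimated under Hardy-Weinberg assumptions (p² for a homozygote, 2pq for a heterozygote), and the per-locus values are multiplied together. The allele frequencies, database names and two-locus profile below are invented for illustration and are not real CODIS data. Casework calculations also typically apply corrections for population substructure, which this sketch omits.

```python
from math import prod

# Hypothetical allele frequencies for illustration only (not real CODIS data).
DATABASES = {
    "database_A": {
        "D8S1179": {"13": 0.30, "14": 0.20},
        "TH01": {"6": 0.25, "9.3": 0.30},
    },
    "database_B": {
        "D8S1179": {"13": 0.25, "14": 0.25},
        "TH01": {"6": 0.20, "9.3": 0.35},
    },
}

# Hypothetical two-locus STR profile: one heterozygous and one homozygous locus.
PROFILE = {"D8S1179": ("13", "14"), "TH01": ("9.3", "9.3")}


def genotype_frequency(p, q, homozygous):
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, else 2pq."""
    return p * p if homozygous else 2 * p * q


def random_match_probability(profile, freqs):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    per_locus = []
    for locus, (a1, a2) in profile.items():
        p, q = freqs[locus][a1], freqs[locus][a2]
        per_locus.append(genotype_frequency(p, q, a1 == a2))
    return prod(per_locus)


# Compute the RMP against each reference database.
rmps = {name: random_match_probability(PROFILE, f) for name, f in DATABASES.items()}

# The largest RMP is the most conservative, i.e. most favorable to the defendant.
most_conservative = max(rmps, key=rmps.get)

for name, value in rmps.items():
    print(f"{name}: RMP = {value:.6f}")
print("Most conservative:", most_conservative)
```

With a full STR panel, the product of many such per-locus terms becomes extremely small whichever database is used, which is the backdrop for the comparison described in the study.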
In their study, Oldt and Kanthaswamy evaluated the relationship between RMP and race in the continental US (2). “Based on this study,” Dr. Kanthaswamy says, “the use of racial information does little to generate conservative RMP estimates.” Therefore, using race as a proxy for genetic distinction to produce larger (i.e., conservative) RMPs for an individual DNA profile is unnecessary. He adds, “In fact, we believe that using race information at this stage of DNA analysis may be prejudicial, especially when presented to the jury during trial.”
Expanding on the results of the study, Dr. Kanthaswamy explains that the analysis of CODIS STR profiles using the five race-specific allele frequency databases could not distinguish individuals separated by race as distinct genetic clusters. “These analyses confirm that most genetic differences between individuals are only to the slightest extent attributable to racial classification, as almost 98% of the genetic variation was found to occur among individuals and not between races,” he says. Therefore, RMP values—which were computed as exceedingly small, regardless of allele frequency database—did not vary to any significant degree when incorporating race-specific reference data.
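One common way to express how much genetic variation lies between groups rather than among individuals is a fixation index such as F_ST. The sketch below uses a simple heterozygosity-based estimate for a single locus, F_ST = (H_T - H_S) / H_T. It is a stand-in for illustration and is not the method used in the study. The allele frequencies are hypothetical, and the groups are assumed to be of equal size.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity for one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpop_freqs):
    """Heterozygosity-based F_ST for one locus, assuming equally sized groups.

    subpop_freqs is a list of allele-frequency lists, one per group,
    with alleles in the same order in every group.
    """
    n = len(subpop_freqs)
    # H_S: average heterozygosity within groups.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n
    # H_T: heterozygosity of the pooled population.
    n_alleles = len(subpop_freqs[0])
    pooled = [sum(f[i] for f in subpop_freqs) / n for i in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t


# Hypothetical two-allele locus in two groups with modestly different frequencies.
example = fst([[0.5, 0.5], [0.6, 0.4]])
print(f"Between-group share of variation: {example:.1%}")
print(f"Within-group share of variation: {1 - example:.1%}")
```

In this toy case, a 10-point difference in allele frequency between the two groups still leaves about 99% of the variation within groups, the same qualitative pattern as the finding quoted above.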
Instead of race or ethnicity, Dr. Kanthaswamy recommends using genetic ancestry in forensic DNA databases. He explains that genetic ancestry can be traced through pedigree or genealogical history. “The rigid form of categorizing people in the US into different racial or ethnic groups is based on a mixture of their physical traits, behavioral characteristics, cultural and linguistic attributes, and geographic origins,” he adds. In forensic DNA analysis, underlying biological factors that can unequivocally group people into discrete racial and ethnic categories are nonexistent. Further, he says, “When we only focus on discrete racial and ethnicity categories, we ignore the level of admixture among people of different origins.” As a result, mixed-race individuals are not clearly defined and are undercounted in forensic DNA databases.
“We hope that our results highlight the clear disadvantage of using racial classifications to generate forensic identity estimations, especially in the US,” Dr. Kanthaswamy says. “A promising avenue from our findings is that eliminating racial or ethnic groups during RMP calculations may be critical for accurate and unbiased DNA analyses and interpretation.” As forensic DNA technology advances, especially the ability to extract usable DNA from ever smaller and more degraded pieces of evidence, he emphasizes the importance of using accurate and unbiased database methods.
As massively parallel sequencing (MPS) grows in popularity among forensic laboratories, it continues to revolutionize forensic workflows originally based on capillary electrophoresis. DNA phenotyping is one example of a forensic method transformed by MPS technology. This method uses a DNA profile to predict features related to physical appearance, such as eye, hair and skin color, as well as biogeographical ancestry (reviewed in 3). The technique originally analyzed a small set of autosomal single-nucleotide polymorphisms (SNPs) associated with these physical traits. With DNA microarray technology or MPS, however, the number of SNPs that can be analyzed simultaneously increases greatly (3).
Forensic DNA phenotyping has attracted considerable attention for its potential to help with investigating crime, and it has been compared favorably to eyewitness evidence. At the same time, concerns have been raised about objective interpretation of the results and whether investigations can be biased against minority racial or ethnic groups (reviewed in 4).
Parabon NanoLabs is a leading DNA technology company that offers forensic DNA phenotyping among its Snapshot® DNA analysis services. Ellen Greytak, PhD, Director of Bioinformatics at Parabon NanoLabs, addresses the issues surrounding race, ethnicity and ancestry in the context of DNA analysis. “Race and ethnicity are categorical descriptors,” she says, “and we do not use them in our reports. We talk about ancestry as a continuous measure of how much of a person’s DNA comes from each population (admixture analysis), and how similar a DNA sample is to reference individuals with known ancestry (principal components analysis).”
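The idea of ancestry as a continuous measure can be illustrated with a toy admixture estimate. The sketch below is a generic illustration, not Parabon's method: it assumes just two hypothetical reference populations with known allele frequencies at four SNPs, and it grid-searches the admixture proportion that maximizes a binomial likelihood of one individual's genotypes. All names and numbers are invented.

```python
from math import log

# Hypothetical reference allele frequencies at four SNPs (illustration only).
POP1_FREQS = [0.9, 0.1, 0.8, 0.2]
POP2_FREQS = [0.1, 0.9, 0.2, 0.8]


def log_likelihood(alpha, genotypes, p1, p2):
    """Log-likelihood of genotypes (0, 1 or 2 copies of the counted allele)
    when a fraction alpha of ancestry comes from population 1.
    The constant binomial coefficient is dropped."""
    total = 0.0
    for g, f1, f2 in zip(genotypes, p1, p2):
        p = alpha * f1 + (1 - alpha) * f2  # expected allele frequency
        total += g * log(p) + (2 - g) * log(1 - p)
    return total


def estimate_admixture(genotypes, p1=POP1_FREQS, p2=POP2_FREQS, steps=100):
    """Return the grid value of alpha in [0, 1] with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, genotypes, p1, p2))


# Genotypes typical of population 1 versus an evenly mixed individual.
print(estimate_admixture([2, 0, 2, 0]))
print(estimate_admixture([1, 1, 1, 1]))
```

The output is a proportion rather than a category label, which is the distinction Dr. Greytak draws. Real analyses use far more markers and reference populations than this sketch.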
Dr. Greytak emphasizes that DNA phenotyping produces an objective physical representation of an unknown individual that is not biased by an observer, as may happen with eyewitness accounts. The main use for the technology is in cases where there are no eyewitness accounts. “Phenotyping has been used in many cases where detectives think they have a description of the suspect, and they would like to confirm or refute that description,” she says. In cases where eyewitness accounts are inconsistent with the DNA evidence, Dr. Greytak notes that DNA phenotyping can prevent an investigation from moving in the wrong direction. She cites an example of a 2009 homicide case in Louisiana where investigators had been searching for a Hispanic male based on the victim’s cell phone records. “DNA phenotyping analysis of DNA from the crime scene showed that it came from a Northern European male with blue eyes,” Dr. Greytak says. The investigative agency released the Parabon Snapshot® report, which ultimately led to a tip and arrest of the perpetrator.
Population genetics studies often rely on what are called ancestry-informative markers (AIMs)—a set of genetic variants that show significant frequency differences across populations. “We do not use AIMs for our ancestry analysis,” Dr. Greytak says, “but rather tens of thousands of SNPs across the genome, which makes our ancestry results far more precise than can be accomplished with AIMs.” As mentioned earlier, the reports that Parabon provides do not discuss race or ethnicity. Rather, they assess biogeographical ancestry, “which is a continuous measure of the genetic origin of one’s ancestors that is passed down through the DNA,” Dr. Greytak explains.
DNA phenotyping, as part of forensic genetic genealogy, can be a powerful tool. One area where it cannot provide information currently is age. As technology advances, however, Dr. Greytak hopes that gap may be filled by analyzing epigenetic DNA methylation patterns, using techniques such as bisulfite sequencing (5).
As with other branches of forensic science, technological advancements have led to growth in the use of forensic anthropology, especially in cases involving missing persons or unidentified human remains. Although DNA analysis has gained popularity in recent years, it is often used along with physical identification methods—such as facial imaging, bone microscopy, dental reconstruction and radioisotopic analysis—that form the foundation of forensic anthropology (6,7).
This discipline has seen its share of debates over race, ethnicity and ancestry—perhaps more so than any other branch of forensics. Anthropological assessments provide information on ancestry, and it is widely recognized that race is a social, not biological, construct. However, some forensic anthropological methods are rooted in a history of dividing humans into “racial types”, and interpretation of forensic anthropological data can still be subject to racial bias (8).
Often, forensic anthropological estimates of ancestry in the US are used as a proxy for social race. Shanna Williams, PhD, is an Associate Professor at the University of South Carolina School of Medicine—Greenville. Together with Dr. Ann Ross, she conducted an extensive content analysis of publications in the Journal of Forensic Sciences from 2009 to 2019, examining the confusion around the nomenclature used in ancestry estimation studies (9). The study also questioned the validity of the so-called tricontinental approach to categorizing ancestry information as African, Asian and European.
According to Dr. Williams, “The term ‘ancestry’ should reflect human biological variation through a lens of population structure grounded in microevolution.” The common practice of using craniometric analysis or macromorphoscopic traits (slight variations in cranial form) relies on the assumption that skeletal morphology correlates with ancestry. In this approach, statistical methods are used to analyze 2D or 3D data collected from the cranium to predict the probability of an unknown individual classifying into a known reference population. Dr. Williams emphasizes that multiple macromorphoscopic traits should be used in ancestry estimations and assessed using the appropriate statistical classification method to ensure they are properly weighted. “However, in recent years,” she says, “forensic anthropologists have begun pushing back on the utilization of such traits, questioning their biological validity.”
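To show what "the probability of an unknown individual classifying into a known reference population" means computationally, here is a minimal sketch of a Gaussian classifier with equal priors. It assumes independent measurements (a diagonal covariance), which is a simplification of the discriminant methods typically used. The reference sample names, means and variances are hypothetical and are not drawn from any real craniometric dataset.

```python
from math import exp, log, pi

# Hypothetical reference samples: mean and variance of two cranial measurements (mm).
REFERENCE = {
    "reference_sample_1": {"mean": [180.0, 140.0], "var": [36.0, 25.0]},
    "reference_sample_2": {"mean": [186.0, 136.0], "var": [36.0, 25.0]},
}


def log_density(x, mean, var):
    """Log of a Gaussian density with independent measurements."""
    return sum(
        -0.5 * (log(2 * pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior_probabilities(x, reference=REFERENCE):
    """Posterior probability of each reference sample, assuming equal priors."""
    logs = {name: log_density(x, r["mean"], r["var"]) for name, r in reference.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {name: exp(value - top) for name, value in logs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# An unknown individual whose measurements fall between the two reference means.
for name, prob in posterior_probabilities([182.0, 139.0]).items():
    print(f"{name}: {prob:.2f}")
```

Note that the classifier can only choose among the reference samples it is given, so its output inherits whatever labels those samples carry. That limitation is part of the concern discussed here.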
In the content analysis study (9), Dr. Williams found numerous examples where the term “ancestry” was applied to racially or ethnically defined samples, such as Black, white and Hispanic. She says, “This suggests, at best, a lack of consistency and consensus in the usage of the word ‘ancestry’ and, at worst, a return to the debunked biological race concept that humankind can be winnowed into a small number of discrete groups using a less socially loaded term.” Unfortunately, as the study points out, ancestry has simply become a synonym for race in forensic anthropology.
To address the issue, Dr. Williams says that “a fundamental and structural paradigm shift is needed, which represents a more biologically and socially complex approach to human population variation.” One possible solution she proposes is to apply a population structure approach to ancestry estimations. The approach would use craniometric analysis to consider how major historical events—such as population migrations, language differences and spatial patterning—are impacted by microevolutionary forces. In turn, the approach would seek to understand how these historical events influence cranial morphology. “This approach better captures the interplay between biology and culture, while steering clear of outdated typological methods constructed upon debunked notions of biological determinism,” Dr. Williams concludes.
The assumption that race can be determined from a person’s DNA or bones is pervasive, yet there is no biological foundation for that assumption. It’s clear that confusion surrounding race, ethnicity and ancestry still exists among forensic analysts. Changing misguided perceptions and bringing clarity to these issues will take time and involve a fundamental change in how we think about race.
1. Popejoy, A.B. et al. (2020) Clinical genetics lacks standard definitions and protocols for the collection and use of diversity measures. Am. J. Hum. Genet. 107, 72–82.
2. Oldt, R.F. and Kanthaswamy, S. (2020) Expanded CODIS STR allele frequencies—evidence for the irrelevance of race-based DNA databases. Legal Med. 42, 101642.
3. Schneider, P.M. et al. (2019) The use of forensic DNA phenotyping in predicting appearance and biogeographic ancestry. Dtsch. Arztebl. Int. 116, 873–880.
4. Machado, H. and Granja, R. (2020) Emerging DNA technologies and stigmatization. In Forensic Genetics in the Governance of Crime. Palgrave Pivot, Singapore, pp. 85–104.
5. Forensic age estimation with DNA methylation. Application Note, Verogen Inc. Document # VD2020042 Rev. A, February 2021.
6. Ubelaker, D.H. (2018) Recent advances in forensic anthropology. Forensic Sci. Res. 3(4), 275–277.
7. Ubelaker, D.H. et al. Contributions of forensic anthropology to positive scientific identification: a critical review. Forensic Sci. Res. 4(1), 45–50.
8. Michael, A. et al. (2021) Genes, race, ancestry, and identity in forensic anthropology: historical perspectives and contemporary concerns. Forensic Genomics 1(2), 41–46.
9. Ross, A.H. and Williams, S.E. (2021) Ancestry studies in forensic anthropology: back on the frontier of racism. Biology 10, 602.