Breaking Down Barriers
How SNP Mixture Interpretation Could Transform Forensic Genetic Genealogy Eligibility
Interview written and condensed by Tara Luther, Promega
For decades, the forensic DNA community has wrestled with the challenge of mixture interpretation—samples containing genetic material from more than one person. While probabilistic genotyping revolutionized how labs handle STR mixtures, the field is now confronting a new frontier: SNP mixture data. As labs increasingly turn to sequencing technologies for investigative genetic genealogy (IGG), mixture samples often hit a dead end. If a sample isn't single-source, it's typically deemed ineligible for genealogy analysis, closing the door on cases that might otherwise be solved.
Dr. Betzaida Maldonado is working to change that. A former DNA scientist at the Georgia Bureau of Investigation who now pursues her PhD in human medical and molecular genetics at the University of Colorado Anschutz Medical Campus, Maldonado is one of only two NIJ fellowship recipients focused on genetics applications. Her research, funded through the National Institute of Justice's Graduate Research Fellowship Program, tackles a critical gap: developing a probabilistic framework for interpreting two-person SNP mixtures. The work could dramatically expand which samples qualify for genetic genealogy—and potentially unlock investigative leads in cold cases that have remained unsolved for decades.
In this interview, Maldonado walks us through the technical challenges of SNP mixture deconvolution, how her approach mirrors familiar STR workflows, and what success could look like for labs navigating the shift from traditional profiling to sequencing-based technologies. For analysts working at the intersection of casework and innovation, her insights offer both practical clarity and a glimpse at where the field is headed.
Your NIJ-funded research tackles a critical limitation in forensic DNA testing—interpreting SNP mixtures. Why do you think this issue has remained under-addressed until now?
Historically, forensic DNA analysis has centered on STR data. STR profiles have become the gold standard for human identification. With increasingly sensitive chemistries, even tiny traces of DNA can create complicated profiles, particularly for evidence samples that contain the DNA of more than one individual. Therefore, most research has focused on making sense of complex STR mixtures.
Additionally, most crime labs still haven’t adopted SNP technologies. The high price of sequencing technologies and the previously unexplored utility of SNP data in forensic casework have kept labs focused on traditional STR methods. However, the growing public interest in methods that use SNP data, such as forensic genetic genealogy, combined with the need for tools that can extract more genetic information from highly degraded samples, is motivating crime labs to consider adopting SNP technologies. As sequencing tools and forensic SNP kits become more affordable and widely accepted, we are entering a new era, one where SNP technologies are more commonly used in casework. And with that comes a big question: How do we interpret SNP mixtures?
This challenge hasn’t received much attention until now. It is worth noting that probabilistic methods for interpreting STR mixtures took decades of research before they were adopted in forensic casework. Now is the time to apply that same foresight to SNP mixture data. Research conducted today will shape the future of SNP forensic data analysis.
Can you walk us through your approach to simulating SNP mixture data using the 1000 Genomes Project? What are the biggest technical hurdles?
The first step of this project involves the development of a bioinformatics framework for simulating SNP data. Our approach takes advantage of the bi-allelic nature of SNPs, where most DNA locations have only two allele options, for example, allele A or allele B. Similar to how peak heights in STR data reflect the amount of DNA for each allele, SNP data is represented by read counts, which indicate the amount of DNA detected. In SNP mixture data, the number of reads for each allele depends on the number of contributors in the sample, their corresponding genotypes, and their mixture proportions, a measure of how much DNA each individual contributes to the mixture.
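To make the read-count idea concrete, here is a minimal sketch of that kind of simulation in Python. It is not Maldonado's actual framework; the function name, parameters, and the simple binomial error model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_snp_mixture(geno_a, geno_b, prop_a, depth, error=0.005):
    """Simulate allele-B read counts for a two-person SNP mixture.

    geno_a, geno_b: allele-B dosages (0, 1, or 2) per SNP for each contributor.
    prop_a: mixture proportion of contributor A (between 0 and 1).
    depth: total sequencing reads per SNP.
    error: per-read sequencing error rate (flips the observed allele).
    """
    geno_a = np.asarray(geno_a)
    geno_b = np.asarray(geno_b)
    # Expected fraction of allele-B reads, weighted by mixture proportion.
    p_b = prop_a * geno_a / 2 + (1 - prop_a) * geno_b / 2
    # Fold in a symmetric sequencing-error rate.
    p_obs = p_b * (1 - error) + (1 - p_b) * error
    b_reads = rng.binomial(depth, p_obs)  # stochastic read counts per SNP
    return b_reads, depth - b_reads

# A 70:30 two-person mixture at five SNPs with 100x coverage.
b, a = simulate_snp_mixture([2, 1, 0, 1, 2], [0, 1, 2, 2, 0], 0.7, 100)
```

Sampling the counts from a binomial distribution, rather than using the expected fractions directly, is what lets a simulation like this mimic the stochastic effects mentioned below.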
Using these variables, we have simulated SNP mixtures and are currently applying a likelihood model designed to infer the alleles of each contributor in two-person mixtures. Our model will be tested with simulated mixture data that mimics real forensic conditions by incorporating parameters that can account for stochastic effects and sequencing errors.
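A toy version of such a likelihood calculation can be sketched as follows, under the same assumed binomial read-count model: for each SNP, score every possible pair of contributor genotypes against the observed allele-B read count and keep the best-scoring pair. This is an illustrative simplification, not the model described in the interview.

```python
import numpy as np
from math import lgamma, log

def binom_logpmf(k, n, p):
    # Log of the binomial probability mass function.
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def infer_genotypes(b_reads, depth, prop_a, error=0.005):
    """For each SNP, return the (dosage_A, dosage_B) pair that maximizes
    the binomial likelihood of the observed allele-B read count."""
    calls = []
    for k in b_reads:
        best, best_ll = None, -np.inf
        for ga in (0, 1, 2):        # contributor A's allele-B dosage
            for gb in (0, 1, 2):    # contributor B's allele-B dosage
                p_b = prop_a * ga / 2 + (1 - prop_a) * gb / 2
                p_obs = p_b * (1 - error) + (1 - p_b) * error
                ll = binom_logpmf(k, depth, p_obs)
                if ll > best_ll:
                    best, best_ll = (ga, gb), ll
        calls.append(best)
    return calls
```

For example, in a 70:30 mixture at 100x depth, observing 70 allele-B reads at a SNP is best explained by contributor A being homozygous B and contributor B being homozygous A.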
One current limitation is that our framework does not account for population-specific allele frequency differences, which can influence genotype distributions and affect mixture interpretation. To address this, our next step is to integrate data from the 1000 Genomes Project to evaluate mixtures across diverse populations and assess the accuracy of our model using data that better represents human genetic diversity.
How does your probabilistic model for deconvoluting two-person SNP mixtures compare with existing methods used for STR interpretation in crime labs?
Our probabilistic model for deconvoluting two-person mixtures shares similarities with existing STR interpretation methods used in crime labs, as both rely on likelihood-based approaches to evaluate which hypotheses are most consistent with the observed data. However, a key difference is that current STR mixture interpretation produces likelihood ratios that compare two competing hypotheses, whereas our approach focuses on inferring the most likely contributor genotypes given the mixture data. Therefore, the result is not in the form of a likelihood ratio, but rather an individual’s most probable genotype at a given number of SNPs. Another major distinction, though not specific to the method itself, is the genetic architecture of STR versus SNP data. STR loci are multi-allelic, which aids mixture interpretation, while SNPs are bi-allelic, offering only two allele options. This reduced variability makes SNP mixture interpretation more challenging. Additionally, the scale of analysis differs significantly because STR interpretation generally involves 22-26 loci, whereas SNP-based analysis requires evaluating thousands of loci. This makes computational efficiency a critical requirement for SNP mixture interpretation tools.
You’ve mentioned your desire to mirror STR workflows to ensure familiarity for forensic analysts. How do you balance innovation with procedural continuity?
This question makes me think about the balance between creativity and practicality in research. In research spaces, there’s often room for infinite innovation and creativity when proposing solutions to problems. However, forensic research requires solutions that are not only novel but also feasible and applicable in casework. I worked as a crime lab DNA scientist for five years, and I carry that experience into my research. Having been in their position, I approach research questions and solutions from the perspective of casework scientists. This perspective ensures that while I explore creative solutions, I develop methods that can be implemented in crime labs and can be explained during court testimony – a key step in ensuring that scientific results from crime scene evidence are admissible in court. In short, innovation is essential, but it must be practical and have procedural continuity for it to be impactful in real-world casework.
What criteria will you use to assess the accuracy of your SNP deconvolution framework when analyzing known mixtures?
One of the most important parameters in mixture interpretation is the mixture proportion, the relative amount of DNA contributed by each individual in the mixture. In our simulation framework, we can generate mixtures using user-defined mixture proportions. Then, using our likelihood model, we will assess accuracy by comparing the true mixture proportions used to simulate the data with the mixture proportions inferred by the model. Similarly, the known genotypes of each contributor will be compared against the inferred genotypes.
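That comparison of true versus inferred mixture proportions can be illustrated with a simple grid search under the binomial read-count model assumed earlier. This is a hypothetical sketch, not the project's actual accuracy protocol, and it assumes the contributor genotypes are known, as they would be for simulated mixtures.

```python
import numpy as np
from math import lgamma, log

def binom_logpmf(k, n, p):
    # Log of the binomial probability mass function.
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def estimate_mixture_proportion(b_reads, depth, geno_a, geno_b, error=0.005):
    """Grid-search the mixture proportion that maximizes the joint
    likelihood of the observed allele-B read counts, given known genotypes."""
    best_prop, best_ll = None, -np.inf
    for prop in np.linspace(0.01, 0.99, 99):
        ll = 0.0
        for k, ga, gb in zip(b_reads, geno_a, geno_b):
            p_b = prop * ga / 2 + (1 - prop) * gb / 2
            p_obs = p_b * (1 - error) + (1 - p_b) * error
            ll += binom_logpmf(k, depth, p_obs)
        if ll > best_ll:
            best_prop, best_ll = prop, ll
    return best_prop

# Known-mixture check: counts consistent with a true 70:30 proportion
# at 100x depth should recover an estimate near 0.7.
geno_a = [2, 2, 1, 0, 2]
geno_b = [0, 1, 1, 2, 0]
b_reads = [70, 85, 50, 30, 70]  # idealized expected counts at proportion 0.7
est = estimate_mixture_proportion(b_reads, 100, geno_a, geno_b)
```

The gap between the true proportion used to simulate the data and `est` is exactly the kind of accuracy metric described above; the same comparison applies to the inferred genotypes.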
Given your experience in the GBI crime lab, how do you envision the implementation of SNP mixture interpretation changing day-to-day forensic workflows?
I don’t anticipate that implementing SNP mixture interpretation will significantly change day-to-day forensic workflows in the near term. This is because most forensic workflows remain centered on STR data and STR mixture interpretation. Crime scene evidence will continue to be processed with the primary goal of generating STR profiles, which, depending on the case, can be uploaded to the Combined DNA Index System (CODIS) to provide investigative leads. SNP mixture interpretation will primarily be critical in cases where all traditional forensic resources have been exhausted and the evidence qualifies for forensic genetic genealogy approaches. In those instances, mixtures could be processed through the SNP mixture workflow, and the resulting data could then be used to support genetic genealogy investigations. However, in the future, once more crime laboratories begin adopting sequencing technologies, SNP mixture interpretation will become more common, with the potential to significantly transform day-to-day forensic workflows.
Much of the current focus in forensic DNA is on single-source SNP profiles for IGG. What do you believe is the potential impact of unlocking SNP mixtures for IGG eligibility?
This question reminds me of my time in casework, when investigative genetic genealogy (IGG) approaches first gained attention after their 2018 success in identifying the Golden State Killer, a case that had puzzled investigators in California for decades. That breakthrough sparked a surge of interest from investigators, district attorneys, and detectives eager to determine whether their cold case samples were eligible for IGG. These requests were typically outsourced, meaning evidence samples were sent to external agencies for sequencing and genealogy searches. Eligibility, however, depended heavily on sample quality, particularly whether the sample was single-source or a mixture. If it was a mixture, the sample was deemed ineligible. I remember thinking about how this excluded so many cases where IGG could have been useful and possibly the last resort for generating investigative leads. This was especially concerning to me because, due to the increased sensitivity of modern forensic kits, most evidence samples result in mixture data. Unlocking SNP mixtures for IGG eligibility would dramatically expand the number of samples that qualify for genetic genealogy, creating new opportunities to apply advanced methods to cold cases and potentially provide investigative leads in cases that have remained unresolved for decades.
Are there particular types of cases—sexual assaults, homicides, etc.—where your SNP deconvolution method could be especially transformative?
SNP deconvolution could be especially transformative for cold cases, regardless of the type of crime. However, initial applications should focus on high-yield, high-quality samples (e.g., blood or saliva). Degraded or low-level samples introduce additional interpretation challenges that must be evaluated in mock case samples before implementing this approach in routine casework.
You've intentionally chosen to study human genetics outside of a forensic program. How has this interdisciplinary training benefited your approach to forensic challenges?
My decision to pursue a PhD after working in the crime lab was one of the most difficult yet pivotal decisions in my career. What made it particularly challenging was enrolling in a program focused on human medical genetics, even though my long-term goal was forensic research. I worried that the steep learning curve in medical genetics would make transitioning into a PhD program difficult, and that the knowledge I gained in medical genetics might not translate back to forensics. However, I had one goal in mind: to expand my knowledge, hoping that anything I’d learn could one day help me address the gaps and limitations in forensic genetics that I observed while working casework. Thankfully, I quickly realized that research questions in both medicine and forensic science rely on the same fundamental genetic and statistical principles, differing primarily in their application. Looking back, I am grateful that I took this risk because it expanded my expertise in genetics, bioinformatics, and statistics. I’ve been exposed to methods routinely used in human medical and population genetics. In fact, the knowledge I gained through my PhD sparked the idea of addressing the SNP mixture interpretation problem, and that knowledge was the foundation for the NIJ work I am doing today.
As one of only two NIJ fellowship recipients focused on genetics applications, what advice would you give to other forensic scientists considering translational research?
My advice is this: if there’s something you think about often at work, perhaps an idea that could really make a difference in casework, know that your idea matters, your experience is invaluable, and you are capable of pursuing it. It’s easy to assume that translational research strictly requires a PhD, but that’s not necessarily the case. I’ve seen incredible forensic scientists who have conducted impactful research in their own labs. I would encourage forensic scientists to start sharing their ideas with leadership to explore grant opportunities to fund research projects. This can create space for scientists to step away from casework temporarily and dedicate time to research that advances both their laboratory and the broader forensic science community.
Looking ahead, what would success look like for you at the conclusion of this project—and what’s next after that?
My main goal for this project is to characterize DNA mixtures and provide a foundational framework for interpreting SNP mixture data. Developing the simulation framework is critical because it not only allows us to test the likelihood model we are building, but it also provides a tool for others to evaluate alternative models that could advance SNP mixture deconvolution. Looking ahead, success for this project means advancing our understanding of SNP mixture interpretation and laying the groundwork for future implementation in forensic casework. Ultimately, it would be ideal if, after testing our model on mock evidence samples, it could be applied in real casework.
However, many questions remain unanswered. For example, what is the maximum number of contributors that can be accurately interpreted using SNP data? How does relatedness affect genotype inference?
Research in probabilistic genotyping for STR mixtures took decades before being implemented in casework, and I expect SNP mixture interpretation will follow a similar, though hopefully shorter, trajectory. If this project moves the field closer to answering these questions and supports crime laboratories in adopting sequencing technologies, I will consider it successful.

Betzaida obtained a B.S. in Forensic and Investigative Sciences from West Virginia University and an M.S. in Forensic Science with an emphasis in DNA Analysis from Marshall University. She completed internships at the Denver Police Department and the D.C. Department of Forensic Sciences. She also worked as a Forensic DNA Analyst at the Georgia Bureau of Investigation, has served as an expert witness in the state of Georgia, and taught Forensic Science at Georgia State University. Betzaida is currently pursuing a Ph.D. in Human Genetics at the University of Colorado Anschutz. Her research interests lie at the intersection of population genetics, statistics, and bioinformatics. She is passionate about advancing forensic science through research and communicating results in ways that are accessible to the community.