Genetic Genealogy for Cold Case and Active Investigations: 2021 Update

Ellen McRae Greytak, Director of Bioinformatics, Parabon Nanolabs


Investigative genetic genealogy (IGG) has taken the forensics and law enforcement worlds by storm, closing hundreds of cases that may never have been solved otherwise. In 2019, approximately one year after IGG burst into the public eye, the Parabon team published a comprehensive paper on the current state of the art for IGG¹. Much has changed about the IGG landscape since that time, and here we update the information from that paper and cover several significant changes impacting IGG. Most notably, the 2019 paper reported 28 suspect cases that had publicly acknowledged the use of IGG for lead generation at that time. That number has now jumped to over 200, with cases being solved around the country² and around the world³ [Figure 1].

Figure 1: Cumulative number of positive identifications in Suspect (green) and Unidentified Remains (pink) cases made from Parabon IGG leads since May 2018.

While IGG has primarily been viewed as a tool of last resort for cold cases, many of the positive identifications have come on very recent cases, with more than 13% of Parabon’s identifications being delivered less than 5 years after the crime [Figure 2].

Figure 2: Number of years between crime⁴ and identification for Parabon’s solved Suspect (green) and Unidentified Remains (pink) cases.

IGG Databases

When Greytak et al. (2019) was written, the main IGG database used for law enforcement (LE) cases was GEDmatch, which had over 1 million users, openly welcomed LE usage for violent crimes (“homicide or sexual assault”) and unidentified remains, and was free. The FamilyTreeDNA (FTDNA) database was of a similar size and also allowed LE usage for violent crimes (“homicide, sexual assault, or abduction”⁵) and unidentified remains, but charged $650 for each upload. The MyHeritage database allowed uploads of 3rd-party files but has never allowed LE usage (although it has been used in at least one non-Parabon LE case)⁶.

In early 2019, Parabon was approached by a detective investigating the brutal assault of an elderly woman in Centerville, UT. Because the victim had survived the attack, Parabon rejected the case as not meeting the definition of a violent crime for IGG. The detective reached out to GEDmatch to ask for an exception due to the brutality of the case and the urgency of finding the perpetrator. GEDmatch then directly asked Parabon to work the case, and so we proceeded with IGG analysis. This remains the only time Parabon has uploaded a sample against any site’s posted Terms of Service. The perpetrator was quickly identified and arrested, and the use of GEDmatch was made public.

Faced with mounting pressure from a small but vocal group of users, in May of 2019, GEDmatch announced that it was instituting a new system whereby users could opt in to or opt out of LE matching, and that all users would be opted out by default. This change caused the number of matches available to LE cases to immediately drop to zero, and GEDmatch users who supported LE use of IGG had to log in to the site and manually opt in. While the default setting for new users is now to opt in to LE matching, the portion of the GEDmatch database available to LE has returned to only ~⅓ the size it was before the opt-out (see Supplementary File S3 of Kling et al. (2021) for Parabon’s analysis of GEDmatch match statistics before and after the opt-out⁷). Nonetheless, more than ⅔ of Parabon’s cases that resulted in an identification have come since the GEDmatch opt-out. From a practical perspective, the main impact of this change has been to make the use of FTDNA necessary for many of the more recent cases [Figure 3].

Figure 3: Proportion of Parabon’s solved and ongoing cases that have been uploaded (Y - green) or not uploaded (N - purple) to FTDNA.

At the same time as the opt-out, GEDmatch’s Terms of Service (TOS) were updated to relax the definition of violent crime to “murder, nonnegligent manslaughter, aggravated rape, robbery, or aggravated assault.”⁸ In December 2019, GEDmatch was acquired by Verogen, after which two important changes were instituted. First, GEDmatch debuted GEDmatch PRO, through which they charge $199 for each LE upload. Second, in January 2021, the TOS were updated to allow unidentified human remains (UHR) cases to access the full GEDmatch database, while suspect cases continue to be limited to the LE opt-in portion of the database.

Analysis of Highly Degraded DNA

IGG requires high-quality SNP data from hundreds of thousands of sites across the genome. Our 2019 paper showed results from hundreds of samples genotyped on a microarray and described the challenges encountered when using such technology for analysis of bone samples. Both the degradation of the DNA into short fragments and the contamination of human DNA by overwhelming numbers of microbial sequences interfere with the success of microarray genotyping.

In recent years, there has been a significant leap forward in the ability to analyze highly degraded DNA thanks to the archaeogenomics field, and ancient DNA studies are now quite common in the literature. Advancements have been made both on the laboratory side (e.g., enrichment for human DNA, repair of DNA damage, bait capture) and on the bioinformatics side (e.g., damage correction, low-coverage imputation). These advanced techniques are now being used by IGG providers to generate high-quality genome-wide SNP datasets from previously intractable forensic samples using whole-genome sequencing (WGS).

As with ancient DNA, forensic samples are challenged by the degradation of the DNA and the presence of high amounts of microbial contamination. At Parabon’s partner sequencing lab, an initial QC step evaluates the proportion of reads from each sample that align to the human genome. When that number is low, we use whole-genome enrichment to increase the alignment rate [Figure 4A].

Figure 4: Statistics on Parabon’s WGS cases with and without whole-genome enrichment. A) WGS alignment rate to the human genome as a function of alignment rate during the QC step. B) Call rate for IGG SNPs as a function of WGS human genome coverage after low-coverage imputation. Genome coverage is calculated as the total number of bases sequenced, divided by the size of the human genome, i.e. 10X coverage means each base has been covered an average of 10 times.

Even with enrichment, coverage remains too low for direct genotype determination; although sequencing is performed to a target coverage of 30X, sequencing reads are not evenly distributed between samples or across the genome, and many of the reads are from bacterial contamination, even after enrichment. Traditional genotype calling typically requires a site to be covered 10 times (or more) in order to confidently call genotypes. Most forensic samples do not achieve this level of coverage after a single round of sequencing, and therefore imputation is required. Imputation is the process of statistically inferring the genotypes at SNPs of interest using allelic information from SNPs in linkage disequilibrium with the target SNP. Recently, techniques specifically targeting imputation of low-coverage samples have been described in the literature, and we have implemented and validated such a pipeline. With imputation, a large proportion of the SNPs needed for IGG can be called without requiring additional sequencing [Figure 4B].
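The coverage definition from the Figure 4 caption and the 10-read calling threshold can be related with a short calculation. The sketch below is illustrative only and is not Parabon's pipeline: it assumes that read depth per site follows a Poisson distribution (an idealization, since real reads are unevenly distributed), it uses an approximate genome size of 3.1 billion bases, and the function names are hypothetical.

```python
import math

HUMAN_GENOME_BP = 3.1e9  # approximate genome size; an assumption for illustration


def mean_coverage(total_human_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Coverage = total bases sequenced divided by the size of the genome."""
    return total_human_bases / genome_size


def fraction_sites_at_depth(coverage: float, min_depth: int = 10) -> float:
    """Fraction of sites covered by at least `min_depth` reads under a Poisson model."""
    # P(depth < min_depth) is the sum of the Poisson probabilities for 0..min_depth-1.
    below = sum(
        math.exp(-coverage) * coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


if __name__ == "__main__":
    for cov in (0.5, 2.5, 10.0, 30.0):
        frac = fraction_sites_at_depth(cov)
        print(f"{cov:>5.1f}X: {frac:.4f} of sites reach depth >= 10")
```

Even under this optimistic model, only a small fraction of sites reaches 10 reads at the low coverages typical of degraded samples, which is why direct calling gives way to imputation.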

To validate the imputation pipeline, we downloaded whole-genome sequencing data for subject HG00119 from the 1000 Genomes Project, which has ~5.3X coverage. This data was randomly downsampled to 2.5X, 0.5X, 0.25X, and 0.05X. Low-coverage imputation was run on each dataset using a reference panel with subject HG00119 removed. Accuracy was determined by comparing the imputed genotypes to the genotypes in the 1000 Genomes phase 3 call set. As shown in Figure 5, even at very low coverages, the vast majority of SNP genotypes can be accurately recovered by low-coverage imputation. Low-coverage imputation was also applied to sequencing data from a subject with admixed ancestry from the Peruvian population (HG01983) that was downsampled to 0.05X coverage. This subject was not included in the reference panel. The call rate and error rate for this sample were consistent with the 0.05X data for the European sample. This work was presented in a poster at ISHI 2021⁹.
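The two validation metrics can be sketched in a few lines of code. The example below is a minimal illustration, not the validated pipeline: the article does not give exact formulas, so it assumes that call rate is the share of truth-set SNPs that received a genotype and that error rate is the share of called genotypes that disagree with the truth set. All function names and the toy SNP data are hypothetical.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[str]  # e.g. "AG"; None means no genotype was called


def downsample_fraction(original_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep at random to reach the target coverage."""
    if not 0 < target_cov <= original_cov:
        raise ValueError("target coverage must be positive and not exceed the original")
    return target_cov / original_cov


def normalize(gt: str) -> str:
    """Treat genotypes as unordered allele pairs, so "GA" equals "AG"."""
    return "".join(sorted(gt))


def call_and_error_rate(
    imputed: Dict[str, Genotype], truth: Dict[str, str]
) -> Tuple[float, float]:
    """Return (call rate, error rate) over the SNPs in the truth set.

    Assumed definitions:
      call rate  = called SNPs / all truth SNPs
      error rate = discordant calls / called SNPs
    """
    total = len(truth)
    called = 0
    wrong = 0
    for snp, true_gt in truth.items():
        gt = imputed.get(snp)
        if gt is None:
            continue  # missing or uncalled SNP lowers the call rate only
        called += 1
        if normalize(gt) != normalize(true_gt):
            wrong += 1
    call_rate = called / total if total else 0.0
    error_rate = wrong / called if called else 0.0
    return call_rate, error_rate


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AA"}
    imputed = {"rs1": "GA", "rs2": "CT", "rs3": None}
    print("keep fraction for 5.3X -> 2.5X:", round(downsample_fraction(5.3, 2.5), 3))
    print("call rate, error rate:", call_and_error_rate(imputed, truth))
```

Measuring error only over called genotypes keeps the two metrics independent: a pipeline that declines to call uncertain sites lowers its call rate without inflating its error rate.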

Figure 5: Call rate (left) and error rate (right) at each coverage level for European and Peruvian subjects.

IGG Research

One of the many challenges faced when creating IGG as a new field of business was how to price the offering. Many IGG providers offer to work on a case until it is solved, for a flat upfront fee, but do not differentiate between cases that are and are not viable for IGG. Parabon’s approach is to carefully assess each case and only move forward with those deemed potentially solvable, which covers ~85% of cases and ensures agencies do not waste their resources. For the remaining 15% of cases, we recommend uploading to FTDNA and monitoring the match lists on a weekly basis, so that IGG research only proceeds when enough information becomes available to make it fruitful. IGG research is offered in blocks of 15 hours, and hours are expended as sparingly as possible. At the end of the block (and often during, as well), a detailed report is written and briefed, containing concrete recommendations for the agency to follow in order to advance the case. Because of the skill and efficiency of our IGG analysts, we have found that 15 hours of IGG research is sufficient time to solve many cases, and some cases can even be solved in less time, in which case the agency is only charged for the hours used [Figure 6].

Figure 6: Number of IGG research hours used in each of Parabon’s solved cases.

Kinship Testing

Additional, closer matches are often needed in order to narrow down the possible parts of the family tree from which the unidentified subject could descend. In our 2019 paper, kinship testing of family members was mentioned as part of Case Study #3, but it was not discussed in depth. Since that time, kinship testing (also referred to as “target testing” or “reference testing”) has become a critical tool used in IGG investigations, and more than one quarter of Parabon’s cases that have resulted in an identification involved at least one kinship test [Figure 7]. Many of the kinship testers turn out to have already tested their DNA, and they simply need to choose to upload to GEDmatch and/or FTDNA and opt-in to LE matching. When collection of a new sample is needed, multiple kinship testers are suggested, in case some prefer not to participate. In our experience, many family members of distant matches are willing to provide a voluntary buccal swab.

Figure 7: Number of targeted kinship tests employed in each of Parabon’s solved and ongoing cases.

While it is simple to perform a direct one-to-one kinship comparison between the kinship sample and the unidentified subject, uploading the kinship sample to GEDmatch is usually informative in order to confirm that the kinship tester has the expected relationships to the matches. It is not uncommon to find that the kinship tester does not have the relationship they expected, which can significantly alter the path of the IGG research.

Ancestry

Much has been written about the extensive coverage of Northern European families in IGG databases [e.g., Erlich et al. (2018)¹⁰], and the vast majority of cases discussed in Greytak et al. (2019) involved individuals of European descent. Since that time, many more cases involving non-European individuals have been solved, although they remain a minority of the cases that have resulted in identifications, and they are much more likely to be assessed as having insufficient match information to be solvable through IGG [Figure 8].

Figure 8: Number of Parabon cases from each ancestry that have been solved, are ongoing, or were assessed as not currently workable through IGG. “Other” ancestry includes African, Asian, Middle Eastern, Native American, and admixed individuals.

Jury Convictions

In mid-2019, IGG had not yet been tested in front of a jury, with cases being closed because the suspect was already deceased or decided to plead guilty. Shortly after the publication of our paper, the first conviction by jury was handed down in a landmark case in Washington State¹¹. Notably, the defense attorney in that case elected not to challenge the IGG work during the jury trial, but rather argued that the suspect’s DNA was present at the crime scene for innocent reasons. Since that time, six additional jury convictions have been handed down on Parabon cases¹²,¹³,¹⁴,¹⁵,¹⁶,¹⁷, two on cases in which Parabon was involved but identification was made working with another genealogist¹⁸,¹⁹, and two others²⁰,²¹. To our knowledge, while at least one genealogist has been asked to testify during the jury trial, no significant challenges regarding the IGG methodology have been brought in any of these trials.

Policy

In mid-2019, there were no official policies surrounding IGG. Since that time, several sets of guidelines have been published by 1) the Department of Justice²², 2) the Scientific Working Group on DNA Analysis Methods (SWGDAM)²³, and 3) the Sacramento DA’s office²⁴. These documents overlap in many of their recommendations for best practices, although there are some important differences. Below, we list the main requirements, along with which policies (numbered 1, 2, and 3 as above) contain them:

  • For suspect cases, the DNA is already in CODIS without any probative matches [1,2,3]
      ○ Or the reason for not submitting to CODIS is noted [3]
  • Investigative leads have already been pursued [1,2,3]
  • If allowed in that jurisdiction, familial search of STR databases was considered first [2,3]
  • The case is in NamUs and/or ViCAP [1]
  • The case involves a violent crime or unidentified human remains [1,2,3]
  • The prosecutor is involved in suspect cases [1,2,3]
  • Only IGG sites that explicitly allow LE usage in their Terms of Service are used [1]
      ○ Policies should be developed that consider the TOS [2]
  • The data is identified as LE and prevented from being visible to other users [1]
  • Informed consent is obtained from voluntary kinship testers [1,3]
  • Prosecutor approval [1,3] and/or a warrant [1] are obtained prior to performing IGG on covertly-collected kinship testers
  • Arrests are not made solely on the basis of IGG but also have a direct STR match [1,2,3]
      ○ The agency will pursue the case and prosecution, and offender names will not be released until all parties agree [3]
  • Forensic profiles from a case are removed from IGG databases when a suspect is charged with the crime [1,2,3]
      ○ Kinship tester profiles are also removed [1,2]
  • Metrics on IGG usage are collected [1]

Earlier this year, two states passed policies restricting the use of IGG²⁵: starting October 1, Maryland will require approval from a judge prior to uploading data to an IGG site and will require IGG practitioners to be certified by 2024; Montana will require investigators to obtain a search warrant to use IGG databases starting in October, although with the caveat, “unless the consumer whose information is sought previously waived the consumer's right to privacy in the information.” Both states also limit the usage of IGG on suspect cases to violent crimes, which is already a requirement of GEDmatch and FTDNA.

The New York Department of Health (NYDOH) has long had strict regulations about the analysis of DNA samples originating in the state. In 2019, New York had rules for forensic identity testing, but no specific rules around forensic lead generation. Nonetheless, Parabon was ordered to cease DNA phenotyping analysis on NY cases. Since that time, NYDOH has developed a regulatory framework for approval of firms to conduct forensic lead generation analysis, including IGG, on New York cases. A significant amount of validation and oversight is required to gain approval, and at this time, Parabon²⁶ and Bode²⁷ are the only two firms legally permitted to perform IGG on New York cases.

Conclusions

In the past three years, investigative genetic genealogy (IGG) has exploded in popularity with law enforcement. The immense success of this technique, which has already helped solve hundreds of cold cases, has led to a number of changes on the scientific, legislative, and policy fronts. This article updates the reader on the most significant changes since the 2019 publication of our foundational FSI paper on the topic, “Genetic genealogy for cold case and active investigations.” We expect more changes in the coming years, as IGG is tested in state houses and courtrooms across the country and around the world.

References

  1. Greytak, E., Moore, C., & Armentrout, S. (2019). Genetic genealogy for cold case and active investigations. Forensic Science International, 299, 103–113.
  2. https://en.wikipedia.org/wiki/List_of_suspected_perpetrators_of_crimes_identified_with_GEDmatch#Parabon_Labs_assisted_identifications
  3. Tillmar, A., Fagerholm, S. A., Staaf, J., Sjölund, P., & Ansell, R. (2021). Getting the conclusive lead with investigative genetic genealogy – A successful case study of a 16 year old double murder in Sweden. Forensic Science International: Genetics, 53, 102525. https://doi.org/10.1016/j.fsigen.2021.102525
  4. Note that the nearly 90-year-old case was not a criminal case, but rather was the identification of a burial from the 1930s.
  5. https://www.familytreedna.com/legal/law-enforcement-guide
  6. St. John, P. (2020). The untold story of how the Golden State Killer was found. LA Times, December 8, 2020. https://www.latimes.com/california/story/2020-12-08/man-in-the-window
  7. Kling, D., Phillips, C., Kennett, D., & Tillmar, A. (2021). Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Science International: Genetics, 52, 102474. https://doi.org/10.1016/j.fsigen.2021.102474
  8. https://classic.gedmatch.com/Documents/tos_20210111.html
  9. Cady, J., Wilson, M., & Greytak, E. (2021). DNA Phenotyping on Ancient DNA from Egyptian Mummies. In Proceedings of the 32nd International Symposium on Human Identification. https://pub.parabon.com/Parabon-Snapshot-Scientific-Poster--ISHI-2021--DNA-Phenotyping-on-Ancient-DNA-from-Egyptian-Mummies.pdf
  10. Erlich, Y., Shor, T., Pe'er, I., & Carmi, S. (2018). Identity inference of genomic data using long-range familial searches. Science, 362(6415), 690–694. https://doi.org/10.1126/science.aau4832
  11. Hutton, C. (2019). Man guilty of 1987 murders solved with genetic genealogy. The Daily Herald, June 29, 2019. https://www.heraldnet.com/news/man-guilty-of-1987-murders-solved-with-genetic-genealogy/
  12. Homer, T. (2020). Michael Henslick found guilty of first degree murder. Fox Illinois, February 14, 2020. https://foxillinois.com/news/local/michael-henslick-found-guilty-of-first-degree-murder
  13. Griffith, K. (2020). Killer is sentenced to 115 years in prison for stabbing a college professor to death and wounding his wife during home invasion in 2011 when he was 16. Daily Mail, December 29, 2020. https://www.dailymail.co.uk/news/article-9096627/Killer-sentenced-115-years-prison-stabbing-college-professor-death.html
  14. Mehaffey, T. (2020). Jerry Burns sentenced to life for killing Michelle Martinko in 1979. The Gazette, August 7, 2020. https://www.thegazette.com/crime-courts/jerry-burns-sentenced-to-life-for-killing-michelle-martinko-in-1979/
  15. Winningham, C. (2021). Man found guilty of murder in 1984 slaying of Navy recruit Pamela Cahanes. ClickOrlando.com, May 6, 2021. https://www.clickorlando.com/news/local/2021/05/06/jury-deliberations-to-begin-in-trial-of-man-accused-of-killing-navy-recruit-in-1984/
  16. Padilla, A. (2021). Michael Whyte Sentenced To Life For 1987 Murder Of Fort Carson Soldier Darlene Krashoc. CBS Denver, June 25, 2021. https://denver.cbslocal.com/2021/06/25/michael-whyte-sentenced-1987-murder-fort-carson-soldier-darlene-krashoc/
  17. Clark, A. (2021). Vannieuwenhoven found guilty of 1976 murders. WLUC, July 27, 2021. https://www.uppermichiganssource.com/2021/07/27/vannieuwenhoven-found-guilty-1976-murders/
  18. Munoz, C. R. (2020). Jury finds Luke Fleming guilty of 1999 Sarasota murder. Herald-Tribune, February 27, 2020. https://www.heraldtribune.com/news/20200227/jury-finds-luke-fleming-guilty-of-1999-sarasota-murder
  19. Hutton, C. (2020). Suspect in 1972 murder kills himself; jury finds him guilty. The Daily Herald, November 9, 2020. https://www.heraldnet.com/news/suspect-in-1972-murder-kills-himself-jury-finds-him-guilty/
  20. Staff (2020). 'DNA don't lie' | Roy Waller, aka the NorCal Rapist, convicted of crimes. ABC10, November 18, 2020. https://www.abc10.com/article/news/crime/norcal-rapist-convictions/103-bf729327-6790-4165-b623-d8961d77e656
  21. Tipple, B. (2020). County man sentenced to maximum in cold case. Peak of Ohio, December 18, 2020. https://www.peakofohio.com/news/details.cfm?clientid=5&id=316046#.YRPtLO0pDyU
  22. https://www.justice.gov/olp/page/file/1204386/download
  23. https://1ecb9588-ea6f-4feb-971a-73265dbf079c.filesusr.com/ugd/4344b0_6cc9e7c82ccc4fc0b5d10217af64e31b.pdf
  24. https://chia187.wildapricot.org/resources/Documents/Sacramento%20County%20District%20Attorney's%20Office%20-%20IIGG%20MOU%20Example.pdf
  25. Hughes, V. (2021). Two New Laws Restrict Police Use of DNA Search Method. New York Times, May 31, 2021. https://www.nytimes.com/2021/05/31/science/dna-police-laws.html
  26. Staff (2020). DOH grants permit for advanced DNA analysis. WNYT, August 6, 2020. https://wnyt.com/news/doh-grants-permit-for-advanced-dna-analysis-parabon-nanolabs-genetic-genealogy-and-phenotyping-snapshot--dna-analysis-digital-mugshots/5818739/
  27. Singer, A. (2021). Bode Technology and Gene By Gene Receive New York State Department of Health Approval to Perform Forensic Genetic Genealogy. PRWeb, May 12, 2021. https://www.prweb.com/releases/bode_technology_and_gene_by_gene_receive_new_york_state_department_of_health_approval_to_perform_forensic_genetic_genealogy/prweb17931491.htm