How Do Our Phenotype Prediction Models Hold Up in Admixed Populations?

Nicole M.M. Novroski, PhD, Assistant Professor and Forensic Geneticist, Forensic Science Program, Department of Anthropology, University of Toronto Mississauga


Forensic DNA phenotyping (FDP) is becoming an increasingly popular tool within criminal and civil investigations. When little to no information about an individual is known, an FDP profile can provide insight into the externally visible characteristics (EVCs) that individual may possess [1]. Specifically, many FDP tools focus on skin pigmentation, hair color, eye color and freckling, and assign a confidence, or prediction accuracy, to every EVC reported in the phenotypic profile. While FDP models have demonstrated an acceptable level of accuracy in discrete populations, very few studies examine how the prediction of hair color, eye color and skin pigmentation may be affected when an individual has admixed ancestry.

Our lab performed a preliminary evaluation of the ForenSeq DNA Signature Prep Kit for phenotype prediction accuracy in a subset of first-generation or second-generation admixed Canadian individuals (n=100), where participants provided a DNA sample and self-identified their hair color, eye color, and ancestry on their mother’s and father’s side [4]. While many tools exist for modeling phenotypic traits, the ForenSeq DNA Signature Prep Kit is a commercially available multiplex that has been validated for forensic casework in many laboratories worldwide. The ForenSeq DNA Signature Prep Kit is a multiplex developed for use with the Forensic Genomics (FGx) massively parallel sequencing platform and includes 27 global autosomal STRs, 24 Y-STRs, 7 X-STRs, 94 identity SNPs (iiSNPs), 22 phenotypic SNPs (piSNPs) and 56 biogeographical ancestry SNPs (aiSNPs) [2]. In contrast to STRs, which were selected in part due to their presence in non-coding regions of the human genome, SNPs used to infer ancestral and phenotypic information can often be found within or near coding regions. As an example, 11 of the piSNPs included in the ForenSeq DNA Signature Prep Kit are located in the melanocortin 1 receptor (MC1R) gene, which plays a major role in human pigmentation [3]. Specifically, variants of the MC1R gene are known to cause loss of function in eumelanin production, and haplotypes in this gene have been strongly associated with red hair color. Conversely, normal function of the MC1R gene typically indicates dark pigmentation and aids in the prediction of black and brown hair color. The remaining piSNPs included in the ForenSeq panel are located in other key pigmentation genes, including HERC2, OCA2, SLC45A2, TYR, TYRP1, IRF4, and EXOC2. HERC2, which is located upstream of the pigmentation gene OCA2, is thought to be a regulator of OCA2. SNP haplotypes across these two genes have been found to predict blue and brown eye color extremely well when variants are in their homozygous state. Phenotype prediction panels such as IrisPlex utilize rs12913832, located in the HERC2 gene, as the major branch point between light and dark iris pigmentation [1,3].

Using the 22 piSNPs and two common SNPs within the DPMB (DNA Primer Mix B) of the ForenSeq DNA Signature Prep Kit, phenotype predictions were generated by the ForenSeq Universal Analysis Software (UAS). The phenotype predictions were then interpreted, and each sample was assigned a predicted hair color (blonde, red, black, or brown) and a predicted eye color (brown, blue, or intermediate). The ForenSeq UAS reports phenotype predictions as numerical values, up to a maximum of 1.0, for each category within a trait. For example, Figure 1 illustrates how phenotype prediction estimates are displayed by the ForenSeq UAS. It is commonplace to use a cut-off of 0.70 to assign hair and eye color predictions. We decided, however, to evaluate the prediction metrics both with and without a threshold value.

Figure 1. An example of a phenotype prediction estimate from an individual of admixed ancestry using the ForenSeq UAS. If a threshold of 0.70 were used, this individual's predicted hair color would be undetermined and the predicted eye color would be blue. If no threshold is used, this individual's predicted hair color would be brown and the predicted eye color would be blue.
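The interpretation rule described above can be illustrated with a minimal Python sketch. It is not the ForenSeq UAS implementation: the function name call_phenotype and the probability values are hypothetical, chosen only to reproduce the pattern in Figure 1, and the sketch assumes that a value equal to the threshold counts as a call.

```python
from typing import Dict, Optional


def call_phenotype(probs: Dict[str, float], threshold: Optional[float] = None) -> str:
    """Return the most probable category for one trait.

    If a threshold is given and the top value falls below it,
    the call is reported as "undetermined" (assumption: a value
    equal to the threshold is accepted).
    """
    best = max(probs, key=probs.get)
    if threshold is not None and probs[best] < threshold:
        return "undetermined"
    return best


# Hypothetical values for illustration only; not taken from the study data.
hair = {"brown": 0.55, "black": 0.30, "blonde": 0.13, "red": 0.02}
eye = {"blue": 0.85, "intermediate": 0.10, "brown": 0.05}

hair_with_threshold = call_phenotype(hair, threshold=0.70)
hair_without_threshold = call_phenotype(hair)
eye_with_threshold = call_phenotype(eye, threshold=0.70)
eye_without_threshold = call_phenotype(eye)
```

With these illustrative values, the hair call changes from brown to undetermined once the 0.70 cut-off is applied, while the eye call stays blue either way.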

When comparing phenotype predictions against the self-identified EVCs provided by each participant, we established some baseline designations. A correct prediction designation was applied when the self-identified phenotype and predicted phenotype generated by the ForenSeq UAS were concordant. An incorrect prediction designation was applied when the self-identified phenotype and predicted phenotype generated by the ForenSeq UAS were discordant. We found that, when no interpretation threshold was utilized, 77% of eye color predictions were correct (Figure 2), while only 47% of hair color predictions were correct (Figure 3).
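The concordance tally described above could be computed along the following lines. This is a sketch under stated assumptions rather than the analysis actually used in the study: the function name concordance and the five example samples are hypothetical, and incorrect predictions are grouped by self-identified phenotype in the same way as the figures.

```python
from collections import Counter
from typing import List, Tuple


def concordance(self_reported: List[str], predicted: List[str]) -> Tuple[float, Counter]:
    """Return the fraction of concordant calls and a count of
    discordant calls grouped by the self-identified phenotype."""
    if len(self_reported) != len(predicted):
        raise ValueError("inputs must have the same length")
    if not self_reported:
        raise ValueError("at least one sample is required")
    pairs = list(zip(self_reported, predicted))
    correct = sum(1 for truth, call in pairs if truth == call)
    misses = Counter(truth for truth, call in pairs if truth != call)
    return correct / len(pairs), misses


# Hypothetical eye color data for five samples; not from the study.
reported = ["brown", "blue", "intermediate", "brown", "intermediate"]
calls = ["brown", "blue", "brown", "brown", "blue"]

accuracy, missed_by_true_phenotype = concordance(reported, calls)
```

In this made-up example, three of the five calls are concordant, and both discordant calls come from individuals who self-identified as having intermediate eye color.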

Figure 2. Summary of correct and incorrect eye color predictions as percentages when no interpretation threshold was applied. Incorrect predictions are displayed to the right (smaller pie chart) and are grouped based on their true phenotype to demonstrate which eye color is most often incorrectly predicted.

Figure 3. Summary of correct and incorrect hair color predictions as percentages when no interpretation threshold was applied. Incorrect predictions are displayed to the right (smaller pie chart) and are grouped based on their true phenotype to demonstrate which hair color is most often incorrectly predicted.

While the use of an interpretation threshold did raise the percentage of correct phenotype calls for both hair and eye color, we found that many of the same patterns (i.e., difficulty predicting intermediate eye color; black versus brown hair color) were consistent with and without a threshold applied.

Overall, our preliminary findings highlight that FDP prediction remains more challenging for individuals of admixed ancestry than for individuals whose ancestry is more homogeneous in nature. While the field continues to develop refined panels and approaches to predicting and modeling EVCs, it is our belief that admixed samples (which represent a growing share of the global population) will remain challenging under our current prediction constructs. We look forward to exploring additional forensic tools for EVC prediction in admixed samples in the coming months and publishing our findings accordingly.

References

  1. S. Walsh, F. Liu, K.N. Ballantyne, M. Van Oven, O. Lao, M. Kayser, IrisPlex: A sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information, Forensic Sci. Int. Genet. 5 (3) (2010). https://doi.org/10.1016/j.fsigen.2010.02.004.
  2. A.C. Jäger, M.L. Alvarez, C.P. Davis, E. Guzmán, Y. Han, L. Way, P. Walichiewicz, D. Silva, N. Pham, G. Caves, J. Bruand, F. Schlesinger, S.J.K. Pond, J. Varlaro, K.M. Stephens, C.L. Holt, Developmental validation of the MiSeq FGx Forensic Genomics System for Targeted Next Generation Sequencing in Forensic DNA Casework and Database Laboratories, Forensic Sci. Int. Genet. 28 (2017) 52–70. https://doi.org/10.1016/j.fsigen.2017.01.011.
  3. E.A. Grimes, P.J. Noake, L. Dixon, A. Urquhart, Sequence polymorphism in the human melanocortin 1 receptor gene as an indicator of the red hair phenotype, Forensic Sci. Int. 122 (2001) 124–129. https://doi.org/10.1016/S0379-0738(01)00480-7.
  4. V. Sharma, K. Jani, P. Khosla, E. Butler, D. Siegel, E. Wurmbach, Evaluation of ForenSeq™ Signature Prep Kit B on predicting eye and hair coloration as well as biogeographical ancestry by using Universal Analysis Software (UAS) and available web-tools, Electrophoresis 40 (2019) 1353–1364. https://doi.org/10.1002/elps.201800344.