Reference-free discovery of nuclear SNPs permits accurate, sensitive identification of Carya (hickory) species and hybrids


Premise: DNA-based species identification is critical when morphological identification is restricted, but DNA-based identification pipelines typically rely on the ability to compare homologous sequence data across species. Because many clades lack robust genomic resources, we present here a bioinformatics pipeline capable of generating genome-wide single-nucleotide polymorphism (SNP) data while circumventing the need for any reference genome or annotation data.

Methods: Using the SISRS bioinformatics pipeline, we generated de novo ortholog data for the genus Carya, isolating sites where genetic variation was restricted to a single Carya species (i.e., species-informative SNPs). We leveraged these SNPs to identify both full-species and hybrid Carya specimens, even at very low sequencing depths.

Results: We identified between 46,000 and 476,000 species-identifying SNPs for each of eight diploid Carya species, and all species identifications were concordant with the species of record. Both parental species were correctly identified for every putative F1 hybrid specimen, and more punctate patterns of introgression were detectable in more cryptic crosses.

Discussion: Bioinformatics pipelines that use only short-read sequencing data provide vital new tools enabling rapid expansion of DNA identification assays for model and non-model clades alike.
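The idea of a species-informative SNP described in the Methods can be illustrated with a minimal Python sketch. This is not the SISRS implementation: the function names, the toy alignment, and the species labels `sp1` to `sp3` are hypothetical, and the sketch assumes one fixed base per species at each homologous site. A site counts as informative when exactly one species carries an allele that differs from a single allele shared by all the others; a sample is then scored by counting reads that carry each species' private allele.

```python
from collections import Counter


def species_informative_sites(alignment):
    """Find sites where exactly one species differs from all others.

    alignment: dict mapping species label -> string of fixed bases,
    one per homologous site (all strings the same length; hypothetical input).
    Returns dict: site index -> (species, private allele).
    """
    species = list(alignment)
    length = len(alignment[species[0]])
    informative = {}
    for i in range(length):
        column = {sp: alignment[sp][i] for sp in species}
        # Skip sites with missing or ambiguous calls.
        if any(base not in "ACGT" for base in column.values()):
            continue
        counts = Counter(column.values())
        # Require exactly two alleles, exactly one of them private to one species.
        if len(counts) != 2:
            continue
        private = [base for base, n in counts.items() if n == 1]
        if len(private) != 1:
            continue
        owner = next(sp for sp, base in column.items() if base == private[0])
        informative[i] = (owner, private[0])
    return informative


def score_sample(informative, sample_calls):
    """Count reads in a sample that carry each species' private allele.

    sample_calls: dict mapping site index -> list of bases observed in reads.
    Returns a Counter of species -> number of supporting reads.
    """
    support = Counter()
    for site, bases in sample_calls.items():
        if site in informative:
            owner, allele = informative[site]
            support[owner] += sum(1 for base in bases if base == allele)
    return support


# Toy data (hypothetical, not from the study).
toy_alignment = {"sp1": "ACGTA", "sp2": "ACGAA", "sp3": "TCGTA"}
toy_sites = species_informative_sites(toy_alignment)
toy_support = score_sample(toy_sites, {0: ["T", "T", "A"], 3: ["T"], 1: ["C"]})
```

Under this scheme, a full-species specimen would be expected to show support concentrated in one species, whereas an F1 hybrid would be expected to show support for two. Because each read is counted independently, the tally remains usable when only a few reads cover the informative sites.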

Publication Title

Applications in Plant Sciences