Date of Original Version
Many evolutionary relationships remain controversial despite whole-genome sequencing data. These controversies arise in part due to challenges associated with accurately modeling the complex phylogenetic signal coming from genomic regions experiencing distinct evolutionary forces. Here we examine how different regions of the genome support or contradict well-established hypotheses among three mammal groups using millions of orthologous parsimony-informative biallelic sites [PIBS] distributed across primate, rodent, and Pecora genomes. We compared PIBS concordance percentages among locus types (e.g. coding sequences, introns, intergenic regions), and contrasted PIBS utility over evolutionary timescales. Sites derived from noncoding sequences provided more data and proportionally more concordant sites compared with those from coding sequences [CDS] in all clades. CDS PIBS were also predominant drivers of tree incongruence in two cases of topological conflict. PIBS derived from most locus types provided surprisingly consistent support for splitting events spread across the timescales we examined, although we find evidence that CDS and intronic PIBS may, respectively and to a limited degree, inform disproportionately about older and younger splits. In this era of accessible whole genome sequence data, these results (1) suggest benefits to more intentionally focusing on noncoding loci as robust data for tree inference, and (2) reinforce the importance of accurate modeling, especially when using CDS data.
Robert Literman, Rachel Schwartz, Genome-scale profiling reveals noncoding loci carry higher proportions of concordant data, Molecular Biology and Evolution, 2021;, msab026, https://doi.org/10.1093/molbev/msab026
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License