Parallel or convergent evolution in human population genomic data revealed by genotype networks

BMC Evol Biol. 2016 Aug 2;16:154. doi: 10.1186/s12862-016-0722-0.

Abstract

Background: Genotype networks are representations of genetic variation data that are complementary to phylogenetic trees. A genotype network is a graph whose nodes are genotypes (DNA sequences) with the same broadly defined phenotype. Two nodes are connected if they differ in some minimal way, e.g., in a single nucleotide.
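The construction rule in the Background (distinct genotypes as nodes, an edge wherever two genotypes differ at a single nucleotide) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes the haplotypes are already aligned as equal-length strings and that all of them share the same broadly defined phenotype (for example, all observed haplotypes of one gene). The helper names `hamming` and `genotype_network` and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def genotype_network(haplotypes):
    """Build a genotype network as an adjacency dict (hypothetical helper).

    Assumption: every input haplotype has the same broadly defined phenotype.
    Nodes are the distinct haplotypes. Two nodes are joined by an edge
    if they differ at exactly one nucleotide.
    """
    nodes = sorted(set(haplotypes))          # duplicate haplotypes collapse to one node
    adj = {h: set() for h in nodes}
    for a, b in combinations(nodes, 2):      # compare every unordered pair once
        if hamming(a, b) == 1:
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Toy aligned haplotypes (illustrative only, not 1000 Genomes data).
haps = ["ACGT", "ACGA", "TCGT", "TCGA", "ACGT", "GGGG"]
net = genotype_network(haps)
for h, nbrs in net.items():
    print(h, sorted(nbrs))
```

Here the four haplotypes `ACGT`, `ACGA`, `TCGT` and `TCGA` form a closed square of single-nucleotide steps. `GGGG` differs from every other haplotype at more than one site, so it remains an isolated node.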

Results: We analyze human genome variation data from the 1000 Genomes Project and construct haploid genotype (haplotype) networks for 12,235 protein-coding genes. The structure of these networks varies widely among genes, indicating different patterns of variation despite a shared evolutionary history. We focus on those genes whose genotype networks show many cycles, which can indicate homoplasy, i.e., parallel or convergent evolution, at the sequence level.
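A cycle in a genotype network means that two haplotypes are connected by more than one mutational path. The simplest case involves two biallelic sites where all four combinations 00, 01, 10 and 11 are present. These form a square, which can arise from a recurrent mutation at one site (homoplasy) or from recombination between the two sites.

One standard way to count independent cycles in a graph is its cycle rank, E − V + C, where E is the number of edges, V the number of nodes and C the number of connected components. The sketch below computes it under that assumption. Whether the study counts cycles this way is not stated here, and the function name `cycle_rank` is hypothetical.

```python
from itertools import combinations


def cycle_rank(haplotypes):
    """Return E - V + C for the genotype network of `haplotypes` (hypothetical helper).

    E: edges between distinct haplotypes that differ at exactly one site.
    V: distinct haplotypes.
    C: connected components.
    The result is the number of independent cycles in the network.
    Assumes all haplotypes are aligned strings of equal length.
    """
    nodes = sorted(set(haplotypes))
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]

    # Union-find with path halving, used to count connected components.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(nodes)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:              # this edge merges two components
            parent[ra] = rb
            components -= 1
    return len(edges) - len(nodes) + components


print(cycle_rank(["00", "01", "10", "11"]))  # square: E=4, V=4, C=1
print(cycle_rank(["00", "01", "11"]))        # path (tree): E=2, V=3, C=1
```

The four-haplotype square has one independent cycle. The three-haplotype path is tree-like and has none. This contrast is the distinction that makes excess cycles a signal worth explaining.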

Conclusion: For 42 genes, the observed number of cycles is so large that it cannot be explained by either chance homoplasy or recombination. When analyzing possible explanations, we discovered evidence for positive selection in 21 of these genes and, in addition, a potential role for constrained variation and purifying selection. Balancing selection plays at most a small role. The 42 genes with excess cycles are enriched in functions related to immunity and response to pathogens. Genotype networks are representations of genetic variation data that can help us understand unusual patterns of genomic variation.

Keywords: Genetic variation; Genotype networks; Human genome; Natural selection.

MeSH terms

  • Evolution, Molecular*
  • Genetic Variation*
  • Genome, Human
  • Genotype*
  • Haplotypes
  • Humans
  • Metagenomics
  • Phenotype
  • Phylogeny
  • Selection, Genetic