Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

Nat Genet. 2005 Sep;37(9):991-6. doi: 10.1038/ng1630. Epub 2005 Aug 28.

Abstract

Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.
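
To give a sense of scale for the percentages quoted above, the following is a minimal arithmetic sketch, not part of GenRate. It assumes the 2.7% exon false detection rate can be read as the expected fraction of the 155,839 detected exons that are spurious. The abstract does not state this definition, and the function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
EXON_FALSE_DETECTION_RATE = 0.027   # 2.7%
DETECTED_EXONS = 155_839
TOP_KNOWN_GENES = 10_000
CONFIRMED_FRACTION = 0.81           # 81%


def expected_false_exons(detected: int, rate: float) -> float:
    """Expected number of spurious exons.

    ASSUMPTION: the rate is the fraction of detected exons that are false;
    the abstract does not define it precisely.
    """
    return detected * rate


def confirmed_gene_count(total: int, fraction: float) -> int:
    """Number of known genes confirmed, rounded to a whole gene."""
    return round(total * fraction)


false_exons = expected_false_exons(DETECTED_EXONS, EXON_FALSE_DETECTION_RATE)
confirmed = confirmed_gene_count(TOP_KNOWN_GENES, CONFIRMED_FRACTION)

print(f"Expected spurious exons (under the stated assumption): about {false_exons:.0f}")
print(f"Known genes confirmed among the top {TOP_KNOWN_GENES}: {confirmed}")
```

Under this reading, roughly 4,200 of the detected exons would be expected to be false detections, and 8,100 of the 10,000 most highly expressed known genes were confirmed.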

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology*
  • DNA, Complementary / chemistry*
  • Databases as Topic*
  • Exons / genetics*
  • Gene Expression Profiling
  • Genome*
  • Humans
  • Mice
  • Microarray Analysis
  • RNA, Messenger / chemistry
  • RNA, Messenger / metabolism
  • Transcription, Genetic*

Substances

  • DNA, Complementary
  • RNA, Messenger