Maximum entropy weighting of aligned sequences of proteins or DNA

Proc Int Conf Intell Syst Mol Biol. 1995:3:215-21.

Abstract

In a family of proteins or other biological sequences such as DNA, the various subfamilies are often very unevenly represented. For this reason, a scheme for assigning a weight to each sequence can greatly improve performance at tasks such as database searching with profiles or other consensus models based on multiple alignments. A new weighting scheme for this type of database search is proposed. It is derived from the maximum entropy principle within a statistical description of the search problem. It can be proved that, in a certain sense, it corrects for uneven representation. It is shown that finding the maximum entropy weights is an easy optimization problem for which standard techniques are applicable.
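The abstract does not state the objective function, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a gapless alignment of equal-length sequences and takes the objective to be the summed column entropy of the weighted letter frequencies, with non-negative weights that sum to one. The function name `max_entropy_weights`, the multiplicative update, and the fixed step size are illustrative choices rather than details from the source.

```python
import math

def max_entropy_weights(seqs, n_iter=200):
    """Toy sketch: weights maximising the summed column entropy.

    Assumes equal-length, gapless aligned sequences (an assumption made
    for illustration; the abstract does not specify the model).
    """
    n, n_cols = len(seqs), len(seqs[0])
    assert all(len(s) == n_cols for s in seqs), "sequences must be aligned"
    w = [1.0 / n] * n          # start from uniform weights
    step = 0.5 / n_cols        # illustrative step size, not tuned
    for _ in range(n_iter):
        # Weighted letter frequencies p_j(a) for every column j.
        freqs = []
        for j in range(n_cols):
            p = {}
            for wi, s in zip(w, seqs):
                p[s[j]] = p.get(s[j], 0.0) + wi
            freqs.append(p)
        # Entropy gradient w.r.t. each weight, up to an additive constant
        # that cancels when the weights are renormalised.
        grad = [-sum(math.log(freqs[j][s[j]]) for j in range(n_cols))
                for s in seqs]
        # Multiplicative (exponentiated-gradient) update keeps weights
        # positive; renormalising keeps them summing to one.
        gmax = max(grad)
        w = [wi * math.exp(step * (g - gmax)) for wi, g in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w

# Three identical sequences and one outlier: an over-represented subfamily.
weights = max_entropy_weights(["AAAA", "AAAA", "AAAA", "CCCC"])
print([round(x, 3) for x in weights])
```

In this toy case the three identical sequences end up sharing half of the total weight while the single distinct sequence receives the other half, which illustrates the kind of correction for uneven representation the abstract refers to. Because the assumed objective is concave in the weights, a general-purpose constrained optimizer could be substituted for the hand-written update.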

Publication types

  • Comparative Study

MeSH terms

  • Amino Acid Sequence*
  • Animals
  • Base Sequence*
  • Consensus Sequence
  • Conserved Sequence
  • DNA / chemistry*
  • Databases, Factual*
  • Humans
  • Markov Chains
  • Mathematics
  • Molecular Sequence Data
  • Protein Conformation
  • Proteins / chemistry*
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid
  • Thermodynamics

Substances

  • Proteins
  • DNA