An efficient algorithm for DNA fragment assembly in MapReduce

Biochem Biophys Res Commun. 2012 Sep 28;426(3):395-8. doi: 10.1016/j.bbrc.2012.08.101. Epub 2012 Aug 29.

Abstract

Fragment assembly is one of the most important problems of sequence assembly. Algorithms for DNA fragment assembly using de Bruijn graph have been widely used. These algorithms require a large amount of memory and running time to build the de Bruijn graph. Another drawback of the conventional de Bruijn approach is the loss of information. To overcome these shortcomings, this paper proposes a parallel strategy to construct de Bruijin graph. Its main characteristic is to avoid the division of de Bruijin graph. A novel fragment assembly algorithm based on our parallel strategy is implemented in the MapReduce framework. The experimental results show that the parallel strategy can effectively improve the computational efficiency and remove the memory limitations of the assembly algorithm based on Euler superpath. This paper provides a useful attempt to the assembly of large-scale genome sequence using Cloud Computing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Graphics*
  • DNA / genetics
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA