A comparative analysis of parallel computing approaches for genome assembly

Interdiscip Sci. 2011 Mar;3(1):57-63. doi: 10.1007/s12539-011-0062-0. Epub 2011 Mar 3.

Abstract

Over the last two decades, the volume of sequenced genomic data has grown tremendously. However, the algorithms and computational power required to process, classify, and analyze genomic data expeditiously have lagged considerably. In bioinformatics, one of the most challenging and computationally intensive processes is the assembly of large genomes, which may take weeks of compute time. Several computationally feasible sequential assemblers have been devised and implemented to assist in the process, and a few algorithms have also been parallelized to speed up assembly. However, very little has been done to analyze such parallel algorithms thoroughly using the specific metrics of the parallel computing paradigm. It is essential to investigate parallel assembly algorithms to ascertain their scalability and efficiency. Genomic data sets vary considerably in size, ranging from a few thousand units of data to several billion. Moreover, the degree of repetition in the data also varies widely from one set to another. Therefore, we must establish an association between the nature, size, and degree of repetition of the genomic data and the best parallel assembly algorithm. This paper presents a comparative analysis of some of the most widely used approaches to assembling genomes using the parallel computing paradigm.
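The abstract does not name the parallel computing metrics it has in mind. As an illustration only, the sketch below assumes the two most common ones: speedup, S = T_1 / T_p, and efficiency, E = S / p, where T_1 is the run time on one processor and T_p the run time on p processors. The function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p (assumed standard definition)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p, the fraction of ideal linear speedup achieved."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


# Hypothetical wall-clock timings in seconds, keyed by processor count.
# These are made-up values for illustration, not results from the paper.
timings = {1: 1200.0, 2: 640.0, 4: 360.0, 8: 240.0}

t1 = timings[1]
for p, tp in sorted(timings.items()):
    s = speedup(t1, tp)
    e = efficiency(t1, tp, p)
    print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

An efficiency that falls steadily as p grows, as in these made-up timings, is the usual sign that an algorithm scales poorly on that input; comparing such curves across data sets of different size and repetitiveness is one way to relate data characteristics to the choice of assembler.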

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Electronic Data Processing
  • Genome
  • Genomics / methods*
  • Sequence Analysis, DNA