FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares

PeerJ. 2014 Jun 5;2:e425. doi: 10.7717/peerj.425. eCollection 2014.

Abstract

One of the major goals in metagenomics is to identify the organisms present in a microbial community from unannotated shotgun sequencing reads. Taxonomic profiling has valuable applications in biological and medical research, including disease diagnostics. Most currently available approaches do not scale well with increasing data volumes, a growing problem because both the number and the length of the reads produced by sequencing platforms keep increasing. Here we introduce FOCUS, an agile composition-based approach that uses non-negative least squares (NNLS) to report the organisms present in metagenomic samples and profile their abundances. FOCUS was tested with simulated and real metagenomes, and the results show that our approach accurately predicts the organisms present in microbial communities. FOCUS was implemented in Python. The source code and web server are freely available at http://edwards.sdsu.edu/FOCUS.
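The general idea of a composition-based NNLS profile can be illustrated with a minimal sketch. It is not the FOCUS implementation: the abstract does not state the k-mer size, the reference database, or the solver, so the toy reference sequences, the choice of k = 2, the function names `kmer_profile` and `nnls_coordinate_descent`, and the simple coordinate-descent solver below are all assumptions made for illustration. The sketch treats the k-mer frequency vector of a sample as a non-negative mixture of reference k-mer frequency vectors and solves for the mixture weights.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of seq over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def nnls_coordinate_descent(columns, b, iterations=2000):
    """Minimize ||A x - b||^2 subject to x >= 0.

    A is passed as a list of columns. This is a plain coordinate-descent
    solver chosen for brevity (an assumption, not the solver FOCUS uses).
    """
    x = [0.0] * len(columns)
    residual = [-v for v in b]  # residual = A x - b, starting from x = 0
    for _ in range(iterations):
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col)
            if norm == 0:
                continue
            grad = sum(c * r for c, r in zip(col, residual))
            new_xj = max(0.0, x[j] - grad / norm)  # project onto x_j >= 0
            delta = new_xj - x[j]
            if delta:
                residual = [r + delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


# Hypothetical toy "reference genomes" with distinct compositions.
references = {
    "organism_a": "ACGTACGTAACCGGTT" * 10,
    "organism_b": "AAAATTTTAATTATAT" * 10,
    "organism_c": "GGGCCCGCGCGGCCGC" * 10,
}
names = list(references)
columns = [kmer_profile(references[n]) for n in names]

# Simulated sample profile: a 60/30/10 mixture of the reference profiles.
true_weights = [0.6, 0.3, 0.1]
sample = [sum(w * col[i] for w, col in zip(true_weights, columns))
          for i in range(len(columns[0]))]

weights = nnls_coordinate_descent(columns, sample)
total = sum(weights) or 1.0
abundances = [w / total for w in weights]  # relative abundances summing to 1

for name, value in zip(names, abundances):
    print(f"{name}: {value:.3f}")
```

In this toy setting the sample is an exact mixture of the references, so the recovered weights match the simulated proportions closely. Real metagenomes contain organisms absent from any reference set as well as sequencing noise, so the fit would be approximate.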

Keywords: Metagenomes; Modeling; k-mer.

Grants and funding

GGZS and DAC were supported by NSF Grants (DEB-1046413 and CNS-1305112 to RAE). BED was supported by NWO Veni (016.111.075), CAPES/BRASIL and the Dutch Virgo Consortium. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.