FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications

Raffaele Giancarlo, Gianluca Roscigno, Umberto Ferraro Petrillo, Giuseppe Cattaneo

Research output: Article

16 Citations (Scopus)

Abstract

Summary: MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers versatility and efficiency. That is, it can handle collections of reads, with or without quality scores, as well as long genomic sequences, whereas the existing routines concentrate mainly on NGS sequence data. Moreover, in the domain where a comparison is possible, the routines proposed here are faster than the available ones. In conclusion, FASTdoop is a much-needed addition to Hadoop-BAM. Availability and Implementation: The software and the datasets are available at http://www.di.unisa.it/FASTdoop/.
Original language: English
Pages (from-to): 1575-1577
Number of pages: 3
Journal: Bioinformatics
Volume: 33
Publication status: Published - 2017

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
