TY - JOUR
T1 - Textual data compression in computational biology: Algorithmic techniques.
AU - Giancarlo, Raffaele
AU - Utro, Filippo
PY - 2012
Y1 - 2012
N2 - In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586] the first systematic organization and presentation of the impact of textual data compression for the analysis of biological data has been given. Its main focus was on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used together with a technical presentation of how well-known notions from information theory have been adapted to successfully work on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from that one, the focus of this companion review is on the computational methods involved in the use of data compression in computational biology. Indeed, although one would expect ad hoc adaptation of compression techniques to work on biological data, unifying and homogeneous algorithmic approaches are emerging. Moreover, given that experiments based on parallel sequencing are the future for biological research, data compression techniques are among a handful of candidates that seem able, successfully, to deal with the deluge of sequence data they produce; although, until now, only in terms of storage and indexing, with the analysis still being a challenge. Therefore, the two reviews, complementing each other, are perceived to be a useful starting point for computer scientists to get acquainted with many of the computational challenges coming from computational biology in which core ideas of the information sciences are already having a substantial impact.
KW - Alignment-free sequence comparison
KW - Data Compression Theory and Practice
KW - Entropy
KW - Hidden Markov Models
KW - Huffman coding
KW - Kolmogorov complexity
KW - Lempel–Ziv compressors
KW - Minimum Description Length principle
KW - Pattern discovery in bioinformatics
KW - Reverse engineering of biological networks
KW - Sequence alignment
UR - http://hdl.handle.net/10447/69844
M3 - Article
VL - 6
SP - 1
EP - 25
JO - Computer Science Review
JF - Computer Science Review
SN - 1574-0137
ER -