Efficient algorithms for sequence analysis with entropic profiles

Simona Ester Rombo, Cinzia Pizzi, Laxmi Parida, Simone Spangaro, Mattia Ornamenti

Research output: Article

3 Citations (Scopus)

Abstract

Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster, make it possible to analyze longer sequences, even at high degrees of resolution, than state-of-the-art algorithms. © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
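
The two ingredients named in the abstract, subword counts obtained from a suffix-based index and an entropy measure over those counts, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the suffix array below is built by naive sorting (not in linear time), it is a plain suffix array rather than a TST or ESA, and the quantity computed is the ordinary Shannon entropy of the k-mer distribution, not the entropic profile itself. The function names `suffix_array`, `count_occurrences`, and `kmer_entropy` are hypothetical and chosen only for illustration.

```python
import math
from bisect import bisect_left, bisect_right
from collections import Counter


def suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting; linear-time constructions exist but
    are out of scope for this sketch.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, word):
    """Count occurrences of `word` in `text` using the suffix array `sa`.

    Suffixes starting with `word` form one contiguous block in the suffix
    array, so the count is the width of that block. The length-|word|
    prefixes are materialized here for simplicity, which costs O(n * |word|)
    per query; a real index would binary-search without copying.
    """
    k = len(word)
    prefixes = [text[i:i + k] for i in sa]  # stays sorted after truncation
    return bisect_right(prefixes, word) - bisect_left(prefixes, word)


def kmer_entropy(text, k):
    """Shannon entropy (in bits) of the empirical k-mer distribution."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

For instance, with `text = "ACGTACGTAC"` and `sa = suffix_array(text)`, `count_occurrences(text, sa, "AC")` returns 3, and `kmer_entropy("ACGT", 1)` returns 2.0 because the four symbols are equally frequent.
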
Original language: English
Pages (from-to): 117-128
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 15
Publication status: Published - 2018

Fingerprint

Sequence Analysis
Efficient Algorithms
Suffix Array
Suffix Tree
Subword
Suffix
Compressibility
Predictability
Redistribution
Linear-time Algorithm
Information Theory
Data Structures
Entropy
DNA
Profile

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Applied Mathematics
  • Genetics

Cite this

Efficient algorithms for sequence analysis with entropic profiles. / Rombo, Simona Ester; Pizzi, Cinzia; Parida, Laxmi; Spangaro, Simone; Ornamenti, Mattia.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 15, 2018, pp. 117-128.

@article{7bc5b1c422ff4abfa19b3751b1f58f9d,
title = "Efficient algorithms for sequence analysis with entropic profiles",
abstract = "Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster, make it possible to analyze longer sequences, even at high degrees of resolution, than state-of-the-art algorithms. {\copyright} 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.",
author = "Rombo, {Simona Ester} and Cinzia Pizzi and Laxmi Parida and Simone Spangaro and Mattia Ornamenti",
year = "2018",
language = "English",
volume = "15",
pages = "117--128",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Efficient algorithms for sequence analysis with entropic profiles

AU - Rombo, Simona Ester

AU - Pizzi, Cinzia

AU - Parida, Laxmi

AU - Spangaro, Simone

AU - Ornamenti, Mattia

PY - 2018

Y1 - 2018

AB - Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster, make it possible to analyze longer sequences, even at high degrees of resolution, than state-of-the-art algorithms. © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

UR - http://hdl.handle.net/10447/274877

M3 - Article

VL - 15

SP - 117

EP - 128

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

ER -