The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of eukaryotic genomes

Raffaele Giancarlo, Filippo Utro

Risultato della ricerca: Article

9 Citazioni (Scopus)

Abstract

Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy to compute, closed-form mathematical formulas that make no assumptions of the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We contribute to close this important methodological gap between the two models by providing three very simple formulas for the sequence specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close the mentioned gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated to the level of DNA encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter 'encoding'. Supplementary information: Supplementary data are available at Bioinformatics online. Contact:.
Lingua originaleEnglish
pagine (da-a)835-842
Numero di pagine8
RivistaBioinformatics
Volume32
Stato di pubblicazionePublished - 2016

Fingerprint

Nucleosomes
Genome
DNA
Genes
Computational Biology
DNA sequences
Bioinformatics
Subsequence
DNA Sequence
Statistical Models
Closed-form
Chromatin
Computer science
Chemical properties
Physical properties
Quantification
Statistical Model
Computer Science
Research
Encoding

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cita questo

@article{a842849aa17e482cbbcc53e7ae1d9e4b,
title = "The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of eukaryotic genomes",
abstract = "Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy to compute, closed-form mathematical formulas that make no assumptions of the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We contribute to close this important methodological gap between the two models by providing three very simple formulas for the sequence specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close the mentioned gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated to the level of DNA encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter 'encoding'. Supplementary information: Supplementary data are available at Bioinformatics online. Contact:.",
keywords = "Biochemistry, Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications1707 Computer Vision and Pattern Recognition, Molecular Biology, Statistics and Probability",
author = "Raffaele Giancarlo and Filippo Utro",
year = "2016",
language = "English",
volume = "32",
pages = "835--842",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",

}

TY - JOUR

T1 - The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of eukaryotic genomes

AU - Giancarlo, Raffaele

AU - Utro, Filippo

PY - 2016

Y1 - 2016

N2 - Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy to compute, closed-form mathematical formulas that make no assumptions of the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We contribute to close this important methodological gap between the two models by providing three very simple formulas for the sequence specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close the mentioned gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated to the level of DNA encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter 'encoding'. Supplementary information: Supplementary data are available at Bioinformatics online. Contact:.

AB - Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy to compute, closed-form mathematical formulas that make no assumptions of the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We contribute to close this important methodological gap between the two models by providing three very simple formulas for the sequence specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close the mentioned gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated to the level of DNA encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter 'encoding'. Supplementary information: Supplementary data are available at Bioinformatics online. Contact:.

KW - Biochemistry

KW - Computational Mathematics

KW - Computational Theory and Mathematics

KW - Computer Science Applications1707 Computer Vision and Pattern Recognition

KW - Molecular Biology

KW - Statistics and Probability

UR - http://hdl.handle.net/10447/201074

UR - http://bioinformatics.oxfordjournals.org/

M3 - Article

VL - 32

SP - 835

EP - 842

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

ER -