The Three Steps of Clustering in the Post-Genomic Era: A Synopsis

Giosue' Lo Bosco, Raffaele Giancarlo, Filippo Utro, Luca Pinello

Research output: Chapter

10 Citations (Scopus)

Abstract

Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is rarely seen in many areas of science, genomic data require that level of attention if inferences drawn from cluster analysis are to be relevant to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particularly difficult. This paper highlights new findings that seem to address a few relevant problems in each of the three steps, with regard both to the intrinsic predictive power of methods and algorithms and to their time performance. Including the latter aspect in the evaluation process is quite novel, since it is rarely considered in genomic data analysis.
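The three-step process summarized in the abstract can be illustrated with a minimal sketch in plain Python. The specific choices here are assumptions made for illustration and are not necessarily those evaluated in the chapter: Euclidean distance and 1 minus Pearson correlation for step (a), a simple k-medoids for step (b), and the mean silhouette width for step (c). The helper names `euclidean`, `pearson_distance`, `k_medoids` and `silhouette` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


# Step (a): choice of a distance function (illustrative options).
def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x: Vector, y: Vector) -> float:
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for
    perfectly anti-correlated ones. Undefined (division by zero) for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


# Step (b): choice of a clustering algorithm. k-medoids accepts any distance function.
def k_medoids(data: List[Vector], k: int, dist: Distance,
              iters: int = 100, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    medoids = rng.sample(range(len(data)), k)  # k distinct points as initial medoids
    labels: List[int] = []
    for _ in range(iters):
        # Assign every point to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist(p, data[medoids[c]])) for p in data]
        # Move each medoid to the member with the smallest total distance to its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if its cluster emptied out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(dist(data[i], data[j]) for j in members))
            )
        if new_medoids == medoids:  # converged: assignment will not change
            break
        medoids = new_medoids
    return labels


# Step (c): choice of a validation method. Mean silhouette width (needs >= 2 clusters).
def silhouette(data: List[Vector], labels: List[int], dist: Distance) -> float:
    scores = []
    for i, p in enumerate(data):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:  # common convention: a point in a singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(dist(p, data[j]) for j in own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, data[j]) for j, lab in enumerate(labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy profiles forming two well-separated groups.
    data = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
    for k in (2, 3):
        labels = k_medoids(data, k, euclidean)
        print(k, labels, round(silhouette(data, labels, euclidean), 3))
```

Each step is a separate, swappable function, so a different distance, algorithm or validation index can be substituted independently; comparing the silhouette across values of k is one simple way to let step (c) inform the number of clusters.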
Original language: English
Host publication title: Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers
Pages: 13-30
Number of pages: 18
Publication status: Published - 2011

Publication series

Name: Lecture Notes in Bioinformatics

Fingerprint

Cluster analysis
Genomics
Clustering
Clustering algorithms
Computer science
Statistics
Distance Function
Clustering Algorithm
Dimensionality
Data analysis
Inclusion
Evaluation

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Lo Bosco, G., Giancarlo, R., Utro, F., & Pinello, L. (2011). The Three Steps of Clustering in the Post-Genomic Era: A Synopsis. In Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers (pp. 13-30). (Lecture Notes in Bioinformatics).

The Three Steps of Clustering in the Post-Genomic Era: A Synopsis. / Lo Bosco, Giosue'; Giancarlo, Raffaele; Utro, Filippo; Pinello, Luca.

Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers. 2011. pp. 13-30 (Lecture Notes in Bioinformatics).


Lo Bosco, G, Giancarlo, R, Utro, F & Pinello, L 2011, The Three Steps of Clustering in the Post-Genomic Era: A Synopsis. in Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers. Lecture Notes in Bioinformatics, pp. 13-30.
Lo Bosco G, Giancarlo R, Utro F, Pinello L. The Three Steps of Clustering in the Post-Genomic Era: A Synopsis. In Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers. 2011. p. 13-30. (Lecture Notes in Bioinformatics).
Lo Bosco, Giosue' ; Giancarlo, Raffaele ; Utro, Filippo ; Pinello, Luca. / The Three Steps of Clustering in the Post-Genomic Era: A Synopsis. Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers. 2011. pp. 13-30 (Lecture Notes in Bioinformatics).
@incollection{59acd82a7ec947c18ebe1c222241ccb1,
title = "The Three Steps of Clustering in the Post-Genomic Era: A Synopsis",
abstract = "Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is rarely seen in many areas of science, genomic data require that level of attention if inferences drawn from cluster analysis are to be relevant to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particularly difficult. This paper highlights new findings that seem to address a few relevant problems in each of the three steps, with regard both to the intrinsic predictive power of methods and algorithms and to their time performance. Including the latter aspect in the evaluation process is quite novel, since it is rarely considered in genomic data analysis.",
keywords = "Clustering, cluster validation indices, distance functions",
author = "{Lo Bosco}, Giosue' and Raffaele Giancarlo and Filippo Utro and Luca Pinello",
year = "2011",
language = "English",
isbn = "978-3-642-21945-0",
series = "Lecture Notes in Bioinformatics",
pages = "13--30",
booktitle = "Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers",

}

TY - CHAP

T1 - The Three Steps of Clustering in the Post-Genomic Era: A Synopsis

AU - Lo Bosco, Giosue'

AU - Giancarlo, Raffaele

AU - Utro, Filippo

AU - Pinello, Luca

PY - 2011

Y1 - 2011

N2 - Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is rarely seen in many areas of science, genomic data require that level of attention if inferences drawn from cluster analysis are to be relevant to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particularly difficult. This paper highlights new findings that seem to address a few relevant problems in each of the three steps, with regard both to the intrinsic predictive power of methods and algorithms and to their time performance. Including the latter aspect in the evaluation process is quite novel, since it is rarely considered in genomic data analysis.

AB - Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is rarely seen in many areas of science, genomic data require that level of attention if inferences drawn from cluster analysis are to be relevant to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particularly difficult. This paper highlights new findings that seem to address a few relevant problems in each of the three steps, with regard both to the intrinsic predictive power of methods and algorithms and to their time performance. Including the latter aspect in the evaluation process is quite novel, since it is rarely considered in genomic data analysis.

KW - Clustering

KW - cluster validation indices

KW - distance functions

UR - http://hdl.handle.net/10447/60526

M3 - Chapter

SN - 978-3-642-21945-0

T3 - Lecture Notes in Bioinformatics

SP - 13

EP - 30

BT - Computational Intelligence Methods for Bioinformatics and Biostatistics, 7th International Meeting, CIBB 2010, Palermo, Italy, September 2010 Revised Selected Papers

ER -