Distance Functions, Clustering Algorithms and Microarray Data Analysis

Research output: Other

26 Citations (Scopus)

Abstract

Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds true also in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained their status as de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. Indeed, we present an experimental study, involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments have been carried out on six benchmark microarray datasets, where the gold solution is known for each of them. We have used both hierarchical and K-means clustering algorithms and external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, highlighting also the correlations between the two. Pragmatically, based on the outcomes of the experiments, one receives the indication that Minkowski, cosine and Pearson correlation distances seem to be the best choice when dealing with microarray data analysis.
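
As a rough illustration of the quantities named in the abstract, the following minimal Python sketch computes the three distances singled out there (Minkowski, cosine, Pearson correlation) and one external validation index. The abstract does not give formulas or say which external index was used, so everything here is an assumption: the "1 minus similarity" form of the cosine and Pearson distances, the choice of the Rand index, and all function names and toy data are illustrative and not taken from the paper.

```python
import math
from itertools import combinations


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_distance(x, y):
    """1 - cosine similarity (assumed form); neither vector may be all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed form); neither vector may be constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    # Pearson correlation equals the cosine similarity of the centered vectors.
    return cosine_distance(centered_x, centered_y)


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    Used here as a stand-in external validation criterion: labels_a would be
    the known gold solution and labels_b the output of a clustering algorithm.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical expression profiles: g2 has the same shape as g1 at twice
    # the magnitude, g3 has the opposite trend.
    g1 = [1.0, 2.0, 3.0, 4.0]
    g2 = [2.0, 4.0, 6.0, 8.0]
    g3 = [4.0, 3.0, 2.0, 1.0]
    for name, dist in [("minkowski", minkowski),
                       ("cosine", cosine_distance),
                       ("pearson", pearson_distance)]:
        print(name, dist(g1, g2), dist(g1, g3))
    # Hypothetical gold solution versus a clustering result.
    print("rand index", rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

On the toy profiles, the Euclidean case of Minkowski separates g1 from g2 because it is sensitive to magnitude, while the cosine and Pearson distances treat them as essentially identical in shape. Differences of this kind are what make the choice of distance matter for clustering.
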
Original language: English
Pages: 125-138
Number of pages: 14
Publication status: Published - 2010

Fingerprint

Microarray Data Analysis
Microarrays
Distance Function
Microarray Data
Clustering algorithms
Microarray
Clustering Algorithm
Clustering
Pearson Correlation
K-means Algorithm
K-means Clustering
Euclidean Distance
Experiment
Ranking
Data Mining
Benchmark
Evaluation
Data mining
Experiments
Standards

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

@conference{7ce5c2a6013e402eb2f6bb968ec464dc,
title = "Distance Functions, Clustering Algorithms and Microarray Data Analysis",
abstract = "Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds true also in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained their status as de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. Indeed, we present an experimental study, involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments have been carried out on six benchmark microarray datasets, where the gold solution is known for each of them. We have used both hierarchical and K-means clustering algorithms and external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, highlighting also the correlations between the two. Pragmatically, based on the outcomes of the experiments, one receives the indication that Minkowski, cosine and Pearson correlation distances seem to be the best choice when dealing with microarray data analysis.",
keywords = "Clustering, distance measures",
author = "Raffaele Giancarlo and {Lo Bosco}, Giosue' and Luca Pinello",
year = "2010",
language = "English",
pages = "125--138",

}

TY - CONF

T1 - Distance Functions, Clustering Algorithms and Microarray Data Analysis

AU - Giancarlo, Raffaele

AU - Lo Bosco, Giosue'

AU - Pinello, Luca

PY - 2010

Y1 - 2010

N2 - Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds true also in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained their status as de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. Indeed, we present an experimental study, involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments have been carried out on six benchmark microarray datasets, where the gold solution is known for each of them. We have used both hierarchical and K-means clustering algorithms and external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, highlighting also the correlations between the two. Pragmatically, based on the outcomes of the experiments, one receives the indication that Minkowski, cosine and Pearson correlation distances seem to be the best choice when dealing with microarray data analysis.

KW - Clustering

KW - distance measures

UR - http://hdl.handle.net/10447/58466

M3 - Other

SP - 125

EP - 138

ER -