Bacterial classification has been extensively investigated with different tools and for many purposes, such as early diagnosis, metagenomics, and phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classifier for bacterial species based on a dissimilarity measure of a purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial definition that has recently found applications in bioinformatics. We incorporate this measure into a probabilistic neural network in order to classify bacterial species. Our approach is motivated by the vast literature on the combinatorics of Minimal Absent Words in relation to the degree of repetitiveness of a sequence. We ran our experiments on a public dataset of 16S ribosomal RNA sequences. Our approach achieved very high classification accuracy, indicating that our method is comparable with the standard tools available for the automatic classification of bacterial species.
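To make the notion of Minimal Absent Words concrete, the following is a minimal brute-force sketch, not the implementation used in the paper. A word is a minimal absent word of a sequence if it does not occur in the sequence while all of its proper factors do. The function names, the `max_len` cutoff, and the length-weighted symmetric-difference dissimilarity are assumptions made for illustration; the abstract does not specify the exact measure.

```python
from itertools import product


def factors(x, max_len):
    """Return all substrings of x of length 0..max_len (empty word included)."""
    return {x[i:i + k] for k in range(max_len + 1) for i in range(len(x) - k + 1)}


def minimal_absent_words(x, alphabet="ACGT", max_len=4):
    """Brute-force enumeration of minimal absent words of x up to length max_len.

    A word w is a minimal absent word if w does not occur in x, but both its
    longest proper prefix and its longest proper suffix do occur in x.
    """
    present = factors(x, max_len)
    maws = set()
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            w = "".join(letters)
            if w not in present and w[:-1] in present and w[1:] in present:
                maws.add(w)
    return maws


def maw_dissimilarity(x, y, alphabet="ACGT", max_len=4):
    """Illustrative dissimilarity (an assumption, not the paper's definition):
    sum of 1/|w|^2 over words that are minimal absent for exactly one sequence."""
    diff = minimal_absent_words(x, alphabet, max_len) ^ minimal_absent_words(y, alphabet, max_len)
    return sum(1.0 / len(w) ** 2 for w in diff)


if __name__ == "__main__":
    print(sorted(minimal_absent_words("AB", alphabet="AB", max_len=3)))
    print(maw_dissimilarity("AB", "BA", alphabet="AB", max_len=3))
```

The enumeration is exponential in `max_len` and is only meant for short toy inputs; practical tools compute minimal absent words with suffix-based index structures. A pairwise matrix of such dissimilarities is the kind of input a distance-based classifier could consume.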
|Number of pages||10|
|Journal||AIMS MEDICAL SCIENCE|
|Publication status||Published - 2018|