Some Investigations on Similarity Measures Based on Absent Words

Risultato della ricerca: Article

Abstract

In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common structure, but they are not able to detect the difference on the number of occurrences of such a structure in the two sequences. In order to take into account such a multiplicity, we introduce the notion of multifactor, and define a new measure that uses both absent words and multifactors. Surprisingly, we prove that this similarity measure coincides with a distance on sequences introduced by Ehrenfeucht and Haussler in [2], in the context of block-moves strategies. In this way, our result creates a non trivial bridge between similarity measures based on absent words and those based on the block-moves approach.
Lingua originaleEnglish
pagine (da-a)97-112
Numero di pagine16
RivistaFundamenta Informaticae
Volume171
Stato di pubblicazionePublished - 2019

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Algebra and Number Theory
  • Information Systems
  • Computational Theory and Mathematics

Fingerprint Entra nei temi di ricerca di 'Some Investigations on Similarity Measures Based on Absent Words'. Insieme formano una fingerprint unica.

Cita questo