Languages with mismatches

Marinella Sciortino, Antonio Restivo, Chiara Epifanio, Filippo Mignosi, Filippo Mignosi, Alessandra Gabriele

Risultato della ricerca: Article

6 Citazioni (Scopus)

Abstract

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching having average size O(|S|⋅logk+1|S|) and answering queries in time O(|x|+|occ(x)|) for any word x, where occ is the list of all occurrences of x in S up to errors.
Lingua originaleEnglish
pagine (da-a)152-166
Numero di pagine15
RivistaDefault journal
Volume385
Stato di pubblicazionePublished - 2007

Fingerprint

Data structures
Data Structures
Strings
Approximate String Matching
Indexing
Language
Query
Integer
Text
Repetition
Class

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cita questo

Languages with mismatches. / Sciortino, Marinella; Restivo, Antonio; Epifanio, Chiara; Mignosi, Filippo; Mignosi, Filippo; Gabriele, Alessandra.

In: Default journal, Vol. 385, 2007, pag. 152-166.

Risultato della ricerca: Article

Sciortino, M, Restivo, A, Epifanio, C, Mignosi, F, Mignosi, F & Gabriele, A 2007, 'Languages with mismatches', Default journal, vol. 385, pagg. 152-166.
Sciortino, Marinella ; Restivo, Antonio ; Epifanio, Chiara ; Mignosi, Filippo ; Mignosi, Filippo ; Gabriele, Alessandra. / Languages with mismatches. In: Default journal. 2007 ; Vol. 385. pagg. 152-166.
@article{4a2a627b57e546d79b14912ea3939c93,
title = "Languages with mismatches",
abstract = "In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching having average size O(|S|⋅logk+1|S|) and answering queries in time O(|x|+|occ(x)|) for any word x, where occ is the list of all occurrences of x in S up to errors.",
author = "Marinella Sciortino and Antonio Restivo and Chiara Epifanio and Filippo Mignosi and Filippo Mignosi and Alessandra Gabriele",
year = "2007",
language = "English",
volume = "385",
pages = "152--166",
journal = "Default journal",

}

TY - JOUR

T1 - Languages with mismatches

AU - Sciortino, Marinella

AU - Restivo, Antonio

AU - Epifanio, Chiara

AU - Mignosi, Filippo

AU - Mignosi, Filippo

AU - Gabriele, Alessandra

PY - 2007

Y1 - 2007

N2 - In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching having average size O(|S|⋅logk+1|S|) and answering queries in time O(|x|+|occ(x)|) for any word x, where occ is the list of all occurrences of x in S up to errors.

AB - In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching having average size O(|S|⋅logk+1|S|) and answering queries in time O(|x|+|occ(x)|) for any word x, where occ is the list of all occurrences of x in S up to errors.

UR - http://hdl.handle.net/10447/29642

M3 - Article

VL - 385

SP - 152

EP - 166

JO - Default journal

JF - Default journal

ER -