Reverse-Safe Data Structures for Text Indexing

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
Original languageEnglish
Title of host publication2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX)
Pages199-213
Number of pages15
Publication statusPublished - 2020

Fingerprint

Data structures
Pattern matching

Cite this

Fici, G. (2020). Reverse-Safe Data Structures for Text Indexing. In 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX) (pp. 199-213)

Reverse-Safe Data Structures for Text Indexing. / Fici, Gabriele.

2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX). 2020. p. 199-213.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Fici, G 2020, Reverse-Safe Data Structures for Text Indexing. in 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX). pp. 199-213.
Fici G. Reverse-Safe Data Structures for Text Indexing. In 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX). 2020. p. 199-213
Fici, Gabriele. / Reverse-Safe Data Structures for Text Indexing. 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX). 2020. pp. 199-213
@inproceedings{0ec75453d59c4aa0b50ee9edae3f6e40,
title = "Reverse-Safe Data Structures for Text Indexing",
abstract = "We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.",
author = "Gabriele Fici",
year = "2020",
language = "English",
pages = "199--213",
booktitle = "2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX)",

}

TY - GEN

T1 - Reverse-Safe Data Structures for Text Indexing

AU - Fici, Gabriele

PY - 2020

Y1 - 2020

N2 - We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

AB - We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

UR - http://hdl.handle.net/10447/391699

M3 - Conference contribution

SP - 199

EP - 213

BT - 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX)

ER -