On parsing optimality for dictionary-based text compression— the Zip case

Alessio Langiu, Alessio Langiu

Risultato della ricerca: Article

10 Citazioni (Scopus)

Abstract

Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results of both a theoretical and practical nature, which have appeared in the last three decades. We follow the historical steps of the Zip scheme showing how the original optimal parsing problem of finding a parse formed by the minimum number of phrases has been replaced by the bit-optimal parsing problem where the goal is to minimise the length in bits of the encoded text.
Lingua originaleEnglish
pagine (da-a)65-70
Numero di pagine6
RivistaJournal of Discrete Algorithms
Volume20
Stato di pubblicazionePublished - 2013

Fingerprint

Text Compression
Parsing
Glossaries
Optimality
Compression
Data compression
Best Practice
Data Compression
Minimise
Software
Dictionary

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computational Theory and Mathematics
  • Discrete Mathematics and Combinatorics

Cita questo

On parsing optimality for dictionary-based text compression— the Zip case. / Langiu, Alessio; Langiu, Alessio.

In: Journal of Discrete Algorithms, Vol. 20, 2013, pag. 65-70.

Risultato della ricerca: Article

@article{7e097a5bc15b4eb58e9aaeac9a202946,
title = "On parsing optimality for dictionary-based text compression— the Zip case",
abstract = "Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results of both a theoretical and practical nature, which have appeared in the last three decades. We follow the historical steps of the Zip scheme showing how the original optimal parsing problem of finding a parse formed by the minimum number of phrases has been replaced by the bit-optimal parsing problem where the goal is to minimise the length in bits of the encoded text.",
author = "Alessio Langiu and Alessio Langiu",
year = "2013",
language = "English",
volume = "20",
pages = "65--70",
journal = "Journal of Discrete Algorithms",
issn = "1570-8667",
publisher = "Elsevier",

}

TY - JOUR

T1 - On parsing optimality for dictionary-based text compression— the Zip case

AU - Langiu, Alessio

AU - Langiu, Alessio

PY - 2013

Y1 - 2013

N2 - Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results of both a theoretical and practical nature, which have appeared in the last three decades. We follow the historical steps of the Zip scheme showing how the original optimal parsing problem of finding a parse formed by the minimum number of phrases has been replaced by the bit-optimal parsing problem where the goal is to minimise the length in bits of the encoded text.

AB - Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results of both a theoretical and practical nature, which have appeared in the last three decades. We follow the historical steps of the Zip scheme showing how the original optimal parsing problem of finding a parse formed by the minimum number of phrases has been replaced by the bit-optimal parsing problem where the goal is to minimise the length in bits of the encoded text.

UR - http://hdl.handle.net/10447/86143

M3 - Article

VL - 20

SP - 65

EP - 70

JO - Journal of Discrete Algorithms

JF - Journal of Discrete Algorithms

SN - 1570-8667

ER -