Number 4 - December 1995
Volume 5 - 1995
Universal data compression based on approximate string matching
Ilan Sadeh
Abstract
Two practical source coding schemes based on approximate string matching are proposed. The first combines approximate fixed-length string matching with a block coder based on the empirical distribution; the second is an LZ-type quasi-parsing method based on approximate string matching. A lemma on approximate string matching, which extends the Kac lemma, is proved. Using this lemma, it is shown that the first, deterministic algorithm converts the stationary ergodic source u into an output process v. Assuming that v is a stationary process once the scheme has run for an infinite time, it is shown that the optimal compression ratio R(D) is achieved. This reduces the problem of constructing a universal lossy coder to proving that the output process v of the proposed algorithm is stationary. A similar result holds for the second algorithm in the limit of a long database, where the database is a suffix of the infinite database generated by the first algorithm. The duality between the two algorithms is proved. The performance of the two algorithms and of their suboptimal versions is evaluated. The main advantages of the proposed methods are the asymptotic sequential behaviour of the encoder and the simplicity of the decoder.
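The abstract does not spell out the matching procedure, so the Python sketch below only illustrates the two basic operations under stated assumptions. It assumes a per-letter Hamming distortion measure with threshold d. It does a brute-force backward search through the database for the most recent fixed-length window within distortion d of a source block, which is the core step of the first scheme. It also searches for the longest source prefix that approximately matches some database substring, which is the quasi-parsing step of the second scheme. The names approx_recurrence, pointer_rate and longest_approx_prefix are hypothetical, and this is not the author's exact algorithm.

```python
import math
from typing import Optional, Sequence


def hamming_distortion(a: Sequence[int], b: Sequence[int]) -> float:
    """Per-letter Hamming distortion between equal-length blocks (assumed fidelity measure)."""
    assert len(a) == len(b) and len(a) > 0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def approx_recurrence(database: Sequence[int], block: Sequence[int], d: float) -> Optional[int]:
    """Scan backwards for the most recent length-L window of `database`
    whose distortion from `block` is at most d.

    Returns the backward offset (1 = window ending at the last database symbol),
    or None if no window matches. Hypothetical helper, brute force.
    """
    L, n = len(block), len(database)
    for offset in range(1, n - L + 2):
        start = n - L - offset + 1
        if hamming_distortion(database[start:start + L], block) <= d:
            return offset
    return None


def pointer_rate(offset: int, L: int) -> float:
    """Bits per source symbol for sending `offset` as a pointer, ignoring any
    flag bits or extra code the full scheme needs when no match exists."""
    return math.log2(offset) / L


def longest_approx_prefix(database: Sequence[int], source: Sequence[int], d: float) -> int:
    """Length of the longest prefix of `source` within per-letter distortion d
    of some database substring (0 if none).

    Average distortion is not monotone in the prefix length, so every length
    is checked, longest first, instead of stopping at the first failure.
    """
    for length in range(len(source), 0, -1):
        if approx_recurrence(database, source[:length], d) is not None:
            return length
    return 0
```

For example, with the toy database [0, 1, 1, 0, 1, 0, 0, 1], the block [1, 1, 1] has no exact match. At d = 1/3 it matches the window at backward offset 4, so the pointer costs log2(4)/3, about 0.67 bits per symbol, in this toy setting. The scans are brute force and are meant only to show the matching rule, not an efficient encoder.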