On the distribution function of the complexity of finite sequences

Investigations of complexity of sequences lead to important applications such as effective data compression, testing of randomness, discriminating between information sources and many others. In this paper we establish formulae describing the distribution functions of random variables representing the complexity of finite sequences introduced by Lempel and Ziv in 1976. It is known that this quantity can be used as an estimator of entropy. We show that the distribution functions depend affinely on the probabilities of the so-called exact sequences.

The notion of complexity of a given sequence was first introduced in papers by Kolmogorov [13] and Chaitin [7]. Kolmogorov proposed, as a measure for the complexity of that sequence with respect to the given algorithm, the use of the length of the shortest binary program which, when fed into a given algorithm, will cause it to produce a specified sequence. If the length of the program is large we can say that the complexity of the sequence is large. In 1976 Lempel and Ziv [14] proposed and explored another approach to the problem of the complexity of a specific sequence. They linked the complexity of a specific sequence to the gradual buildup of new patterns along the given sequence. The complexity measure suggested by them is related to the number of distinct phrases and the rate of their occurrence along the sequence. It reflects the behavior of a simple parsing algorithm whose task is to recognize newly encountered phrases during its scanning of a given sequence. In a series of papers, modifications of the Lempel–Ziv parsing algorithm were proposed in response to the needs of various applications. In general, in these algorithms a new phrase is established as the shortest substring which has not occurred previously, where the search for previous occurrences may be restricted or generalized in the modified algorithms in various ways, e.g.: by considering only a fixed number of preceding symbols [20], by considering only complete previously established phrases (Lempel–Ziv Incremental Parsing Algorithm [21]), by allowing a number (not more than a fixed threshold) of previous occurrences of the phrase ,

Free download research paper