# On the distribution function of the complexity of ﬁnite sequences

Investigations of complexity of sequences lead to important applications such as effective data compression, testing of randomness, discriminating between information sources and many others. In this paper we establish formulae describing the distribution functions of random variables representing the complexity of ﬁnite sequences introduced by Lempel and Ziv in 1976. It is known that this quantity can be used as an estimator of entropy. We show that the distribution functions depend afﬁnely on the probabilities of the so-called exact sequences.

The notion of complexity of a given sequence was ﬁrst introduced in papers by Kolmogorov [13] and Chaitin [7]. Kolmogorov proposed, as a measure for the complexity of that sequence with respect to the given algorithm, the use of the length of the shortest binary program which, when fed into a given algorithm, will cause it to produce a speciﬁed sequence. If the length of the program is large we can say that the complexity of the sequence is large. In 1976 Lempel and Ziv [14] proposed and explored another approach to the problem of the complexity of a speciﬁc sequence. They linked the complexity of a speciﬁc sequence to the gradual buildup of new patterns along the given sequence. The complexity measure suggested by them is related to the number of distinct phrases and the rate of their occurrence along the sequence. It reﬂects the behavior of a simple parsing algorithm whose task is to recognize newly encountered phrases during its scanning of a given sequence. In a series of papers, modiﬁcations of the Lempel–Ziv parsing algorithm were proposed in response to the needs of various applications. In general, in these algorithms a new phrase is established as the shortest substring which has not occurred previously, where the search for previous occurrences may be restricted or generalized in the modiﬁed algorithms in various ways, e.g.: by considering only a ﬁxed number of preceding symbols [20], by considering only complete previously established phrases (Lempel–Ziv Incremental Parsing Algorithm [21]), by allowing a number (not more than a ﬁxed threshold) of previous occurrences of the phrase ,
etc.