The problem of detecting and measuring the repetitiveness of one-dimensional
strings has been extensively studied in data compression and text indexing. Our
understanding of these issues has been significantly improved by the
introduction of the notion of string attractor [Kempa and Prezza, STOC 2018]
and by the results showing the relationship between attractors and other
measures of compressibility.
When the input data are structured in a non-linear way, as in two-dimensional
strings, inherent redundancy often offers an even richer source for
compression. However, systematic studies on repetitiveness measures for
two-dimensional strings are still scarce. In this paper we extend to two or
more dimensions the main measures of complexity introduced for one-dimensional
strings. We distinguish between the measures
δ and
γ, defined in
terms of the substrings of the input, and the measures
g,
grl, and
b,
which are based on copy-paste mechanisms. We study the properties and mutual
relationships between these two classes and we show that the two classes become
incomparable for
d-dimensional inputs as soon as
d≥2. Moreover, we show
that our grammar-based representation of a
d-dimensional string of size
N
enables direct access to any symbol in
O(logN) time.
We also compare our measures for two-dimensional strings with the 2D Block
Tree data structure [Brisaboa et al., Computer J., 2024] and provide some
insights for the design of future effective two-dimensional compressors.