We study the Longest Common Extension (LCE) problem in a string containing
wildcards. Wildcards (also called "don't cares" or "holes") are special
characters that match any other character in the alphabet, similar to the
character "?" in Unix commands or "." in regular expression engines.
We consider the problem parametrized by
G, the number of maximal contiguous
groups of wildcards in the input string. Our main contribution is a simple data
structure for this problem that can be built in
O(n(G/t)logn) time,
occupies
O(nG/t) space, and answers queries in
O(t) time, for any $t \in [1
.. G]
.UptotheO(\log n)$ factor, this interpolates smoothly between the
data structure of Crochemore et al. [JDA 2015], which has
O(nG) preprocessing
time and space, and
O(1) query time, and a simple solution based on the
``kangaroo jumping'' technique [Landau and Vishkin, STOC 1986], which has
O(n) preprocessing time and space, and
O(G) query time.
By establishing a connection between this problem and Boolean matrix
multiplication, we show that our solution is optimal up to subpolynomial
factors when
G=Ω(n) under a widely believed hypothesis. In addition,
we develop a new simple, deterministic and combinatorial algorithm for sparse
Boolean matrix multiplication.
Finally, we show that our data structure can be used to obtain efficient
algorithms for approximate pattern matching and structural analysis of strings
with wildcards.