The edit distance is a way of quantifying how similar two strings are to one
another by counting the minimum number of character insertions, deletions, and
substitutions required to transform one string into the other. A simple dynamic
programming computes the edit distance between two strings of length
n in
O(n2) time, and a more sophisticated algorithm runs in time
O(n+t2) when
the edit distance is
t [Landau, Myers and Schmidt, SICOMP 1998]. In pursuit
of obtaining faster running time, the last couple of decades have seen a flurry
of research on approximating edit distance, including polylogarithmic
approximation in near-linear time [Andoni, Krauthgamer and Onak, FOCS 2010],
and a constant-factor approximation in subquadratic time [Chakrabarty, Das,
Goldenberg, Kouck\'y and Saks, FOCS 2018].
We study sublinear-time algorithms for small edit distance, which was
investigated extensively because of its numerous applications. Our main result
is an algorithm for distinguishing whether the edit distance is at most
t or
at least
t2 (the quadratic gap problem) in time
O~(tn+t3). This time bound is sublinear roughly for all
t
in
[ω(1),o(n1/3)], which was not known before. The best previous
algorithms solve this problem in sublinear time only for
t=ω(n1/3)
[Andoni and Onak, STOC 2009].
Our algorithm is based on a new approach that adaptively switches between
uniform sampling and reading contiguous blocks of the input strings. In
contrast, all previous algorithms choose which coordinates to query
non-adaptively. Moreover, it can be extended to solve the
t vs
t2−ϵ gap problem in time
O~(t1−ϵn+t3).