时雨小径 May the Spirit be with you

Longest K-distance Common Substring

Problem Description

Given two strings with the same length $L$, their distance $D$ is defined by the number of mismatched characters between them, we name these two strings as K-distance Common Substring. Now given two strings $S_1$ , $S_2$, find out what is the longest common substring between them with a distance not greater than $K$.

Analysis

The only difference between this problem and the classical dynamic programming problem "Longest common substring" is, there could be $K$ more mismatches exist, however, if we want to use the same dp method, then it would be an $O(n^3)$ one, since an extra dimension, the length of $K$ is added. Therefore, we need to think from another angle, try to solve a sub problem as:

Given two strings $S_1$ and $S_2$, and the length $l$, tell whether there is a pair of substrings with length $l$ exist that their distance is not greater than $K$.

Define $f(i,j,l)$ as the distance between the substrings $S_1^\prime$ and $S_2^\prime$, which have the same length $l$, then we can have:

$\forall{i,j} \in ( 0,l )$ , $f(i,j,l) = f(i - 1 , j - 1 , l) + d_1 + d_2 $, while:

and,

This can be done in $O(n^2)$, the rest of our work is to determine the longest length $l$, it is easy to think of binary search to reduce the time complexity, thus it makes the final complexity an $O(n^2\log{n})$.