Today we will cover another example of memoization, quite different in nature. It finds the "diff" between two strings, meaning what was changed to go from one to another.
The number of changes is often called the edit distance or Levenshtein distance. Here we are just going to find the diff. The result will be a visual string that makes it easy to see where the changes were.
# Run this first, it's from Week 4, day 1 lecture
# See https://python-course.eu/advanced-python/memoization-decorators.php
# for an explanation of how this works and
# https://realpython.com/primer-on-python-decorators/ for a description
# of python decorators in general.
def memoize(func):
memory = {}
def helper(*args):
key = str(args)
if key in memory:
return memory[key]
ans = func(*args)
memory[key] = ans
return ans
helper.memory = memory
return helper
Base cases: First, the base case as in all recursions can be when s
or t
are empty strings. That's easy. Also, if they agree on the first characters, s[0]
== t[0]
, then we can just recurse on the rest: in that case edit_distance(s, t)
is equal to edit_distance(s[1:], t[1:])
because s[1:]
is everything after the first character.
Recursion: Otherwise, there are three possibilities, which we solve by recursion. We then take the smallest distance. Either the first character of s
is removed, the first character was changed, or the first character of t
was inserted on s
.
@memoize
def edit_distance(s, t):
'''Compute the minimal edit distance between two strings s and t, where an
an insertion counts as distance 1, a deletion counts as distance 1,
and a replacement counts as distance 1.
Example edit_distance('hello', 'hola') = 3 because:
1) you delete one of the two 'l' chars at a cost of 1
2) you replace the 'e' with an 'o' at a cost of 1
3) you replace the 'o' with an 'a' at a cost of 1
'''
if len(s) == 0:
return len(t)
if len(t) == 0:
return len(s)
if s[0] == t[0]:
return edit_distance(s[1:], t[1:])
a = 1 + edit_distance(s[1:], t) # remove the first character of s
b = 1 + edit_distance(s, t[1:]) # insert the first character of t
c = 1 + edit_distance(s[1:], t[1:]) # substitute the first character
return min(a, b, c)
edit_distance('fun', 'farming')
edit_distance.memory
Since this was my last lecture, I include further resources here: