Mastering Dynamic Programming: 4 Essential Algorithms for Optimal Solutions
Published:
Dive into the world of dynamic programming with this insightful post. Explore four essential algorithms that leverage dynamic programming to solve complex problems optimally. Enhance your understanding of data structure and algorithms with this comprehensive guide.
This post provides an introduction to dynamic programming and demonstrates its implementation in Python. Dynamic programming is a powerful algorithmic technique used to solve optimization problems by breaking them down into smaller overlapping subproblems and efficiently reusing their solutions. You can run this post in Google Colab using this link:
1. Longest Common Subsequence (LCS)
ongest Common Subsequence (LCS) is a well-known computational problem in computer science, it refers to finding the longest subsequence that is present in two or more strings (or sequences) in the same order. The goal of the LCS problem is to determine the longest sequence that is a common subsequence of two or more input sequences. It is a measure of the similarity between two sequences and is widely used in various applications, such as version control, data comparison, and biological sequence analysis.
- Inputs:
str1 = "ZXVVYZW"
str2 = "XKYKZPW"
Outputs: ["X", "Y", "Z", "W"]
Only return longest length
def lcs(str1, str2):
m = len(str1)
n = len(str2)
dp = [[-1 for i in range(n + 1)] for j in range(m + 1)]
return lcs_helper(str1, str2, m, n, dp)
def lcs_helper(str1, str2, m, n, dp):
if m == 0 or n == 0:
return 0
if dp[m][n] != -1:
return dp[m][n]
if str1[m - 1] == str2[n - 1]:
dp[m][n] = 1 + lcs_helper(str1, str2, m - 1, n - 1, dp)
return dp[m][n]
dp[m][n] = max(lcs_helper(str1, str2, m, n - 1, dp), lcs_helper(str1, str2, m - 1, n, dp))
return dp[m][n]
Explanation:
The above code implements a dynamic programming solution to find the Longest Common Subsequence (LCS) of two input strings str1 and str2. The algorithm uses a two-dimensional array dp to store intermediate results and improve the efficiency of the solution by avoiding redundant calculations.
The lcs
function is the main function that invokes the lcs_helper
function to find the LCS. The lcs function first calculates the lengths of the input strings and initializes the dp array with -1 values. The lcs_helper
function is called with the input strings, their lengths, and the dp array as arguments.
The lcs_helper
function uses a recursive approach to solve the LCS problem. It has base cases for when either of the input strings is empty, in which case it returns 0
. If the value for the current subproblem has already been calculated and stored in the dp
array, it returns the stored value instead of recalculating it. If the last characters of the two input strings match, the function adds 1 to the LCS length and recursively calls itself with the remaining substrings. If the last characters do not match, the function returns the maximum LCS length of the two substrings without the last characters. The intermediate results are stored in the dp array and can be reused in subsequent calls to improve the overall efficiency of the algorithm.
The time complexity of the above code is O(nm)
, where n
and m
are the lengths of the input strings str1
and str2
, respectively. This is because, in the worst case, the algorithm makes nm recursive calls, each of which takes constant time to calculate and store the intermediate results in the dp array. Hence, the total time complexity is O(nm)
.
The space complexity of the code is also O(nm)
, where n
and m
are the lengths of the input strings str1
and str2
, respectively. This is because the algorithm uses a two-dimensional array dp to store the intermediate results, and the size of the dp array is (m+1) * (n+1)
. Hence, the space complexity is O(nm).
Test:
import unittest
class TestLcsLength(unittest.TestCase):
def test_lcs_length(self):
self.assertEqual(lcs("ABCD", "AEBD"), 3)
self.assertEqual(lcs("AGGTAB", "GXTXAYB"), 4)
self.assertEqual(lcs("ABC", "DEF"), 0)
self.assertEqual(lcs("ABCDEF", "ABCDEF"), 6)
res = unittest.main(argv=[''], verbosity=3, exit=False)
assert len(res.result.failures) == 0
test_lcs_length (__main__.TestLcsLength) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.002s
OK
1.1. Return longest sequence values instead of length only
def lcs_sequence_return(str1, str2):
# Initialize LCS matrix
# Columns: number of characters in str1 + 1 (1 because of an empty string)
# Rows: number of characters in str2 + 1 (1 because of an empty string)
lcs = [[[] for x in range(len(str1) + 1)] for y in range(len(str2) + 1)]
# Loop through the rows
for i in range(1, len(str2) + 1):
# Loop through the columns
for j in range(1, len(str1) + 1):
# Compare the last characters
if str2[i - 1] == str1[j - 1]:
# If they're equal, append the character to the LCS of the diagonal elements
lcs[i][j] = lcs[i - 1][j - 1] + [str2[i - 1]]
else:
# If they're not equal, take the LCS of the elements on the left and right
lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1], key=len)
return lcs[-1][-1]
class TestLcs(unittest.TestCase):
def test_lcs(self):
self.assertEqual(lcs_sequence_return("ABCD", "AEBD"), ['A', 'B', 'D'])
self.assertEqual(lcs_sequence_return("AGGTAB", "GXTXAYB"), ['G', 'T', 'A', 'B'])
self.assertEqual(lcs_sequence_return("ABC", "DEF"), [])
self.assertEqual(lcs_sequence_return("ABCDEF", "ABCDEF"), ['A', 'B', 'C', 'D', 'E', 'F'])
res = unittest.main(argv=[''], verbosity=3, exit=False)
assert len(res.result.failures) == 0
test_lcs (__main__.TestLcs) ... ok
test_lcs_length (__main__.TestLcsLength) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.005s
OK
The time complexity of this code is O(nm * min(n, m))
, where n
is the length of string str1
and m
is the length of string str2
. The reason is that the nested for loop runs through each possible combination of characters in str1 and str2
, and in each iteration, it concatenates two lists of characters, the size of which is equal to the length of the longest common subsequence found so far.
The space complexity of this code is O(nm * min(n, m))
, where n
is the length of string str1
and m
is the length of string str2
. This is because we are using a 2D array lcs
to store the result of each iteration of the nested for loop, and the size of the array is equal to the product of the lengths of str1
and str2
, multiplied by the length of the longest common subsequence found so far.
We can do better:
def lcs_sequence_return(str1, str2):
lcs = [[[None, 0, None, None] for x in range(len(str1) + 1)] for y in range(len(str2) + 1)]
for i in range(1, len(str2) + 1):
for j in range(1, len(str1) + 1):
if str2[i - 1] == str1[j - 1]:
lcs[i][j] = [str2[i - 1], lcs[i - 1][j - 1][1] + 1, i - 1, j - 1]
else:
if lcs[i - 1][j][1] > lcs[i][j - 1][1]:
lcs[i][j] = [None, lcs[i - 1][j][1], i - 1, j]
else:
lcs[i][j] = [None, lcs[i][j - 1][1], i, j - 1]
return build_sequence(lcs)
def build_sequence(lcs):
sequence = []
i = len(lcs) - 1
j = len(lcs[0]) - 1
while i != 0 and j != 0:
current_entry = lcs[i][j]
if current_entry[0]:
sequence.append(current_entry[0])
i = current_entry[2]
j = current_entry[3]
return list(reversed(sequence))
Explanation:
The above code is a solution to the problem of finding the Longest Common Subsequence (LCS) of two strings str1 and str2.
LCS is a sequence of characters that appear in the same order in both strings.
The code has two functions lcs_sequence_return
and build_sequence
:
lcs_sequence_return:
- It takes two strings str1 and str2 as input.
- It creates a 2D array lcs with dimensions (len(str2) + 1) x (len(str1) + 1).
- It initializes each element of lcs as a list with 4 values, [None, 0, None, None].
- The first value represents the character of the LCS, the second value represents the length of the LCS, the third value stores the row index of the previous entry, and the fourth value stores the column index of the previous entry.
- It iterates through each letter of str2 and str1 to fill lcs.
- If the current letter of str2 is the same as the current letter of str1, the function updates lcs with the current letter and the length of the LCS by adding 1 to the diagonal value.
- If the letters are different, the function updates the length of the LCS with the maximum value between the length of the LCS above and the length of the LCS to the left.
- The function returns the lcs array.
build_sequence:
- It takes the lcs array as input.
- It creates an empty list sequence to store the LCS.
- It initializes two variables i and j to the last row and column of lcs, respectively.
- It iterates through the lcs array by updating i and j with the previous row and column index stored in lcs.
- If the current element of lcs has a character (not None), the function appends the character to the sequence.
- The function returns the reversed sequence.
The time complexity of lcs_sequence_return
is O(nm)
. This is because the function uses a dynamic programming approach that requires 2 nested loops to fill in the lcs
array, and the size of the array is nm
.
The space complexity of lcs_sequence_return
is also O(nm)
, because it creates a 2-dimensional lcs array of size nm
to store the intermediate results.
The time and space complexity of the build_sequence
function is O(k)
, where k
is the length of the longest common subsequence, because it simply loops over the lcs
array and appends elements to the sequence list, which has a maximum length of k
.
2. Longest Increasing Subsequence (LIS)
The Longest Increasing Subsequence (LIS) problem is to find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order. For example, the length of LIS for {10, 22, 9, 33, 21, 50, 41, 60, 80} is 6 and LIS is {10, 22, 33, 50, 60, 80}:
arr[] | 10 | 22 | 9 | 33 | 21 | 50 | 41 | 60 | 80 |
---|---|---|---|---|---|---|---|---|---|
lis | 1 | 2 | 3 | 4 | 5 | 6 |
Code:
def lis(array):
n = len(array)
sequences = [None for _ in range(n)]
lengths = [1 for _ in range(n)]
max_length_idx = 0
for i in range(n):
current_num = array[i]
for j in range(i):
other_num = array[j]
if other_num < current_num and lengths[j] + 1 >= lengths[i]:
lengths[i] = lengths[j] + 1
sequences[i] = j
if lengths[i] >= lengths[max_length_idx]:
max_length_idx = i
return build_sequence_lis(array, sequences, max_length_idx)
def build_sequence_lis(array, sequences, current_idx):
sequence = []
while current_idx is not None:
sequence.append(array[current_idx])
current_idx = sequences[current_idx]
return list(reversed(sequence))
Explanation:
The lis
function takes an input array and calculates the length of the LIS for each element in the array. To do this, it uses two arrays sequences and lengths.
The sequences
array stores the index of the preceding element of the current element in the LIS. lengths
array stores the length of the LIS ending at the corresponding index. The max_length_idx
variable stores the index of the element with the maximum length of the LIS.
The function then iterates through each element of the input array and updates the lengths
and sequences
arrays. If an element has a smaller value than the current element and the LIS ending at that element is longer than the current LIS, then the length of the current LIS is updated and the sequences array is updated to store the index of the preceding element.
Finally, the build_sequence_lis
function is called with the input array
, the sequences
array, and the max_length_idx
to construct the actual LIS. This function uses the sequences
array to trace back from the element with the maximum LIS length to construct the LIS. The constructed LIS is then returned in reverse order.
The time complexity of this code is O(n^2)
and the space complexity is O(n)
. The algorithm uses nested for loops to iterate over the input array, so the time complexity is O(n^2)
. The space complexity is O(n)
because the algorithm uses two lists of size n, “sequences” and “lengths”, to store intermediate data during the computation.
Test:
class TestLis(unittest.TestCase):
def test_lis(self):
self.assertEqual(lis([5, 7, -24, 12, 10, 2, 3, 12, 5, 6, 35]), [-24, 2, 3, 5, 6, 35])
self.assertEqual(lis([1, 2, 3, 4, 5]), [1, 2, 3, 4, 5])
self.assertEqual(lis([5, 4, 3, 2, 1]), [1])
self.assertEqual(lis([1, 2, 4, 3, 5, 4, 7, 2]), [1, 2, 3, 4, 7])
res = unittest.main(argv=[''], verbosity=3, exit=False)
assert len(res.result.failures) == 0
test_lcs (__main__.TestLcs) ... ok
test_lcs_length (__main__.TestLcsLength) ... ok
test_lis (__main__.TestLis) ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.006s
OK
3. Number of way to traverse graph
We are given two integrer representing the width and heigh of a grid-shaped, rectangular graph. write a function that returns the number of ways to reach the bottom right corner of the graph when starting at the top left corner. You can go down and right and cannot go diagonally!
Code:
def number_of_ways_to_traverse_graph(width, height):
ways = [[0 for i in range(width + 1)] for j in range(height + 1)]
for w in range(1, width + 1):
for h in range(1, height + 1):
if h == 1 or w == 1:
ways[h][w] = 1
else:
ways[h][w] = ways[h - 1][w] + ways[h][w - 1]
return ways[height][width]
Explanation:
The number_of_ways_to_traverse_graph
function calculates the number of ways to traverse a graph with width
columns and height
rows. The calculation is based on the concept of dynamic programming.
The function starts by initializing a two-dimensional array ways with height + 1
rows and width + 1 columns
, with all elements set to 0.
The function then loops through the ways array and updates the values. The loop variables w and h represent the current column and row being processed.
If the current row h
is 1 or the current column w
is 1, the value in ways[h][w]
is set to 1. This is because there is only one way to traverse the graph if there is only one row or one column.
Otherwise, the value in ways[h][w]
is set to the sum of the values in the previous row h - 1
and the previous column w - 1
. This is because the number of ways to traverse the current position is equal to the sum of the number of ways to reach the previous row and the previous column.
Finally, the function returns the value in ways[height][width]
, which is the number of ways to traverse the graph with width columns and height rows.
The time complexity of is O(width * height)
. This is because the function loops through the width columns and height rows of the ways array, and the time to update each element in the array takes constant time.
The space complexity of the function is also O(width * height)
. This is because the function uses a two-dimensional array ways with height + 1
rows and width + 1
columns, and each element takes constant space.
Note that this is the best possible space complexity for this problem, as we need to store the intermediate results of the calculations in order to avoid redundant computations.
Test:
number_of_ways_to_traverse_graph(3, 2)
3
class TestNumberOfWaysToTraverseGraph (unittest.TestCase):
def test_number_of_ways_to_traverse_graph(self):
self.assertEqual(number_of_ways_to_traverse_graph(1, 1), 1)
self.assertEqual(number_of_ways_to_traverse_graph(2, 1), 1)
self.assertEqual(number_of_ways_to_traverse_graph(1, 2), 1)
self.assertEqual(number_of_ways_to_traverse_graph(2, 2), 2)
self.assertEqual(number_of_ways_to_traverse_graph(3, 2), 3)
res = unittest.main(argv=[''], verbosity=3, exit=False)
assert len(res.result.failures) == 0
test_lcs (__main__.TestLcs) ... ok
test_lcs_length (__main__.TestLcsLength) ... ok
test_lis (__main__.TestLis) ... ok
test_number_of_ways_to_traverse_graph (__main__.TestNumberOfWaysToTraverseGraph) ... ok
----------------------------------------------------------------------
Ran 4 tests in 0.009s
OK
4. Edit Distance or levenshtein distance
The Levenshtein distance (also known as Edit distance) is a measure of the difference between two strings. It calculates the minimum number of edits (insertions, deletions, or substitutions) required to transform one string into the other. The edit operations can be thought of as a series of transformations that take one string to the other.
The Levenshtein distance is widely used in many applications, such as spell checking, text correction, and DNA sequence alignment. The algorithm for calculating the Levenshtein distance is typically implemented using dynamic programming.
Code:
# O(n*m) time and space
def levenshtein_distance(str_1, str_2):
table = [[i for i in range(len(str_2) + 1)] for j in range(len(str_1) + 1)]
# Updating first column similar to second column
for idx_1 in range(1, len(str_1) + 1):
table[idx_1][0] = table[idx_1 - 1][0] + 1
for idx_1 in range(1, len(str_1) + 1):
for idx_2 in range(1, len(str_2) + 1):
if str_1[idx_1 - 1] == str_2[idx_2 - 1]: # because we add two empty string in table we need shifted lack in strings
table[idx_1][idx_2] = table[idx_1 - 1][idx_2 - 1]
else:
table[idx_1][idx_2] = 1 + min(table[idx_1 - 1][idx_2 - 1], table[idx_1][idx_2 - 1], table[idx_1 - 1][idx_2])
return table[-1][-1]
Explanation:
The code implements the dynamic programming approach to solve the problem. It creates a 2D table table of size (len(str_1) + 1) x (len(str_2) + 1)
and initializes the first column with 0, 1, 2, ..., len(str_2)
, representing the minimum number of edits to change an empty string into str2.
Then, it updates the first row in the same manner to represent the minimum number of edits to change str1 into an empty string. The rest of the cells in the table are filled in using the following logic:
If the characters at the current positions in str_1
and str_2
are the same, the corresponding cell in the table takes the value of the cell directly above and to the left.
If the characters are different, the corresponding cell takes the value of 1 + min(table[idx_1 - 1][idx_2 - 1], table[idx_1][idx_2 - 1], table[idx_1 - 1][idx_2])
, which represents the minimum number of edits (insertion, deletion, or substitution) to change the substrings ending at the current positions in str_1
and str_2
. Finally, the function returns the value in the last cell of the table, which is the minimum number of edits to change str_1
and str_2
.
Test:
class TestLevenshteinDistance(unittest.TestCase):
def test_levenshtein_distance(self):
self.assertEqual(levenshtein_distance("kitten", "sitting"), 3)
self.assertEqual(levenshtein_distance("flaw", "lawn"), 2)
self.assertEqual(levenshtein_distance("", "lawn"), 4)
self.assertEqual(levenshtein_distance("kitten", ""), 6)
self.assertEqual(levenshtein_distance("", ""), 0)
res = unittest.main(argv=[''], verbosity=3, exit=False)
assert len(res.result.failures) == 0
test_lcs (__main__.TestLcs) ... ok
test_lcs_length (__main__.TestLcsLength) ... ok
test_levenshtein_distance (__main__.TestLevenshteinDistance) ... ok
test_lis (__main__.TestLis) ... ok
test_number_of_ways_to_traverse_graph (__main__.TestNumberOfWaysToTraverseGraph) ... ok
----------------------------------------------------------------------
Ran 5 tests in 0.012s
OK