Unlock the Potential of Strings and Arrays: 14 Powerful Algorithms Revealed

60 minute read

Dive into the world of strings and arrays with this informative post. Unveil 14 powerful algorithms that will revolutionize your approach to problem-solving. Learn how to manipulate strings and arrays effectively, and elevate your data structure and algorithms skills.

In this post, we will learn about strings and arrays in Python. Let’s dive in! You can run this post in Google Colab using this link:

Open In Colab

1. Reverse a string without affecting special characters

Given a string, reverse the order of its alphabetic characters without affecting its special characters.

Example:

Input: "ab$cd#ef"

Output: "fe$dc#ba"

The alphabetic characters 'a' through 'f' are reversed, while the special characters '$' and '#' remain in their original positions.

def reverse_string(text):
    # Convert the string to a list of characters
    text_list = list(text)

    # Initialize two pointers, one at the beginning and one at the end of the string
    left = 0
    right = len(text_list) - 1

    # Swap the alphabetic characters from both ends until the pointers meet or cross each other
    while left < right:
        if not text_list[left].isalpha():
            left += 1
        elif not text_list[right].isalpha():
            right -= 1
        else:
            text_list[left], text_list[right] = text_list[right], text_list[left]
            left += 1
            right -= 1

    # Convert the list of characters back to a string and return it
    return ''.join(text_list)

Explanation:

The code uses a two-pointer approach to swap the alphabetic characters in the string. The input string is converted to a list of characters so that it can be modified. The left pointer starts at the beginning of the string, and the right pointer starts at the end of the string. The pointers move towards each other until they meet or cross each other.

At each iteration, the code checks if the character at the left pointer is alphabetic. If it is not, then the left pointer is incremented to skip over the special character. Similarly, if the character at the right pointer is not alphabetic, then the right pointer is decremented to skip over the special character.

If both the left and right pointers point to alphabetic characters, then the characters are swapped using a tuple assignment. After the swap, the left pointer is incremented and the right pointer is decremented.

Once the pointers have crossed each other, the reversed string is constructed from the modified list using the join() method.

Time and space complexity:

The time complexity of the code is O(n), where n is the length of the input string. This is because the code iterates over each character in the string at most once. The space complexity of the code is O(n), because the input string is converted to a list of characters, which requires O(n) extra space.
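The same result can also be reached without explicit pointers. The following is a minimal sketch, assuming a hypothetical helper name reverse_letters_with_stack that is not part of the original post: it collects the letters into a list used as a stack, then rebuilds the string, popping a letter wherever the original had one.

```python
def reverse_letters_with_stack(text):
    # Collect only the alphabetic characters, in their original order
    letters = [ch for ch in text if ch.isalpha()]
    # Rebuild the string: pop from the end of the stack for letter slots,
    # keep every non-letter exactly where it was
    return ''.join(letters.pop() if ch.isalpha() else ch for ch in text)


print(reverse_letters_with_stack('ab$cd#ef'))
```

This variant also runs in O(n) time and O(n) extra space; it trades the in-place swaps for a second pass over the text.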

Test:

# Test case 1
text_1 = 'ab$cd#ef'
expected_1 = 'fe$dc#ba'

print('Original text: ', text_1)
print('Reversed text: ', reverse_string(text_1))
assert reverse_string(text_1) == expected_1

# Test case 2
text_2 = 'ab$cd#ef'
expected_2 = 'fe$dc#ba'

print('Original text: ', text_2)
print('Reversed text: ', reverse_string(text_2))
assert reverse_string(text_2) == expected_2

# Test case 3
text_3 = ''
expected_3 = ''

print('Original text: ', text_3)
print('Reversed text: ', reverse_string(text_3))
assert reverse_string(text_3) == expected_3

# Test case 4
text_4 = 'abcd'
expected_4 = 'dcba'

print('Original text: ', text_4)
print('Reversed text: ', reverse_string(text_4))
assert reverse_string(text_4) == expected_4

# Test case 5
text_5 = 'a!bcd?efgh$ijklmn#opqr&stuvwxy^z'
expected_5 = 'z!yxw?vuts$rqponm#lkji&hgfedcb^a'

print('Original text: ', text_5)
print('Reversed text: ', reverse_string(text_5))
assert reverse_string(text_5) == expected_5






Original text:  ab$cd#ef
Reversed text:  fe$dc#ba
Original text:  ab$cd#ef
Reversed text:  fe$dc#ba
Original text:
Reversed text:
Original text:  abcd
Reversed text:  dcba
Original text:  a!bcd?efgh$ijklmn#opqr&stuvwxy^z
Reversed text:  z!yxw?vuts$rqponm#lkji&hgfedcb^a

2. Given a string, print all possible palindromic partitions

The question asks to find all possible palindromic partitions of a given string. For example, if the input string is "aab", the function should output the following possible partitions: [["a", "a", "b"], ["aa", "b"]].

def all_pal_partitions(string):
    # To Store all palindromic partitions
    all_parts = []
    # To store current palindromic partition
    curr_part = []
    left = 0
    right = len(string)
    # Call recursive function to generate all partitions and store in all_parts
    all_pal_partitions_helper(all_parts, curr_part, left, right, string)
    return all_parts

# A utility function to check if a substring is a palindrome
def is_palindrome(string, left, right):
    while left < right:
        if string[left] != string[right]:
            return False
        left += 1
        right -= 1
    return True

# Recursive function to find all palindromic partitions of string[left..right-1]
# all_parts --> A list of lists of strings.
#               Every list inside it stores a partition
# curr_part --> A list of strings to store current partition
def all_pal_partitions_helper(all_parts, curr_part, left, right, string):
    # If 'left' has reached 'right' (the end of the string), the partition is complete
    if left >= right:
        # curr_part keeps being modified while backtracking, so store a copy
        # rather than a reference to the same list
        x = curr_part.copy()
        all_parts.append(x)
        return
    # Pick all possible ending points for substrings
    for i in range(left, right):
        # If substring string[left..i] is palindrome
        if is_palindrome(string, left, i):
            # Add the substring to result
            curr_part.append(string[left:i + 1])
            # Recur for remaining substring
            all_pal_partitions_helper(all_parts, curr_part, i + 1, right, string)
            # Backtrack: remove substring string[left..i] from the current partition
            curr_part.pop()

Explanation:

The all_pal_partitions function takes a string as input and initializes an empty list all_parts to store all palindromic partitions. It also initializes an empty list curr_part to store the current palindromic partition. It then sets the variables left and right to 0 and the length of the string, respectively. The all_pal_partitions_helper function is called with the parameters all_parts, curr_part, left, right, and string.

The is_palindrome function is a utility function that takes a string and two indices as input and returns True if the substring of the string between the indices is a palindrome, otherwise False.

The all_pal_partitions_helper function is the recursive function that generates all palindromic partitions. It takes all_parts, curr_part, left, right, and string as input. If left is greater than or equal to right, the curr_part list is copied and appended to all_parts, and the function returns. Otherwise, the function picks all possible ending points for substrings using a for loop. If the substring of string between left and i is a palindrome, the substring is added to the curr_part list, and the function is called recursively with the updated curr_part, i + 1, and right parameters. The curr_part list is then popped to remove the added substring and continue iterating the for loop.

The function then returns the all_parts list, which contains all possible palindromic partitions of the input string.

Time and space complexity:

The time complexity of the code is O(n * 2^n), where n is the length of the input string. A string of length n has up to 2^(n-1) partitions, because each gap between two characters is either cut or not, and a string such as "aaaa" makes every one of them palindromic. Building and checking each partition costs O(n). The auxiliary space is O(n) for the recursion stack and the curr_part list; the returned all_parts list is not included in that figure and can itself grow to O(n * 2^n) characters.
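To see the exponential growth concretely, here is a compact sketch of the same backtracking idea written as a generator. The name pal_partitions is hypothetical, and the slicing-based palindrome test is a simplification chosen for brevity, not the helper used above.

```python
def pal_partitions(s):
    # An empty remainder means the current split is complete
    if not s:
        yield []
        return
    for end in range(1, len(s) + 1):
        prefix = s[:end]
        # Only continue when the chosen prefix is a palindrome
        if prefix == prefix[::-1]:
            for rest in pal_partitions(s[end:]):
                yield [prefix] + rest


print(list(pal_partitions('aab')))
print(len(list(pal_partitions('aaaa'))))  # every split of 'aaaa' is palindromic
```

For a string of four identical characters there are three gaps, so the count printed by the last line is 2^3.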

Test:

# Test case 1: a single character string
print('String: a')
print('All palindromic partitions: ', all_pal_partitions('a'))
assert all_pal_partitions('a') == [['a']]

# Test case 2: a two character string
print('String: ab')
print('All palindromic partitions: ', all_pal_partitions('ab'))
assert all_pal_partitions('ab') == [['a', 'b']]

# Test case 3: a three character string with no palindromic substrings
print('String: abc')
print('All palindromic partitions: ', all_pal_partitions('abc'))
assert all_pal_partitions('abc') == [['a', 'b', 'c']]

# Test case 4: a three character string with one palindromic substring
print('String: aba')
print('All palindromic partitions: ', all_pal_partitions('aba'))
assert all_pal_partitions('aba') == [['a', 'b', 'a'], ['aba']]

# Test case 5: a four character string with two palindromic substrings
print('String: abba')
print('All palindromic partitions: ', all_pal_partitions('abba'))
assert all_pal_partitions('abba') == [['a', 'b', 'b', 'a'], ['a', 'bb', 'a'], ['abba']]

String: a
All palindromic partitions:  [['a']]
String: ab
All palindromic partitions:  [['a', 'b']]
String: abc
All palindromic partitions:  [['a', 'b', 'c']]
String: aba
All palindromic partitions:  [['a', 'b', 'a'], ['aba']]
String: abba
All palindromic partitions:  [['a', 'b', 'b', 'a'], ['a', 'bb', 'a'], ['abba']]

3. Count triplets with sum smaller than a given value

Write a function to return the number of triplets in an input array whose sum is smaller than a given target sum. For example, if the input array is [3, 1, 0, -2] and the target sum is 2, the function should return 2 since there are two triplets that satisfy the condition: [-2, 0, 1] (sum -1) and [-2, 0, 3] (sum 1).

def count_triplets(array, target_sum):
    array.sort()
    result = []

    for i in range(len(array) - 2):
        left = i + 1
        right = len(array) - 1
        while left < right:
            current_sum = array[i] + array[left] + array[right]
            if current_sum == target_sum:
                result.append([array[i], array[left], array[right]])
                right -= 1
                left += 1
            elif target_sum < current_sum:
                right -= 1
            elif target_sum > current_sum:
                left += 1

    return result

Explanation:

The count_triplets function takes two arguments: an array of integers array and a target sum target_sum. Note that, as written, the code does not count the triplets whose sum is smaller than target_sum. Instead, it collects every triplet whose sum is exactly equal to target_sum and returns them as a list, which is the closely related three-number-sum problem. The walkthrough that follows describes the code as it stands.

First, the function sorts the input array in non-descending order using the sort method of the list data type in Python.

The function initializes an empty list result to store the triplets that meet the condition.

Next, the function uses a loop to iterate over each index i in the array up to the third-to-last element, because a triplet needs two more elements to the right of i.

Inside this loop, the function initializes two pointers left and right, where left points to the element immediately to the right of i and right points to the last element in the array.

The function then enters a while loop that runs as long as the left pointer is strictly less than the right pointer. Once the pointers meet, every candidate pair to the right of i has been considered, and keeping left below right guarantees that the same element is never used twice in one triplet.

At each iteration of the while loop, the function calculates the sum of the elements at the i, left, and right pointers and stores it in the current_sum variable.

If the current_sum is equal to the target_sum, then the function appends a triplet consisting of the elements at the i, left, and right pointers to the result list. The left pointer is then incremented by one and the right pointer is decremented by one to search for the next triplet that meets the condition.

If the current_sum is less than the target_sum, then the left pointer is incremented by one. Because the array is sorted, this moves left to a larger value and raises the sum toward the target.

If the current_sum is greater than the target_sum, then the right pointer is decremented by one, which moves right to a smaller value and lowers the sum toward the target.

Finally, the function returns the result list, which contains all the triplets whose sum equals target_sum.

Time and space complexity:

The time complexity of the code is O(n^2), where n is the length of the input array: sorting costs O(n log n), and for each index i the two pointers together scan the rest of the array once. Apart from the output, the code needs only O(1) extra space. The result list holds one entry per matching triplet, which can grow to O(n^2) entries in the worst case.
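The problem statement asks for a count of triplets whose sum is strictly smaller than the target. A minimal sketch of that variant follows, under the assumption of a hypothetical function name count_triplets_smaller and a sorted copy so the caller's list is left untouched. The key observation is that when the sum at (i, left, right) is small enough, every element between left and right also works as the third member.

```python
def count_triplets_smaller(array, target_sum):
    nums = sorted(array)  # work on a sorted copy
    count = 0
    for i in range(len(nums) - 2):
        left, right = i + 1, len(nums) - 1
        while left < right:
            if nums[i] + nums[left] + nums[right] < target_sum:
                # nums[left] paired with anything in (left, right] is also
                # below the target, so count all of those pairs at once
                count += right - left
                left += 1
            else:
                # Sum too large: try a smaller third element
                right -= 1
    return count


print(count_triplets_smaller([3, 1, 0, -2], 2))
```

Like the code above, this runs in O(n^2) time, but it returns an integer instead of a list.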

Test:

# Example usage of the function
array = [5, 1, 3, 4, 7]
target_sum = 12
triplets = count_triplets(array, target_sum)
print('array = [5, 1, 3, 4, 7], target_sum = 12')
print('triplets: ', triplets)
assert triplets == [[1, 4, 7], [3, 4, 5]]

# Test case 1: empty array
array = []
target_sum = 10
triplets = count_triplets(array, target_sum)
print('array = [], target_sum = 10')
print('triplets: ', triplets)
assert triplets == []

# Test case 2: target sum is larger than all possible triplets
array = [1, 2, 3, 4, 5]
target_sum = 20
triplets = count_triplets(array, target_sum)
print('array = [1, 2, 3, 4, 5], target_sum = 20')
print('triplets: ', triplets)
assert triplets == []


array = [5, 1, 3, 4, 7], target_sum = 12
triplets:  [[1, 4, 7], [3, 4, 5]]
array = [], target_sum = 10
triplets:  []
array = [1, 2, 3, 4, 5], target_sum = 20
triplets:  []

4. Convert array into Zig-Zag fashion

The problem requires us to rearrange a given array in zig-zag fashion, so that the elements follow a “peak-valley” pattern: a < b > c < d > e, and so on. In other words, every element at an odd index is larger than both of its neighbors. For example, given the input array [4, 3, 7, 8, 6, 2, 1], one valid answer is [3, 7, 4, 8, 2, 6, 1]. Several arrangements can satisfy the pattern; the sort-based function below returns a different valid one, [1, 3, 2, 6, 4, 8, 7].

def zig_zag(array):
    # use sort function to sort the array
    array.sort()
    # visit every odd index from 1 up to n-2
    for i in range(1, len(array)-1, 2):
        # swap value of current element with next element
        array[i], array[i + 1] = array[i + 1], array[i]
    return array

Explanation:

The function zig_zag takes an array as input and sorts it in ascending order using the sort() method. Then it traverses the array from the second element to the second-last element with a step of 2. During each iteration, the function swaps the current element with the next element to create the zig-zag pattern.

After sorting, no element is larger than the one after it. Swapping the element at each odd index with its right neighbor puts the larger of the two at the odd index and turns it into a peak: it is larger than its new right neighbor (the value it was swapped with) and larger than its left neighbor (which was smaller than both to begin with).

For example, if we have the array [4, 3, 7, 8, 6, 2, 1] and we sort it in ascending order, it becomes [1, 2, 3, 4, 6, 7, 8]. Now, we start from the second element and swap it with the third element, i.e., 2 and 3. The array becomes [1, 3, 2, 4, 6, 7, 8]. We then move to the fourth element and swap it with the fifth element, i.e., 4 and 6. The array becomes [1, 3, 2, 6, 4, 7, 8]. We then move to the sixth element and swap it with the seventh element, i.e., 7 and 8. The final array becomes [1, 3, 2, 6, 4, 8, 7], which is a zig-zag arrangement of the input array. Finally, the function returns the modified array.

Time and space complexity:

The time complexity of the function is O(n log(n)), where n is the size of the input array. This is because the sorting operation takes O(n log(n)) time, which dominates the time complexity of the function.

The function sorts and swaps in place, so apart from the working memory of the sort itself it needs only O(1) additional space, regardless of the size of the input array. Note that it also modifies the list passed in.
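The example output [3, 7, 4, 8, 2, 6, 1] from the problem statement can be produced by a single pass without sorting. The sketch below is an assumed alternative, with the hypothetical name zig_zag_linear: it alternates between expecting "<" and ">" for each adjacent pair and swaps whenever the pair violates the expected relation.

```python
def zig_zag_linear(array):
    result = list(array)  # copy, so the input stays untouched
    expect_less = True    # relation wanted between result[i] and result[i + 1]
    for i in range(len(result) - 1):
        if expect_less:
            out_of_order = result[i] > result[i + 1]
        else:
            out_of_order = result[i] < result[i + 1]
        if out_of_order:
            result[i], result[i + 1] = result[i + 1], result[i]
        # The next pair must satisfy the opposite relation
        expect_less = not expect_less
    return result


print(zig_zag_linear([4, 3, 7, 8, 6, 2, 1]))
```

This takes O(n) time instead of O(n log n). A swap never breaks the previous pair: if result[i - 1] < result[i] and result[i] is replaced by a larger value, the earlier relation still holds, and symmetrically for ">".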

Test:

def test_zig_zag(array, expected_result):

    print("Initial array:", array)
    result = zig_zag(array)
    print("Zig-zag array:", result)

    assert result == expected_result

# Test Example 1
arr1 = [4, 3, 7, 8, 6, 2, 1]
expected_result1 = [1, 3, 2, 6, 4, 8, 7]
test_zig_zag(arr1, expected_result1)

# Test Example 2
arr2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
expected_result2 = [1, 3, 2, 5, 4, 7, 6, 9, 8]
test_zig_zag(arr2, expected_result2)

# Test Example 3
arr3 = [1, 2]
expected_result3 = [1, 2]
test_zig_zag(arr3, expected_result3)


Initial array: [4, 3, 7, 8, 6, 2, 1]
Zig-zag array: [1, 3, 2, 6, 4, 8, 7]
Initial array: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Zig-zag array: [1, 3, 2, 5, 4, 7, 6, 9, 8]
Initial array: [1, 2]
Zig-zag array: [1, 2]

5. Zigzag traversal in two-dimensional array

The problem requires us to traverse a two-dimensional array in a zig-zag fashion: starting at the top-left corner, we walk along the anti-diagonals of the array, alternating between moving down-left and moving up-right, until we reach the bottom-right corner. For example, given the input array [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the function below, which takes its first step downward, returns [1, 4, 2, 3, 5, 7, 8, 6, 9].

def zig_zag_traverse(array):
    height = len(array) - 1
    width = len(array[0]) - 1
    result = []
    row, col = 0, 0
    going_down = True
    while not is_out_of_bounds(row, col, height, width):
        result.append(array[row][col])
        if going_down:
            if col == 0 or row == height:
                going_down = False
                if row == height:
                    # we are in last row and can go right
                    col += 1
                else:
                    # we are in first column and can go down
                    row += 1
            # ZigZag down
            else:
                row += 1
                col -= 1
        else:
            if row == 0 or col == width:
                going_down = True
                if col == width:
                    # we are in last column so we can only go down
                    row += 1
                else:
                    # we are in first row and can go right
                    col += 1
            # ZigZag up
            else:
                row -= 1
                col += 1
    return result


def is_out_of_bounds(row, col, height, width):
    return row < 0 or row > height or col < 0 or col > width

Explanation:

The purpose of this function is to traverse a two-dimensional array in a zig-zag fashion, and return the values in the order in which they were visited. To do this, the function takes a 2D array as input and returns a 1D array containing the values visited during the traversal.

First, the function initializes some variables: height and width (which hold the index of the last row and the last column, not the number of rows and columns), an empty list to store the result, and the starting row and column (which are both set to 0).

The function then enters a while loop that continues until the end of the array is reached. The loop uses a boolean variable going_down to keep track of whether the traversal is currently moving downwards or upwards.

Within the loop, the function appends the value at the current row and column to the result list.

The function then checks whether the traversal is currently moving downwards or upwards. If it’s moving downwards, it checks whether it has reached either the first column or the last row of the array. If it has, it switches the going_down variable to False, and steps right along the last row or, failing that, down along the first column, so that the traversal can move upwards from the new cell. Otherwise, it continues diagonally downwards by incrementing the row and decrementing the column.

If the traversal is currently moving upwards, the function checks whether it has reached either the first row or the last column of the array. If it has, it switches the going_down variable to True, and adjusts the row and column values accordingly so that the traversal can move downwards. If it hasn’t reached the end of the array, it moves upwards by decrementing the row and incrementing the column.

Finally, the function returns the result list containing the values visited during the zig-zag traversal of the array.

Time and space complexity:

Overall, the function has a time complexity of O(n), where n is the total number of elements in the two-dimensional array. This is because the function traverses each element of the array exactly once. The space complexity of the function is also O(n), since it stores all of the visited values in a list.
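Another way to think about the traversal is that all cells on the same anti-diagonal share the value row + col. The sketch below, with the hypothetical name zig_zag_by_diagonals, visits the diagonals in order and reverses the direction on odd-numbered ones. It is an assumed alternative formulation that aims to produce the same order as the function above.

```python
def zig_zag_by_diagonals(matrix):
    if not matrix or not matrix[0]:
        return []
    rows, cols = len(matrix), len(matrix[0])
    result = []
    for d in range(rows + cols - 1):
        # Rows that actually intersect diagonal d (where row + col == d)
        first_row = max(0, d - cols + 1)
        last_row = min(d, rows - 1)
        row_order = range(first_row, last_row + 1)
        if d % 2 == 1:
            # Odd diagonals are walked upwards: bottom-left to top-right
            row_order = reversed(row_order)
        for row in row_order:
            result.append(matrix[row][d - row])
    return result


print(zig_zag_by_diagonals([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The trade-off is readability versus state tracking: this version has no direction flag or boundary cases, at the cost of computing the row range of each diagonal.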

Test:

def test_zigzag_traverse(matrix, expected_result):
    print("Input matrix:")
    for row in matrix:
        print(row)

    result = zig_zag_traverse(matrix)
    print("Zigzag traversal result:", result)

    assert result == expected_result


matrix = [  [1, 3, 4, 10],
            [2, 5, 9, 11],
            [6, 8, 12, 15],
            [7, 13, 14, 16]
          ]

expected_result = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

test_zigzag_traverse(matrix, expected_result)


Input matrix:
[1, 3, 4, 10]
[2, 5, 9, 11]
[6, 8, 12, 15]
[7, 13, 14, 16]
Zigzag traversal result: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

6. Pythagorean Triplet in an array

Given an array of integers, write a function that returns true if there is a triplet (a, b, c) that satisfies $a^2 + b^2 = c^2$.

# O(n^2) time | O(1) space
def is_triplet(array):
    # Square all the elements
    for i in range(len(array)):
        array[i] = array[i] * array[i]

    # sort array elements
    array.sort(reverse=True)

    # fix one element
    # and find other two
    # i goes from 0 to len(array) - 2
    for i in range(len(array) - 1):
        # start two index variables from
        # two corners of the array and
        # move them toward each other
        left = i + 1
        right = len(array) - 1
        while left < right:
            # A triplet found
            if array[left] + array[right] == array[i]:
                return True
            else:
                if array[left] + array[right] > array[i]:
                    left += 1
                else:
                    right -= 1
    # If we reach here, then no triplet found
    return False

Explanation:

The function is_triplet takes an array of integers as input and returns a Boolean value indicating whether there is a Pythagorean triplet in the array or not. The function first squares all the elements of the array by iterating over it. Then, it sorts the array in descending order. This is because we want to start with the largest number in the array and find the other two numbers that will satisfy the Pythagorean theorem.

The function then fixes one element, and tries to find two other elements whose sum equals the fixed element (all values are squares at this point). It does this with two index variables: left starts just after the fixed element and right starts at the end of the array. The two indices move towards each other, checking whether the sum of the two squares they point to equals the fixed square. If it does, the function returns True, indicating that there is a Pythagorean triplet in the array. If the sum is greater than the fixed square, the left index is incremented, and if it is less, the right index is decremented. This works because the array is sorted in descending order: left always points to a value at least as large as the one at right, so moving left forward shrinks the sum and moving right backward grows it.

The main difference between this code and the triple sum code is that in the triple sum problem, we are looking for a triplet of numbers whose sum is equal to a target value. However, in this code, we are looking for a triplet of numbers that satisfy a specific mathematical property, i.e., the Pythagorean theorem.

Time and space complexity:

The time complexity of the function is O(n^2), where n is the size of the input array: the outer loop fixes each element in turn, and for each one the two-pointer scan takes O(n) time. The O(n log n) sort is dominated by this. The space complexity is O(1) beyond the sort, because the squares overwrite the input array in place. As a consequence, the caller's list is modified.
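If modifying the input is not acceptable, a set of squares gives the same O(n^2) running time at the cost of O(n) extra space. This is a sketch under the assumption that all inputs are positive integers (with zeros, a pair could match one of its own members); the name has_pythagorean_triplet is hypothetical.

```python
from itertools import combinations


def has_pythagorean_triplet(array):
    # Squares of every element, for O(1) average-time membership checks
    squares = {value * value for value in array}
    # Try each unordered pair (a, b) and look up a^2 + b^2 among the squares
    return any(a * a + b * b in squares for a, b in combinations(array, 2))


numbers = [3, 1, 4, 6, 5]
print(has_pythagorean_triplet(numbers))
print(numbers)  # the input list is left unchanged
```

For positive a and b, a^2 + b^2 is strictly larger than either square, so a match always comes from a third element.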

Test:

# test case 1
array = [3, 1, 4, 6, 5]
print("array:", array)
result = is_triplet(array)
print("is_triplet:", result)
assert result == True

# test case 2
array = [10, 4, 6, 12, 5]
print("array:", array)
result = is_triplet(array)
print("is_triplet:", result)
assert result == False

array: [3, 1, 4, 6, 5]
is_triplet: True
array: [10, 4, 6, 12, 5]
is_triplet: False

7. Check if all array elements are distinct

Given an array of integers, write a function that checks if all elements in the array are distinct and returns True if they are, and False otherwise.

def is_distinct(arr):
    seen = set()
    for elem in arr:
        if elem in seen:
            return False
        seen.add(elem)
    return True

Explanation:

To check if all array elements are distinct, we can iterate through the array and keep track of elements we have already seen using a set. For each element in the array, we check if it is already in the set. If it is, we know that the element is not distinct and we return False. If we make it through the entire array without finding any repeated elements, we can return True.

Note: The main property of a set is that it is an unordered collection of unique elements. This means that each element in a set is distinct, and the order in which elements are added to a set is not preserved.

Time and space complexity:

The time complexity of the function is O(n), where n is the size of the input array. This is because the function iterates over the array once, and each iteration takes O(1) time. The space complexity of the function is also O(n), since it stores all of the elements of the array in a set.
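When early exit is not important, the same check can be written in one line by comparing the length of the list with the length of its set. This is a small sketch with the hypothetical name is_distinct_compact; unlike the loop above, it always builds the full set before answering.

```python
def is_distinct_compact(arr):
    # A set drops duplicates, so the lengths match only if all elements differ
    return len(set(arr)) == len(arr)


print(is_distinct_compact([1, 2, 3, 4, 5]))
print(is_distinct_compact([1, 2, 3, 4, 5, 5]))
```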

Test:

arr = [1, 2, 3, 4, 5]
print('Array: ', arr)
print('all_distinct: ', is_distinct(arr))

assert is_distinct(arr) == True


arr = [1, 2, 3, 4, 5, 5]
print('Array: ', arr)
print('all_distinct: ', is_distinct(arr))

assert is_distinct(arr) == False

Array:  [1, 2, 3, 4, 5]
all_distinct:  True
Array:  [1, 2, 3, 4, 5, 5]
all_distinct:  False

8. Length of the largest subarray with contiguous elements

Given an array of distinct integers, find the length of the longest subarray which contains numbers that can be arranged in a continuous sequence. In other words, it is asking to find a subarray with consecutive integers such that the length of this subarray is the largest possible among all such subarrays in the given array.

Let’s consider the array [2, 0, 2, 1, 4, 3, 1, 0]. The longest subarray with contiguous elements in this array is [0, 2, 1, 4, 3], as all the elements in this subarray are distinct and can be arranged in the continuous sequence 0, 1, 2, 3, 4. The difference between maximum and minimum element in this subarray is 4-0=4, which is equal to the difference between the last and first index of the subarray (index 1 to 5), which is 5-1=4. Therefore, this subarray satisfies the condition for having contiguous elements. The length of this subarray is 5, which is the answer we should return. Note that this example array as a whole contains repeated values, while the max-minus-min check is only reliable when the elements are distinct, as the problem statement assumes. With duplicates, a subarray such as [2, 0, 2] passes the check even though it is not contiguous.

def find_length(array):
    n = len(array)
    # Initialize result
    max_len = 1
    for i in range(n - 1):

        # Initialize min and max for
        # all subarrays starting with i
        mn = array[i]
        mx = array[i]

        # Consider all subarrays starting
        # with i and ending with j
        for j in range(i + 1, n):

            # Update min and max in
            # this subarray if needed
            mn = min(mn, array[j])
            mx = max(mx, array[j])

            # If current subarray has
            # all contiguous elements
            if ((mx - mn) == j - i):
                max_len = max(max_len, mx - mn + 1)

    return max_len

Explanation:

The function works by iterating over each element in the input array, and then considering all subarrays starting with that element and ending with every subsequent element in the array. For each subarray, the function determines whether it contains contiguous elements by checking whether the difference between the maximum and minimum elements in the subarray is equal to the difference between the last and first indices of the subarray. If a subarray is found that contains contiguous elements and is longer than the current longest contiguous subarray, the function updates its result to be the length of the new longest subarray. Finally, the function returns the length of the longest contiguous subarray it found.

Time and space complexity:

The time complexity of the find_length function is O(n^2), where n is the size of the input array. This is because the function considers every pair of start index i and end index j, and updates the running minimum and maximum in O(1) time per pair. The space complexity of the function is O(1), since it stores only a constant number of variables regardless of the size of the input array.
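Because the worked example above contains repeated values, it can help to see how the check is usually extended for inputs with duplicates. The following is a sketch, not the method used in this post: the hypothetical find_length_with_duplicates keeps a set of values seen in the current window and stops extending the window as soon as a value repeats.

```python
def find_length_with_duplicates(array):
    n = len(array)
    max_len = 1 if n else 0
    for i in range(n):
        seen = {array[i]}
        mn = mx = array[i]
        for j in range(i + 1, n):
            if array[j] in seen:
                # Any longer subarray starting at i would contain a repeat
                break
            seen.add(array[j])
            mn = min(mn, array[j])
            mx = max(mx, array[j])
            # Distinct values spanning exactly j - i form a contiguous run
            if mx - mn == j - i:
                max_len = max(max_len, j - i + 1)
    return max_len


print(find_length_with_duplicates([2, 0, 2, 1, 4, 3, 1, 0]))
```

The running time stays O(n^2), but the set raises the extra space to O(n).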

Test:

# test case 1
array = [10, 12, 11]
print("array:", array)
result = find_length(array)
print("find_length:", result)
assert result == 3

# test case 2
array = [1, 56, 58, 57, 90, 92, 94, 93, 91, 45]
print("array:", array)
result = find_length(array)
print("find_length:", result)
assert result == 5



array: [10, 12, 11]
find_length: 3
array: [1, 56, 58, 57, 90, 92, 94, 93, 91, 45]
find_length: 5

9. Two number sum

Given an array of integers, write a function that returns a pair of numbers such that they sum up to a specific target. You may assume that there will be only one pair of numbers that sum up to the target.

def two_number_sum_sorting(array, target_sum):
    array.sort()  # sort in place so the two pointers can move inward from both ends
    left = 0
    right = len(array) - 1

    while left < right:
        sum_candidate = array[left] + array[right]

        if sum_candidate < target_sum:
            left += 1
        elif sum_candidate > target_sum:
            right -= 1
        elif sum_candidate == target_sum:
            return [array[left], array[right]]

    return []


def two_number_sum_hashing(array, target_sum):
    nums = {}
    for num in array:
        potential_match = target_sum - num
        if potential_match in nums:
            return [potential_match, num]
        else:
            nums[num] = True
    return []

Explanation:

The first function, two_number_sum_sorting, works by first sorting the input array, and then using two pointers, one at the start of the array and one at the end. It then checks the sum of the values at these pointers, and if it is less than the target sum, it moves the left pointer to the right. If it is greater than the target sum, it moves the right pointer to the left. If it is equal to the target sum, it returns the two numbers.

The time complexity of this function is O(n log n), where n is the length of the input array, because sorting dominates the linear two-pointer scan. The space complexity is O(1) beyond the in-place sort, since it only uses a constant number of extra variables.

The second function, two_number_sum_hashing, works by using a hash table to store the numbers in the array. It iterates through each number in the array, and for each number, it calculates the difference between the target sum and that number. It then checks if that difference is already in the hash table. If it is, it returns the two numbers. If not, it adds the current number to the hash table.

The time complexity of this function is O(n), where n is the length of the input array. The space complexity is also O(n), since in the worst case, all n elements of the array could be stored in the hash table.

In terms of coding, the two functions use different approaches to solve the same problem. The sorting approach is more intuitive and easier to understand, but it requires sorting the input array, which can be time-consuming for large arrays. The hashing approach is more efficient, but requires using a hash table, which can be more complex for beginners to understand.
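Since the hash table in two_number_sum_hashing only records whether a number has been seen, a set is enough. The sketch below shows that simplification under a hypothetical name, two_number_sum_set; the logic is otherwise the same single pass.

```python
def two_number_sum_set(array, target_sum):
    seen = set()
    for num in array:
        potential_match = target_sum - num
        if potential_match in seen:
            # potential_match appeared earlier in the array than num
            return [potential_match, num]
        seen.add(num)
    return []


print(two_number_sum_set([3, 5, -4, 8, 11, 1, -1, 6], 10))
```

The pair is returned in order of appearance, so on an unsorted input the result may be ordered differently from the sorting-based version.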

Test:

def test_two_sum(array, target_sum, expected_result):
    print("array:", array)
    print("target_sum:", target_sum)
    result = two_number_sum_sorting(array, target_sum)
    print("two_number_sum_sorting:", result)
    assert result == expected_result

    result = two_number_sum_hashing(array, target_sum)
    print("two_number_sum_hashing:", result)
    assert result == expected_result

# Test Case 1
# Both functions should return [-1, 11] for this input.
array = [3, 5, -4, 8, 11, 1, -1, 6]
target_sum = 10
expected_result = [-1, 11]
test_two_sum(array, target_sum, expected_result)

# Test Case 2
# Both functions should return [4, 6] for this input.
array = [4, 6]
target_sum = 10
expected_result = [4, 6]
test_two_sum(array, target_sum, expected_result)

# Test Case 3
# Both functions should return [] for this input.
array = [4, 6, 2]
target_sum = 5
expected_result = []
test_two_sum(array, target_sum, expected_result)

# Test Case 4
# Both functions should return [-3, 4] for this input.
array = [4, 6, 1, -3]
target_sum = 1
expected_result = [-3, 4]
test_two_sum(array, target_sum, expected_result)

array: [3, 5, -4, 8, 11, 1, -1, 6]
target_sum: 10
two_number_sum_sorting: [-1, 11]
two_number_sum_hashing: [-1, 11]
array: [4, 6]
target_sum: 10
two_number_sum_sorting: [4, 6]
two_number_sum_hashing: [4, 6]
array: [4, 6, 2]
target_sum: 5
two_number_sum_sorting: []
two_number_sum_hashing: []
array: [4, 6, 1, -3]
target_sum: 1
two_number_sum_sorting: [-3, 4]
two_number_sum_hashing: [-3, 4]

Note that two_number_sum_hashing returns the pair in the order it meets the numbers. In the test above it runs after two_number_sum_sorting, which has already sorted the list in place, so both functions report the pair in ascending order. On an unsorted list the order can differ (for example [11, -1] instead of [-1, 11]).

What if we want to return the indices of the two numbers instead of the numbers themselves?

def two_number_sum_hashing_idx(array, target_sum):
    # Map each number seen so far to its index in the array
    nums = {}
    for i, num in enumerate(array):
        potential_match = target_sum - num
        if potential_match in nums:
            return [nums[potential_match], i]
        else:
            nums[num] = i
    return []

Let’s test it:

def test_two_sum_idx(array, target_sum, expected_result):
    print("array:", array)
    print("target_sum:", target_sum)
    result = two_number_sum_hashing_idx(array, target_sum)
    print("two_number_sum_hashing:", two_number_sum_hashing(array, target_sum))
    print("two_number_sum_hashing_idx:", result)
    assert result == expected_result

# Test Case 1
# two_number_sum_hashing_idx should return the indices [4, 6] (values 11 and -1) for this input.
array = [3, 5, -4, 8, 11, 1, -1, 6]
target_sum = 10
expected_result = [4, 6]
test_two_sum_idx(array, target_sum, expected_result)


# Test Case 2
# two_number_sum_hashing_idx should return the indices [1, 3] (values 2 and 4) for this input.
array = [1, 2, 3, 4, 5, 6, 7]
target_sum = 6
expected_result = [1, 3]
test_two_sum_idx(array, target_sum, expected_result)



array: [3, 5, -4, 8, 11, 1, -1, 6]
target_sum: 10
two_number_sum_hashing: [11, -1]
two_number_sum_hashing_idx: [4, 6]
array: [1, 2, 3, 4, 5, 6, 7]
target_sum: 6
two_number_sum_hashing: [2, 4]
two_number_sum_hashing_idx: [1, 3]

10. Smallest subarray with sum greater than a given value

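Given an array of integers, write a function that returns the length of the smallest contiguous subarray whose sum is greater than a given value, along with the subarray itself. If there is no such subarray, return a length of 0. The sliding-window solution below assumes the array contains no negative numbers.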

def smallest_subarray_with_sum(array, target_sum):
    # target_sum is target
    n = len(array)
    # Initialize current sum and minimum length
    curr_sum = 0

    # set as maximum possible length
    min_len = n + 1

    # Initialize starting and ending indexes
    start = 0
    end = 0
    start_res, end_res = 0, 0
    while (end < n):

        # Keep adding array elements while current
        # sum is smaller than or equal to target_sum
        while curr_sum <= target_sum and end < n:
            curr_sum += array[end]
            end += 1

        # If current sum becomes greater than target_sum
        while curr_sum > target_sum and start < n:

            # Update minimum length if needed
            if (end - start < min_len):
                min_len = end - start
                end_res = end
                start_res = start

            # remove starting elements
            curr_sum -= array[start]
            start += 1

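    # No window ever exceeded target_sum: report length 0 and an empty subarray
    if min_len > n:
        return 0, []

    return min_len, array[start_res:end_res]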

Explanation:

The function takes an array and a target sum target_sum as input, and returns a tuple containing the minimum length of a subarray with a sum strictly greater than target_sum, and the subarray itself.

n is the length of the input array. curr_sum is the current sum of the subarray being considered. min_len is initialized to n + 1, one more than the length of the input array, so that any qualifying subarray is shorter and replaces it. start and end mark the subarray being considered: it covers the elements from index start up to, but not including, index end. start_res and end_res store the same pair of indexes for the shortest qualifying subarray found so far.

The function then enters a while loop, which runs until the end pointer has reached the end of the input array. Inside the loop, there are two nested while loops:

  • The first inner while loop adds array[end] to curr_sum and increments the end pointer until curr_sum is greater than target_sum or the end of the array is reached.

  • The second inner while loop shrinks the window from the left for as long as curr_sum is greater than target_sum. At each step, the function checks whether the length of the current window (end - start) is smaller than the current minimum length (min_len). If it is, the min_len, start_res, and end_res variables are updated to reflect the new shortest subarray. It then subtracts array[start] from curr_sum and increments the start pointer. Once curr_sum is less than or equal to target_sum, this loop exits and the outer while loop goes back to extending the window to the right.

Once the outer while loop has completed, the function returns a tuple containing the minimum length of a subarray with a sum greater than target_sum, and the subarray itself.

Overall, the function maintains a sliding window over the input array, delimited by two pointers (start and end). It advances end while the window sum is at most target_sum and advances start while the window sum is greater than target_sum. Along the way it keeps track of the shortest window whose sum is greater than target_sum, and returns it at the end.

Time and space complexity:

The time complexity of the smallest_subarray_with_sum function is O(n), where n is the length of the input array. Although the loops are nested, each element is added to curr_sum once (when end passes it) and subtracted at most once (when start passes it), so the total work is linear. The extra space used is O(1) apart from the returned slice, since the function stores only a constant number of variables regardless of the size of the input array.
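One way to gain confidence in a sliding-window solution is to compare it against a slower but obviously correct version. The sketch below is such a reference; the name smallest_subarray_brute_force and the choice to return (0, []) when nothing qualifies are assumptions made for this illustration, following the problem statement.

```python
def smallest_subarray_brute_force(array, target_sum):
    """Hypothetical O(n^2) reference: try every start index and extend until the sum exceeds target_sum."""
    n = len(array)
    best_len = n + 1
    best = []
    for start in range(n):
        curr_sum = 0
        for end in range(start, n):
            curr_sum += array[end]
            if curr_sum > target_sum:
                # The first end that exceeds the target gives the shortest window for this start
                if end - start + 1 < best_len:
                    best_len = end - start + 1
                    best = array[start:end + 1]
                break
    if best_len > n:
        return 0, []
    return best_len, best


print(smallest_subarray_brute_force([1, 4, 45, 6, 0, 19], 51))  # (3, [4, 45, 6])
print(smallest_subarray_brute_force([1, 2], 10))  # (0, [])
```

On small inputs the two versions can be run side by side; any disagreement on an array of non-negative numbers would point to a bug in the window logic.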

Test:

def test_smallest_subarray(array, target_sum, expected_result):
    print("array:", array)
    print("target_sum:", target_sum)
    result = smallest_subarray_with_sum(array, target_sum)
    print("result:", result)
    assert result == expected_result


# Test Case 1
array = [1, 4, 45, 6, 0, 19]
target_sum = 51
expected_result = (3, [4, 45, 6])
test_smallest_subarray(array, target_sum, expected_result)


# Test Case 2
array = [1, 10, 5, 2, 7]
target_sum = 9
expected_result = (1, [10])
test_smallest_subarray(array, target_sum, expected_result)

# Test Case 3
array = [1, 11, 100, 1, 0, 200, 3, 2, 1, 250]
target_sum = 280
expected_result = (4, [100, 1, 0, 200])
test_smallest_subarray(array, target_sum, expected_result)

array: [1, 4, 45, 6, 0, 19]
target_sum: 51
result: (3, [4, 45, 6])
array: [1, 10, 5, 2, 7]
target_sum: 9
result: (1, [10])
array: [1, 11, 100, 1, 0, 200, 3, 2, 1, 250]
target_sum: 280
result: (4, [100, 1, 0, 200])

11. Stock Buy Sell to Maximize Profit

Write a function to find the maximum profit that can be made by buying and selling stocks on different days given an array of daily stock prices.

The function should take an array of stock prices as input and return the maximum profit that can be made, along with the indices of the buy and sell days. Any number of buy/sell transactions is allowed, but each buy day must come before its sell day. If no profit can be made, the function should return a profit of 0 with no buy or sell days.

def stock_buy_sell(prices):

    n = len(prices)
    buys = []
    sells = []
    profits = 0

    # Prices must be given for at least two days;
    # otherwise no trade is possible
    if n < 2:
        return profits, buys, sells

    # Traverse through given price array
    i = 0

    while i < n - 1:

        # Find local minima
        # Note that the limit is n-2 as we are
        # comparing present element to the next element
        while i < n - 1 and prices[i + 1] <= prices[i]:
            i += 1

        # If we reached the end, break
        # as no further solution possible
        if i == n - 1:
            break

        # Store the index of minima
        buy = i
        i += 1

        # Find local maxima
        # Note that the limit is n-1 as we are
        # comparing to previous element
        while i < n and prices[i] >= prices[i - 1]:
            i += 1

        # Store the index of maxima
        sell = i - 1

        sells.append(sell)
        buys.append(buy)

        profits += prices[sell] - prices[buy]

    return profits, buys, sells

Explanation:

The stock_buy_sell function takes in an array of prices and finds the maximum profit that can be earned by buying and selling stocks. The function first initializes empty lists for buys and sells and sets profits to 0. Then, it checks that the price array covers at least two days. If it does not, the function returns early, since no trade is possible.

The function then iterates through the price array, alternating between local minima and local maxima. It walks forward while prices keep falling to reach a local minimum (the buy day), then walks forward while prices keep rising to reach the following local maximum (the sell day). It stores the indices of the local minimum and maximum in the buys and sells lists respectively, and adds the difference between the two prices to the profits variable. The function repeats this until it reaches the end of the price array.

Time and space complexity:

The time complexity of this function is O(n), where n is the length of the input array. This is because the index i only moves forward, so the function passes over the array once. The space complexity is O(n) in the worst case: the buys and sells lists grow by one entry per transaction, and there can be up to n/2 transactions. The profits variable takes a constant amount of space.
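If only the total profit is needed and not the buy and sell days, the same number can be obtained by adding up every day-to-day price increase, because each rise from a local minimum to the next local maximum is the sum of the daily rises in between. The helper name below is hypothetical; it is a sketch for cross-checking the profit value only.

```python
def max_profit_sum_of_gains(prices):
    """Hypothetical profit-only variant: add up every day-to-day price increase."""
    return sum(max(prices[i] - prices[i - 1], 0) for i in range(1, len(prices)))


print(max_profit_sum_of_gains([100, 180, 260, 310, 40, 535, 695]))  # 865
print(max_profit_sum_of_gains([90, 80, 70, 60, 50]))  # 0
```

This version uses O(1) extra space, since it does not record the individual transactions.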

Test:

def test_stock_buy_sell(array, expected_profits, expected_buys, expected_sells):
    print("array:", array)
    profits, buys, sells = stock_buy_sell(array)
    print("profits:", profits)
    print("buys:", buys)
    print("sells:", sells)
    assert profits == expected_profits
    assert buys == expected_buys
    assert sells == expected_sells

# Test case 1
array = [100, 180, 260, 310, 40, 535, 695]
expected_profits = 865
expected_buys = [0, 4]
expected_sells = [3, 6]
test_stock_buy_sell(array, expected_profits, expected_buys, expected_sells)

# Test case 2
array = [90, 80, 70, 60, 50]
expected_profits = 0
expected_buys = []
expected_sells = []
test_stock_buy_sell(array, expected_profits, expected_buys, expected_sells)

# Test case 3
array = [100, 180]
expected_profits = 80
expected_buys = [0]
expected_sells = [1]
test_stock_buy_sell(array, expected_profits, expected_buys, expected_sells)

array: [100, 180, 260, 310, 40, 535, 695]
profits: 865
buys: [0, 4]
sells: [3, 6]
array: [90, 80, 70, 60, 50]
profits: 0
buys: []
sells: []
array: [100, 180]
profits: 80
buys: [0]
sells: [1]

12. Infix to Prefix conversion

Given an Infix expression, convert it into a Prefix expression using two stacks.

Infix: An expression is called the Infix expression if the operator appears in between the operands in the expression. Simply of the form (operand1 operator operand2).

Example : (A+B) * (C-D)

Prefix: An expression is called the prefix expression if the operator appears in the expression before the operands. Simply of the form (operator operand1 operand2).

Example : *+AB-CD (Infix : (A+B) * (C-D) )

def infix_to_prefix(infix):
  # initial empty stack for operands and operator
  operands = []
  operators = []

  for i in range(len(infix)):

    # If current character is an opening bracket, then
    # push into the operators stack.
    if infix[i] == '(':
      operators.append(infix[i])

    # edge case when there is white space in infix
    elif infix[i] == " ":
      continue

    # If current character is a closing bracket, then pop from
    # both stacks and push the result onto the operands stack until
    # the matching opening bracket is found.
    elif infix[i] == ')':
      while len(operators) != 0 and operators[-1] != '(':
        operands, operators = operandsAppend(operands, operators)
      operators.pop()

    elif not is_operator(infix[i]):
      operands.append(infix[i])

    else:
      while len(operators) != 0 and get_priority(infix[i]) <= get_priority(operators[-1]):
        operands, operators = operandsAppend(operands, operators)
      operators.append(infix[i])


  while len(operators) != 0:
    operands, operators = operandsAppend(operands, operators)


  return operands[-1]


# Pop one operator and two operands, then push the combined string
# operator + left operand + right operand back onto the operands stack.
def operandsAppend(operands, operators):
  operand1 = operands.pop()
  operand2 = operands.pop()

  operator = operators.pop()

  new_str = operator + operand2 + operand1
  operands.append(new_str)

  return operands, operators

# Function to check if given character is an operator or not.
def is_operator(char):
  return (not char.isalpha()) and (not char.isdigit())


# Function to get the priority of operators
def get_priority(c):
    if c == '-' or c == '+':
        return 1
    elif c == '*' or c == '/':
        return 2
    elif c == '^':
        return 3
    return 0



Explanation:

This code converts an infix expression to a prefix expression using two stacks. Here’s how it works:

  • Initialize empty stacks: The code initializes two empty stacks, one for operators and one for operands.

  • Define operator precedence: The get_priority function maps each operator to its precedence level: 1 for + and -, 2 for * and /, 3 for ^, and 0 for anything else, including an opening parenthesis. This determines the order in which operators are combined with their operands.

  • Traverse the infix expression: The code traverses the infix expression from left to right, one character at a time, skipping white space.

  • If the current character is an opening parenthesis: It is pushed onto the operator stack.

  • If the current character is a closing parenthesis: Operators are popped from the operator stack until the matching opening parenthesis is reached. For each popped operator, the top two operands are popped as well, combined into the string operator + left operand + right operand, and pushed back onto the operand stack. The opening parenthesis is then popped and discarded.

  • If the current character is an operand: If the current character is a letter or a digit, it is pushed onto the operand stack.

  • If the current character is an operator: While the operator on top of the operator stack has a precedence greater than or equal to that of the current operator, it is popped and combined with the top two operands as described above. The current operator is then pushed onto the operator stack.

  • Pop the remaining operators: Once the infix expression has been completely traversed, any remaining operators on the operator stack are popped and combined with their operands in the same way.

  • Return the result: At this point the top of the operand stack holds a single string, which is the prefix expression.

Note: When we use the pop() method on a list in Python, it removes and returns the last element in the list. In other words, it pops the element from the end of the list. If you want to remove and return an element from a specific position in the list, you can use the pop(index) method where index is the position of the element you want to remove.

Time and space complexity:

The algorithm performs O(n) stack operations, where n is the length of the infix expression, because each character is pushed onto and popped from a stack at most once. Building the intermediate strings by concatenation adds copying cost on top of that, which can grow to O(n^2) characters for long chains of operators. The space complexity is O(n), since the two stacks together never hold more than n characters.
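A quick way to convince yourself that a prefix string is right is to evaluate it and compare the result with the infix expression. The sketch below is not part of the conversion itself: evaluate_prefix and OPERATIONS are hypothetical names, and it assumes single-digit operands and only the four operators listed.

```python
import operator

# Assumption: operands are single digits and only these four operators appear
OPERATIONS = {'+': operator.add, '-': operator.sub,
              '*': operator.mul, '/': operator.truediv}


def evaluate_prefix(prefix):
    """Hypothetical helper: evaluate a prefix string by scanning it from right to left."""
    stack = []
    for char in reversed(prefix):
        if char.isdigit():
            stack.append(int(char))
        else:
            # The left operand was pushed last, so it is popped first
            left = stack.pop()
            right = stack.pop()
            stack.append(OPERATIONS[char](left, right))
    return stack[-1]


print(evaluate_prefix("-+1*234"))  # 3, the same as 1+2*3-4
```

Scanning from right to left means both operands are already on the stack by the time their operator is reached.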

Test:

def test_infix_to_prefix(infix, expected_prefix):
  print("infix:", infix)
  prefix = infix_to_prefix(infix)
  print("prefix:", prefix)
  assert prefix == expected_prefix

print("Test Case 1")
infix = "(a+b)*(c+d)"
expected_prefix = "*+ab+cd"
test_infix_to_prefix(infix, expected_prefix)

print("Test Case 2")
infix = "x+y*z/w+u"
expected_prefix = "++x/*yzwu"
test_infix_to_prefix(infix, expected_prefix)

print("Test Case 3")
infix = "1+2*3-4"
expected_prefix = "-+1*234"
test_infix_to_prefix(infix, expected_prefix)

print("Test Case 4")
infix = "a+b*c-d/e"
expected_prefix = "-+a*bc/de"
test_infix_to_prefix(infix, expected_prefix)


Test Case 1
infix: (a+b)*(c+d)
prefix: *+ab+cd
Test Case 2
infix: x+y*z/w+u
prefix: ++x/*yzwu
Test Case 3
infix: 1+2*3-4
prefix: -+1*234
Test Case 4
infix: a+b*c-d/e
prefix: -+a*bc/de

13. Distinct strings with odd and even changes allowed

Given an array of lowercase strings, the task is to find the number of distinct strings. Two strings are distinct if the second string cannot be formed by applying the following operations to the first string any number of times:

  • A character on the odd index can be swapped with another character on the odd index only.
  • A character on even index can be swapped with another character on even index only.

Input : arr[] = {"abcd", "cbad", "bacd"}

Output : 2

Explanation : The 2nd string can be converted to the 1st by swapping the first and third characters. So there are 2 distinct strings.

MAX_CHAR = 26

def encode_string(string):
    # Initialize two arrays to store the count of even and odd indexed characters for each string
    hash_even = [0] * MAX_CHAR
    hash_odd = [0] * MAX_CHAR

    # Create a hash for each string
    for i in range(len(string)):
        c = string[i]
        if i % 2 == 0:
            # If the index of the current character is even, increment the count of even indexed characters
            hash_even[ord(c) - ord('a')] += 1

        else:
            # If the index of the current character is odd, increment the count of odd indexed characters
            hash_odd[ord(c) - ord('a')] += 1

    # Store the counts of even and odd indexed characters for each string in a single string, separated by '-'
    encoding = '-'.join(str(hash_even[i]) + '-' + str(hash_odd[i]) for i in range(MAX_CHAR))

    return encoding

# This function uses a hash-based set to store the encodings of strings that are distinct according to the criteria given in the question.
def count_distinct(input_strings):
    distinct_count = 0  # Initialize result

    # Create an empty set to hold every encoding seen so far
    seen_encodings = set()

    for string in input_strings:
        encoding = encode_string(string)
        # If this encoding appears for the first time, increment the count of distinct encodings.
        if encoding not in seen_encodings:
            seen_encodings.add(encoding)
            distinct_count += 1

    return distinct_count

Explanation:

The encode_string function takes a string as input and creates a hash for that string by counting how many times each letter occurs at even indices and at odd indices. It then joins the counts into a single string with "-" as a separator and returns the resulting string.

  • First, the function initializes two arrays hash_even and hash_odd, both of size MAX_CHAR (which is 26 in this case, the number of lowercase English letters). These arrays store, for each letter, how often it occurs at even and at odd indices of the string.

  • The function then loops through each character of the input string using a for loop. For each character, it checks whether its index is even or odd using the modulus operator (% 2). If the index is even, it increments the count of the corresponding character in hash_even. Otherwise, it increments the count of the corresponding character in hash_odd.

  • Next, the function builds a single encoding string. A generator expression iterates over the range of MAX_CHAR and, for each letter, concatenates its even-index count, a '-', and its odd-index count. The resulting pieces are then joined together with '-' separators using the join() method.

For example, let’s take the strings abcd and cbad:

encode_string("abcd")

hash_even = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

hash_odd = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

then final output is:

'1-0-0-1-1-0-0-1-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0'

encode_string("cbad")

hash_even = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

hash_odd = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

then final output is:

'1-0-0-1-1-0-0-1-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0'

These two strings are encoded as the same string, so they are not distinct.

Finally, the count_distinct function takes a list of strings as input and initializes a count of distinct strings to zero. It then creates an empty set to store all the distinct strings. For each string in the list, it generates a hash using the encode_string function and checks if the hash is already in the set of distinct strings. If the hash is not in the set, it adds the hash to the set and increments the count of distinct strings. Finally, it returns the count of distinct strings.
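The counting encoding is one way to capture "same letters at even positions, same letters at odd positions". As a sketch of an alternative, the same idea can be expressed by sorting the even-index and odd-index characters separately. The names parity_signature and count_distinct_by_signature are hypothetical and only meant to illustrate the equivalence.

```python
def parity_signature(string):
    """Hypothetical alternative encoding: sorted even-index characters plus sorted odd-index characters."""
    return ''.join(sorted(string[::2])), ''.join(sorted(string[1::2]))


def count_distinct_by_signature(input_strings):
    # Strings with the same signature can be turned into one another
    return len({parity_signature(string) for string in input_strings})


print(parity_signature("abcd"))  # ('ac', 'bd')
print(parity_signature("cbad"))  # ('ac', 'bd')
print(count_distinct_by_signature(['abcd', 'cdab', 'bacd', 'bcda', 'abcd']))  # 3
```

Sorting costs O(n log n) per string instead of O(n), but the signature is shorter and does not depend on the alphabet being limited to 26 lowercase letters.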

Time and space complexity:

The time complexity of the encode_string function is O(n), where n is the length of the input string. This is because the function iterates through each character in the input string exactly once.

The space complexity of the encode_string function is O(1) because it uses two arrays of fixed length (MAX_CHAR = 26) to store the counts of even and odd indexed characters, and creates a single encoding string made of 52 counts. Therefore, the space used by the function is essentially constant with respect to the length of the input string.

The time complexity of the count_distinct function is O(n*m), where m is the number of input strings and n is the length of the longest input string. For each input string, the function calls the encode_string function, which takes O(n) time, and then checks whether the resulting encoding is already in the set, which takes O(1) time on average because the encoding is short. Since there are m input strings, the overall time complexity of the function is O(n*m).

The space complexity of the count_distinct function is O(m) in the worst case. This is because the function uses a set to store all distinct encodings, which can have up to m elements if all input strings are distinct, and each encoding holds 52 counts (the encoding string returned by encode_string).

Test:

def test_count_distinct(input_strings, expected_count_distinct):
    print("input_strings:", input_strings)
    count = count_distinct(input_strings)
    print("count_distinct:", count)
    assert count == expected_count_distinct

print("Test Case 1")
input_strings = ['abcd', 'cdab', 'bacd', 'bcda', 'abcd']
expected_count_distinct = 3
test_count_distinct(input_strings, expected_count_distinct)


print("Test Case 2")
input_strings = ['aaa', 'aaa', 'aaa', 'aaa', 'aaa']
expected_count_distinct = 1
test_count_distinct(input_strings, expected_count_distinct)

print("Test Case 3")
input_strings = ['abc', 'def', 'ghi', 'jkl']
expected_count_distinct = 4
test_count_distinct(input_strings, expected_count_distinct)

print("Test Case 4")
input_strings = ['aabbcc', 'abcabc', 'acbabc', 'abccba']
expected_count_distinct = 2
test_count_distinct(input_strings, expected_count_distinct)


Test Case 1
input_strings: ['abcd', 'cdab', 'bacd', 'bcda', 'abcd']
count_distinct: 3
Test Case 2
input_strings: ['aaa', 'aaa', 'aaa', 'aaa', 'aaa']
count_distinct: 1
Test Case 3
input_strings: ['abc', 'def', 'ghi', 'jkl']
count_distinct: 4
Test Case 4
input_strings: ['aabbcc', 'abcabc', 'acbabc', 'abccba']
count_distinct: 2

14. Knuth-Morris-Pratt (KMP) Algorithm

The Knuth-Morris-Pratt (KMP) algorithm efficiently finds occurrences of a pattern (substring) in a string (text). It was published by Donald Knuth, James H. Morris, and Vaughan Pratt in 1977. The algorithm preprocesses the pattern to create a partial match table, which is then used to perform the matching. The implementation below reports whether the pattern occurs at all.

The KMP algorithm works as follows:

  • First, a partial match table (also called the failure function) is built for the pattern. For each position i, this table records the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. A proper prefix of a string is any prefix that is not equal to the whole string, and a proper suffix is defined in the same way. The table is built in a way that allows it to be used to quickly determine the correct place to resume matching after a mismatch.
  • Then, the string is matched against the pattern using the partial match table. The matching process starts at the beginning of the string and at the beginning of the pattern. At each step, the algorithm compares the current characters in the string and the pattern. If they match, the algorithm moves on to the next character in both. If they do not match, the algorithm looks up the table entry for the part of the pattern ending at the previous character, that is, its longest proper prefix that is also a suffix. This value determines the next character in the pattern to compare against the current character in the string, so the position in the string never moves backwards.

def knuth_morris_pratt_algorithm(string, substring):
    """
    Implementation of the Knuth-Morris-Pratt algorithm to check if a substring exists in a string.
    Returns True if the substring is found, and False otherwise.
    """
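    # An empty substring trivially occurs in any string; without this guard
    # does_match() would index into an empty substring.
    if not substring:
        return True
    pattern = build_pattern(substring)
    return does_match(string, substring, pattern)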


def build_pattern(substring):
    """
    Helper function to build the pattern used in the KMP algorithm.
    Returns a list of integers representing the pattern.
    """
    pattern = [-1] * len(substring)
    j = 0
    i = 1
    while i < len(substring):
        if substring[i] == substring[j]:
            pattern[i] = j
            i += 1
            j += 1
        elif j > 0:
            j = pattern[j - 1] + 1
        else:
            i += 1
    return pattern


def does_match(string, substring, pattern):
    """
    Helper function to check if a substring exists in a string using the pattern from build_pattern().
    Returns True if the substring is found, and False otherwise.
    """
    i = 0
    j = 0
    while i + len(substring) - j <= len(string):
        if string[i] == substring[j]:
            if j == len(substring) - 1:
                return True
            i += 1
            j += 1
        elif j > 0:
            j = pattern[j - 1] + 1
        else:
            i += 1
    return False

Explanation:

This code implements the Knuth-Morris-Pratt algorithm to check if a given substring exists in a given string. The main function is knuth_morris_pratt_algorithm(), which takes two arguments: string and substring. The function returns True if the substring is found in string and False otherwise.

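The implementation of the KMP algorithm involves two helper functions: build_pattern() and does_match(). The build_pattern() function constructs the partial match table for the substring. It is a list of integers in which pattern[i] is the index of the last character of the longest proper prefix of substring[0..i] that is also a suffix of substring[0..i], or -1 if there is no such prefix. After a mismatch at position j of the substring, matching resumes at position pattern[j - 1] + 1.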

The does_match() function takes the string, substring and pattern list as arguments. It then uses these to check if the substring exists in the string using the pattern generated by build_pattern().

The implementation of does_match() function involves two pointers i and j. The i pointer traverses the string while j pointer traverses the substring. When there is a mismatch between string[i] and substring[j], the function uses the pattern list to determine where to start matching characters again. The algorithm continues until either the substring is found in the string or the end of the string is reached.
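
The functions above stop at the first match. As a sketch of how the same idea extends to reporting every occurrence, the version below returns all start indices. It is a separate illustration, not the code above: the names build_failure_table and kmp_find_all are hypothetical, and the table here stores prefix lengths (0 meaning no match) rather than the index-based table with -1 used by build_pattern().

```python
def build_failure_table(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i + 1] that is also its suffix."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find_all(text, pattern):
    """Hypothetical helper: return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern reports no matches
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, char in enumerate(text):
        while k > 0 and char != pattern[k]:
            k = failure[k - 1]
        if char == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            # Fall back so that overlapping occurrences are also found
            k = failure[k - 1]
    return matches


print(kmp_find_all("abababa", "aba"))  # [0, 2, 4]
print(kmp_find_all("aefoaefcdaefcdaed", "aefcdaed"))  # [9]
```

After a full match, the search falls back through the table instead of restarting, which is what lets overlapping occurrences such as the ones in "abababa" be reported.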

Test:

def test_Knuth_Morris_Pratt_Algorithm(string, substring, expected):
    print("string:", string)
    print("substring:", substring)
    result = knuth_morris_pratt_algorithm(string, substring)
    print("Response of Function:", result)
    assert result == expected


print("Test Case 1")
string = "aefoaefcdaefcdaed"
substring = "aefcdaed"
expected = True
test_Knuth_Morris_Pratt_Algorithm(string, substring, expected)

print("Test Case 2")
string = "aefoaefcdaefcdaed"
substring = "aefcaefaeiaefaed"
expected = False
test_Knuth_Morris_Pratt_Algorithm(string, substring, expected)

print("Test Case 3")
string = "bccbefbcdabbbcabfdcfe"
substring = "abc"
expected = False
test_Knuth_Morris_Pratt_Algorithm(string, substring, expected)

print("Test Case 4")
string = "bccbefbcdabbbcabfdcfe"
substring = "bcc"
expected = True
test_Knuth_Morris_Pratt_Algorithm(string, substring, expected)

Test Case 1
string: aefoaefcdaefcdaed
substring: aefcdaed
Response of Function: True
Test Case 2
string: aefoaefcdaefcdaed
substring: aefcaefaeiaefaed
Response of Function: False
Test Case 3
string: bccbefbcdabbbcabfdcfe
substring: abc
Response of Function: False
Test Case 4
string: bccbefbcdabbbcabfdcfe
substring: bcc
Response of Function: True