Union and intersection of two lists in Scheme in O(n), O(n log(n)) and O(n²)

Hi,

Yet another ‘puzzle’. This time it’s a small algorithm to take the intersection and union of two lists. What beter language to do “LISt Processing” than Scheme!

Task

You are given two linked lists of integers with no duplicates. The intersection of the two lists is a list of integers that are in both lists, and the union of the two lists is a list of integers that are in either list. For instance, if list A contains the integers 4, 7, 12, 6, 17, 5 and 13 and list B contains the integers 7, 19, 4, 11, 13, 2, and 15 their intersection is the list 4, 7, and 13 and their union is the list 4, 7, 12, 6, 17, 5, 13, 19, 11, 2 and 15.

Your task is to write as many different versions of the functions that perform intersection and union as you can think of; you should have versions with time complexities of O(n2), O(n log n) and O(n), and you should perform timing tests to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below

O(n²)

So basically this is the most naive implementation you can think of. We loop over both lists. For each list, if the current element is in the other list, we add it to the intersection. If the element is not yet in the union, we add it to the union. Obviously, this is not very performant. Each call to member will search the entire list. So worst case this is O(n). And we do this for each list’s elements.
So, if we take the length of list 1 to be m , and the length of list 2 n the total time complexity can be calculated as follows:

We have an iteration for each list’s element, but we do them both in a single loop which results in a time complexity of O(max(n, m)).
For each element we search the other list, which takes O(n) and O(m). We also need to check the union- and intersectionlists. This again, takes worst case O(n + m) for the union and O(max(n,m)) for the union.

Ergo, the resulting time complexity is roughly O(n²)

 
; Naive version. O(n²)
(define (naive-union-inter l1 l2)
  (let ((union l1)
        (intersection '()))
    (let eat-lists
      ((first l1)
       (second l2))
      ; Union O(n²)
      (if (not (member (car second) union))
          (set! union (cons (car second) union)))
      ; Intersection
      (if (and (member (car first) l2) (not (member (car first) intersection)))
          (set! intersection (cons (car first) intersection)))
      
      (if (and (member (car second) l1) (not (member (car second) intersection)))
          (set! intersection (cons (car second) intersection)))
      
      (if (not (or (eq? '() (cdr first)) (eq? '() (cdr second))))
          (eat-lists (cdr first) (cdr second))))
    ; Test solution
    (display (equal? (sort union <) (sort correct-union <)))(newline)
    (display (equal? (sort intersection <) (sort correct-inter <)))(newline)))
(naive-union-inter list1 list2)

O(n log(n))

This was by far the easiest solution yet. This time, we sort both lists. Assuming the implementation of sort is in O(n log(n)),  the total time complexity of the sorting is O(n log(n)). Afterwards, we gobble up both lists. If both elements are equal we add them to the intersection and union. If they are different, we add the smallest one to the union and cut it off the list. All these operations are in O(1) time complexity so we can say that we have a total time complexity of O(n log(n)).

(define (less-naive-union l1 l2)
  (let ((sl1  (sort l1 <))
        (sl2  (sort l2 <))
        (union        '())
        (intersection '()))
    (let eat-lists
      ((fst sl1)
       (snd sl2))
      (if (not (and (eq? '() fst) (eq? '() snd)))
      (cond
        ((eq? '() fst)
         (set! union (append snd union)))
        ((eq? '() snd)
         (set! union (append fst union)))
        ((equal? (car fst) (car snd))
         (set! union (cons (car fst) union))
         (set! intersection (cons (car fst) intersection))
         (eat-lists (cdr fst) (cdr snd)))
        ((< (car fst) (car snd))
         (set! union (cons (car fst) union))
         (eat-lists (cdr fst) snd))
        ((> (car fst) (car snd))
         (set! union (cons (car snd) union))
         (eat-lists fst (cdr snd))))))
    
    (printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
    (printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort intersection <) (sort correct-inter <) (equal? (sort intersection <) (sort correct-inter <)))
    ))

O(n)

This solution required some trickery and is, in constrast with the previous solutions, it requires a working vector. We iterate  over both lists once to determine the minimum and maximum value. From these values we create a vector. With this vector we iterate over both lists again. For each number n in the list we increase the value in the vector on position with 1. This makes sure that elements that are in the intersection will have a value of 2 in the vector, elements that are in the union will then have a value of 1. Elements that are absent will have a value of 0. Then the only thing we need to do is iterate over the vector and add the indices that have a value of 2 to the intersection and add the indices that have a value of 1 to the union. And done.

 

(define (fast-union l1 l2)
  (let ((max     -1)
        (min +inf.0)
        (vec    '())
        (union  '())
        (inter  '()))
    ; Determine maximum and minmum. O(n + m)
    (let walk-lists
      ((first-l  l1)
       (second-l l2))
      (if (> (car first-l) max) 
          (set! max (car first-l)))
      (if (< (car first-l) min)
          (set! min (car first-l)))
      (if (> (car second-l) max) 
          (set! max (car second-l)))
      (if (< (car second-l) min) 
          (set! min (car second-l)))
      (if (not (or (eq? '() (cdr first-l)) 
                   (eq? '() (cdr second-l))))
          (walk-lists (cdr first-l) (cdr second-l))))
    ; Create an array of this size. O(n)
    (set! vec (make-vector (+ 1 (- max min)) 0))
    (printf "Vector: ~a~n" vec)
    ; Loop over each list and mark if it has the element. O(n + m)
    (let ((proc (lambda (x) (vector-set! vec (- x min) (+ 1 (vector-ref vec (- x min)))))))
      (for-each proc l1)
      (for-each proc l2))
    ; Create the union. O(n)
    (vector-map! (lambda (val idx)
                   (if (> val 0)
                       (set! union (cons (+ min idx) union)))
                   val) vec)
    ; Create the intersection O(n)
    (vector-map! (lambda (val idx)
                   (if (equal? 2 val)
                       (set! inter (cons (+ min idx) inter)))
                   val) vec)
    ; Test solution
    (printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
    (printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort inter <) (sort correct-inter <) (equal? (sort inter <) (sort correct-inter <)))
    ))
Advertisements

The N-Queens Problem Haskell

This is a real toughy! I have to tell you I’ve cheated a bit to get the solution of this one! (It were my first 2 hours of real Haskell programming, so be nice! :))

The problem

A queen in this problem can go infinite steps forwards, backwards and diagonally. We need to place N queens on a NxN board so that no queen can attack another queen. You know how they can get…

This is a problem that is very well suited to be solved with streams. Streams are available in a number of programming languages like Scheme or Haskell. As this was an excericse for my Haskell course I’m obviously going for Haskell here.

The Solution

First step: Data representation

We can represent a list of all possible positions as [[(Int, Int)]] where the first value would be the row and the second one the column. But there is a small trick possible here. We know that the array is indexed from 1 to N. Every column on our board will have exactly one queen (if not there are 2 or more queens on one row and they will fight). This means we can have the column for each queen defined in an implicit manner, namely the array position. So we will represent our queens this way: [[Int]].

Second step: What to filter..?

We will need to generate a list of all possible rowpositions for each column. For a 2×2 board this would be:

[[1,1],[1,2],[2,1],[2,2]]

But we can immediatly see that [2,2] and [1,1] are no valid solutions because queens can attack eachother. This means that every value in the list must be unique between 1 and N. How do we generate these permutations for the list [1..N] ?

generateSolutions' :: Int -> [[Int]]
generateSolutions' 0 = [[]]
generateSolutions' n = permutations [1..n]

Third step: How to filter?

We’ve already taken only the permutations, this means that we never have  a queen on the same row/column. So we only need to filter out the lists that contain queens that are on the same diagonal.

When are two queens on a diagonal?

This happens when the distance between two columns and two rows is the same. You can draw this on a paper if you like. In the image below you can see that the distance between the rows of A and B is 1. The distance between the columns of A and B is 3.

To determine wether these queens are on a diagonal or not we can do just that:

sameDiagonal (a,b) (x,y) = abs (a - x) == abs (b - y)

But this means we we need to pass couples into the function, which we don’t have. We only have rowindices. So what we can do here is pass the first queen as a paramter and zip the rest of the list with [1..]. This gives us the column distance right away! So conceptually speaking we would get a list [(columnDistance, rowIndex)] for all queens except the head.

duplicateDiagonals q qs = any
                          (\(columnDistance, row) -> abs (row - q) == columnDistance)
                          $ zip [1..] qs

This will first of all zip with the infinite list, and then we will apply the predicate function to each element of that list. It just check wether the row distance between both queens is equal to the columndistance. This is pretty neat, isn’t it?
queens

Putting it all together

Now we have all the pieces we need to filter our list.

generateSolutions' :: Int -> [[Int]]
generateSolutions' 0 = [[]]
generateSolutions' n = permutations [1..n]

duplicateDiagonals :: Int -> [Int] -> Bool
duplicateDiagonals q qs = any (\(columnDistance, row) -> abs (row - q) == columnDistance) $ zip [1..] qs

test :: [Int] -> Bool
test [] = True
test (q:qs) = not (duplicateDiagonals q qs) &&test qs

Thanks for reading, Christophe

Pattern Matching Algorithms

Hi there!

I’ve implemented some pattern matching algorithms in C#. They were part of a course I took at the university I study at. They are therefore hardly optimized for real life usage. They do represent the conceptual idea of the algorithms.

The algorithms I implemented are Knuth-Morris-Pratt, Quicksearch and the brute force method.

Brute force method

The brute force method is quite simple. We align our pattern with the text and every time we have a mismatch we shift our pattern one step to the right. This is very bad performance wise. Worst case we will match every letter of our text with every letter of our pattern, which equals to 0(np*nt). Imagine this for a text of 1m characters and pattern of 100.000 characters. This method is good enough for very small text and patterns but as you’ll see, the complexity of other algorithms is hardly bigger. The code is quite straight forward and given below.

Code:


        public static int BruteForce(string text, string pattern)
        {
            var nt = text.Length;
            var np = pattern.Length;

            var it = 0;
            var ip = 0;

            while (it &lt; (nt - np))             {                 if (ip &gt; (np - 1))
                {
                    return it;
                }
                if (it &gt; (nt - np))
                {
                    return -1;
                }
                if (text[it + ip] == pattern[ip])
                {
                    ip++;
                }
                else
                {
                    it++;
                    ip = 0;
                }
            }
            return -1;
        }

Knuth Morris Pratt

This algorithm is rather difficult to explain in a simple blog post. So bear with me as I try. The algorithm fully depends on the sigma-function. This function will return the largest prefix of the pattern, that is also a suffix of the part of the pattern that we have already matched. This is quite crucial to understand. Take a look at the image below.

KMP

As you can see we have mismatch at the character ‘x’ and ‘a’. Using the bruteforce method we would just shift our pattern one step to the right. So the first ‘a’ would align with the second ‘b’ of our text. What KMP will do, is calculate the sigmafunction. You can see in the image, a suffix of the last ‘a’ in our pattern is ‘ab’ for example. ‘aab’ is one too but you’ll get it in a minute. This ‘ab’ is also a prefix of our pattern. Meaning, if we shift our pattern just the length of that pre- and suffix, we don’t have to match those characters anymore. Because we know for certain that all the characters before our mismatch matched. So we will shift the pattern as you can see in the image. And this is the idea behind the sigma function. Once you got it, it’s a very simple algorithm.

The code:

        private static int[] SigmaTable(string pattern)
        {
            //We will calculate the longest prefix of 0-&gt;ip that is also a prefix of the pattern

            var np = pattern.Length;
            var sigmatable = new int[np];

            var ip = 2;
            var k = 0;

            while (true)
            {
                if (ip &gt;= np)  //We have shifted out of our pattern, we are done.
                    break;
                if (pattern[k] == pattern[ip - 1])//Our prefix extends
                {
                    sigmatable[ip] = (k + 1);
                    ip++;
                    k++;
                    continue;
                }
                if (k &gt; 0)//We have a mismatch, so we need to see if we have a smaller prefix that we can use
                          //We already calculated this, so just get it.
                {
                    k = sigmatable[ip];
                    continue;
                }
                //We have a mismatch, but K = 0, so we dont have a prefix.
                sigmatable[ip] = 0;
                ip++;
            }
            sigmatable[np - 1] = k;
            sigmatable[0] = -1;
            return sigmatable;
        }

The actual algorithm

 public static int Kmp(string text, string pattern)
        {
            var nt = text.Length;
            var np = pattern.Length;
            var sigmatable = SigmaTable(pattern);

            var it = 0;
            var ip = 0;

            //We will keep looping, untill
            //--&gt;We have found a match (ip &gt; (np -1)
            //--&gt; We have shifted out of our text, not found (it &gt; (nt -np))
            //We have match, step forward in the pattern!
            //We have a mismatch, jump back as much as the sigmatable tells us.

            while (true)
            {
                if (ip &gt; (np - 1))
                {
                    return it;
                }
                if (it &gt; (nt - np))
                {
                    return -1;
                }
                if (text[it + ip] == pattern[ip])
                {
                    ip++;
                    continue;
                }
                it = (it + (ip - sigmatable[ip]));
                ip = ip &gt; 0 ? sigmatable[ip] : 0;
            }
        }

Quick Search

This is my personal favorite for two reasons: 1) it’s fast and 2) it’s everything but difficult!

This algorithm too, will do some preprocessing, although not as difficult to explain as KMP. What we will do is, for every character in the pattern, store the left most location in the pattern. E.g “abcabcd” will have 1 for a, 2 for b, 3 for c and 7 for d. Now, when we have a mismatch, what we will do is take the first character that comes after the entire pattern in the text. So in our picture above, we would take ‘y’. We then check to see if this character exists in our pattern. If it does not, we don’t even have to try matching, we skip it entirely, i.e shift the beginning of our pattern past that character (see the performance?). If it does exist, we shift our pattern the value of our table to the right and start matching from scratch.
We don’t know if our first characters will match, but we do know there is going to match one. The performance boost this algorithm gets is from the parts where we can skip characters that don’t exist in our pattern.

The code:

        private static ShiftTable ComputeShiftTable(string pattern)
        {
            //We will just calculate the position of every character in the pattern. We then store the most right location.

            int[] shift_table;

            int minAscii = pattern[0];
            int maxAscii = minAscii;

            var idx = 0;
            //First we calculate the lowest and highest ascii value to create our array
            while (idx &lt; pattern.Length)
            {
                minAscii = Math.Min(minAscii, pattern[idx]);
                maxAscii = Math.Max(maxAscii, pattern[idx]);
                idx++;
            }

            shift_table = new int[maxAscii - minAscii + 1];

            //Loop over the table and store our patternlength in it. (If we dont find the character, we have to shift the entire pattern!).
            for (int i = 0; i &lt; shift_table.Length; i++)
                shift_table[i] = pattern.Length + 1;

            //Loop over the pattern again. For every char we update it's position in the array.
            //This way, we always have the leftmost character in the array.
            idx = 0;
            while (idx &lt; pattern.Length)
            {                 
                 int currentascii = pattern[idx];                 
                 shift_table[currentascii - minAscii] = pattern.Length - idx;                 
                 idx++;             
            }             
            return new ShiftTable(shift_table, maxAscii, minAscii, pattern.Length);         
} 

The algorithm

         
public static int QuickSearch(string text, string pattern)         {             
       var nt = text.Length;             
       var np = pattern.Length;             
       var shift = ComputeShiftTable(pattern);             
       var it = 0;            
       var ip = 0;             
       while (true)             {                 
                if (ip &gt; (np - 1))//We shifted out of our pattern, found!
                {
                    return it;
                }
                if (it &gt; (nt - np))//We reached the end of our text with our pattern, not found!
                {
                    return -1;
                }
                if (text[it + ip] == pattern[ip])//We have a match, continue.
                {
                    ip++;
                    continue;
                }
                //We didnt have match, get the first character next to our pattern, and calculate the shift back.
                var ct = text[(it + np)%nt];

                it = (it + shift.Shift(ct));
                ip = 0;
            }
        }

That’s it!

Christophe,

Basic Sorting Algorithms

Hi,

I’m studying for my master’s degree at the moment and in my spare time I rebuilt some of the basic sorting algorithms I learned. There are many more, but I wrote a few of them to gain some insight and because it’s fun :). The algorithms are far from optimized for real-life usage, but the mechanics are correct.

I built the following algorithms:

  • Bubble sort
  • Insertion sort
  • Merge sort
  • Quicksort
  • Quicksort with median 3
  • Randomized Quicksort

All the algorithms are built in such a way that they can sort any object type. To make this possible I used delegates to pass to the algorithms so they can extract the properties of the object that you need to sort on. To show how it works I’ve added a testing solution. It’s a simple console application that sorts arrays of given size and times how long it takes the algorithm to finish.

The source code can be found on Github, or in the downloads (top navigation bar).

If you have any questions, comments are open! 🙂

,Christophe