Union and intersection of two lists in Scheme in O(n), O(n log(n)) and O(n²)

Hi,

Yet another ‘puzzle’. This time it’s a small algorithm to take the intersection and union of two lists. What beter language to do “LISt Processing” than Scheme!

Task

You are given two linked lists of integers with no duplicates. The intersection of the two lists is a list of integers that are in both lists, and the union of the two lists is a list of integers that are in either list. For instance, if list A contains the integers 4, 7, 12, 6, 17, 5 and 13 and list B contains the integers 7, 19, 4, 11, 13, 2, and 15 their intersection is the list 4, 7, and 13 and their union is the list 4, 7, 12, 6, 17, 5, 13, 19, 11, 2 and 15.

Your task is to write as many different versions of the functions that perform intersection and union as you can think of; you should have versions with time complexities of O(n2), O(n log n) and O(n), and you should perform timing tests to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below

O(n²)

So basically this is the most naive implementation you can think of. We loop over both lists. For each list, if the current element is in the other list, we add it to the intersection. If the element is not yet in the union, we add it to the union. Obviously, this is not very performant. Each call to member will search the entire list. So worst case this is O(n). And we do this for each list’s elements.
So, if we take the length of list 1 to be m , and the length of list 2 n the total time complexity can be calculated as follows:

We have an iteration for each list’s element, but we do them both in a single loop which results in a time complexity of O(max(n, m)).
For each element we search the other list, which takes O(n) and O(m). We also need to check the union- and intersectionlists. This again, takes worst case O(n + m) for the union and O(max(n,m)) for the union.

Ergo, the resulting time complexity is roughly O(n²)

 
; Naive version. O(n²)
(define (naive-union-inter l1 l2)
  (let ((union l1)
        (intersection '()))
    (let eat-lists
      ((first l1)
       (second l2))
      ; Union O(n²)
      (if (not (member (car second) union))
          (set! union (cons (car second) union)))
      ; Intersection
      (if (and (member (car first) l2) (not (member (car first) intersection)))
          (set! intersection (cons (car first) intersection)))
      
      (if (and (member (car second) l1) (not (member (car second) intersection)))
          (set! intersection (cons (car second) intersection)))
      
      (if (not (or (eq? '() (cdr first)) (eq? '() (cdr second))))
          (eat-lists (cdr first) (cdr second))))
    ; Test solution
    (display (equal? (sort union <) (sort correct-union <)))(newline)
    (display (equal? (sort intersection <) (sort correct-inter <)))(newline)))
(naive-union-inter list1 list2)

O(n log(n))

This was by far the easiest solution yet. This time, we sort both lists. Assuming the implementation of sort is in O(n log(n)),  the total time complexity of the sorting is O(n log(n)). Afterwards, we gobble up both lists. If both elements are equal we add them to the intersection and union. If they are different, we add the smallest one to the union and cut it off the list. All these operations are in O(1) time complexity so we can say that we have a total time complexity of O(n log(n)).

(define (less-naive-union l1 l2)
  (let ((sl1  (sort l1 <))
        (sl2  (sort l2 <))
        (union        '())
        (intersection '()))
    (let eat-lists
      ((fst sl1)
       (snd sl2))
      (if (not (and (eq? '() fst) (eq? '() snd)))
      (cond
        ((eq? '() fst)
         (set! union (append snd union)))
        ((eq? '() snd)
         (set! union (append fst union)))
        ((equal? (car fst) (car snd))
         (set! union (cons (car fst) union))
         (set! intersection (cons (car fst) intersection))
         (eat-lists (cdr fst) (cdr snd)))
        ((< (car fst) (car snd))
         (set! union (cons (car fst) union))
         (eat-lists (cdr fst) snd))
        ((> (car fst) (car snd))
         (set! union (cons (car snd) union))
         (eat-lists fst (cdr snd))))))
    
    (printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
    (printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort intersection <) (sort correct-inter <) (equal? (sort intersection <) (sort correct-inter <)))
    ))

O(n)

This solution required some trickery and is, in constrast with the previous solutions, it requires a working vector. We iterate  over both lists once to determine the minimum and maximum value. From these values we create a vector. With this vector we iterate over both lists again. For each number n in the list we increase the value in the vector on position with 1. This makes sure that elements that are in the intersection will have a value of 2 in the vector, elements that are in the union will then have a value of 1. Elements that are absent will have a value of 0. Then the only thing we need to do is iterate over the vector and add the indices that have a value of 2 to the intersection and add the indices that have a value of 1 to the union. And done.

 

(define (fast-union l1 l2)
  (let ((max     -1)
        (min +inf.0)
        (vec    '())
        (union  '())
        (inter  '()))
    ; Determine maximum and minmum. O(n + m)
    (let walk-lists
      ((first-l  l1)
       (second-l l2))
      (if (> (car first-l) max) 
          (set! max (car first-l)))
      (if (< (car first-l) min)
          (set! min (car first-l)))
      (if (> (car second-l) max) 
          (set! max (car second-l)))
      (if (< (car second-l) min) 
          (set! min (car second-l)))
      (if (not (or (eq? '() (cdr first-l)) 
                   (eq? '() (cdr second-l))))
          (walk-lists (cdr first-l) (cdr second-l))))
    ; Create an array of this size. O(n)
    (set! vec (make-vector (+ 1 (- max min)) 0))
    (printf "Vector: ~a~n" vec)
    ; Loop over each list and mark if it has the element. O(n + m)
    (let ((proc (lambda (x) (vector-set! vec (- x min) (+ 1 (vector-ref vec (- x min)))))))
      (for-each proc l1)
      (for-each proc l2))
    ; Create the union. O(n)
    (vector-map! (lambda (val idx)
                   (if (> val 0)
                       (set! union (cons (+ min idx) union)))
                   val) vec)
    ; Create the intersection O(n)
    (vector-map! (lambda (val idx)
                   (if (equal? 2 val)
                       (set! inter (cons (+ min idx) inter)))
                   val) vec)
    ; Test solution
    (printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
    (printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort inter <) (sort correct-inter <) (equal? (sort inter <) (sort correct-inter <)))
    ))
Advertisements

2 thoughts on “Union and intersection of two lists in Scheme in O(n), O(n log(n)) and O(n²)

  1. Your O(n) solution is not O(n). The step where you create the union/intersection from the vector is O(maxvalue – minvalue) since you are mapping over a vector of that length. For example, making the union of (1 2) and (1 1000000000) will make a gigantic vector you will then loop through.

  2. Ward makes a valid point, but your solution is on the right track. You just need to store whether a number occurs in the first list or not (you can use a bit vector to save some space). If you then iterate the second list you should consult the vector for each number in that list: if you are building the intersection, take all numbers from the second list that occurred previously; if you are building the union, add all numbers that did not occur previously and add them to the first list. Assuming vector access O(1) [obviously] and vector creation O(1) [less obvious, but acceptable], this is a true O(N) time solution (you never actually iterate the whole vector), with N the combined length of the input lists.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s