“Late self binding” and “static super binding”

Today in my course of object oriented programming languages I ran into “late self binding”.

First of all we all know that when a class “Foo” is the super class for a class “Bar”, every method of Foo is callable from within the class “Bar”. This is the whole point of inheritance. So suppose we have the following:

public class A {
	public void printMsg1()
	{
		System.out.println("A>>msg1");
	}
}

public class B extends A {
	public void printMsg2()
	{
		System.out.println("B>>msg2");
		super.printMsg2();
	}
}

We know that when we have an instance of class A, and we call a.printMsg1() we know that we will get “A>>msg1” as output in the console. And if we call b.prinMsg2() we will get “B>>msg2” followed by “A>>msg1”. The second message comes from our class A, which is obviously the parent of our class B, i.e. the super class. Now let’s use a third class, C.

public class A {
    public void printMsg2()
    {
        System.out.println("A>>msg1");
    }
}
public class B extends A {
    public void printMsg1()
    {
        System.out.println("B>>msg2");
    }
}
public class C extends B {
    public void printMsg1()
    {
        System.out.println("C>>msg1");
        super.printMsg1();
    }
}

We can clearly see that our super class B does not have a method named printMsg1(), so what happens? The compiler will look at compile-time a level up in our inheritance tree and will find a method named like that in our class A and use that one. So the output of a call to an instance of class C it’s method c.printMsg1() will give us “C>>msg1” followed by “A>>msg1”. If we were to delete the printMsg1() procedure from class A and insert it in class B, everything would still work just fine because our call to super with find the method we are looking for in the class B, which it will check before class A, since it’s our direct parent.

This entire process is called static super binding. This means that at compile-time the compiler knows in which class it will have to look for a call to super.

Now comes the tricky part, late self binding.

public class A {
	public void printMsg1()
	{
		System.out.println("A>>msg1");
		this.printMsg2();
	}
	public void printMsg2()
	{
		System.out.println("A>>msg2");
	}
}

public class B extends A {
	public void printMsg1()
	{
		System.out.println("B>>msg1");
		super.printMsg1();
	}
	public void printMsg2()
	{
		System.out.println("B>>msg2");
	}
}

I don’t know about you guys, but I never fully grasped the usage of this.

On an instance of B, when we call b.printMsg1() we will get the output “B>>msg1” and then we call our super and ask it to execute printMsg1(). This will give us the output “A>>msg1”. Then we call this and ask it to execute printMsg2().

The actual this is the calling class instance. This is obviously our instance of B. So this does not point to the current enclosing class (A), but to the enclosing class that initially called the first method on the bottom of our call stack. In our case this is b.printMsg1(). If we look in our class B we will see that we indeed have a method named b.printMsg2(). So we get the output “B>>msg2”.

So what happens if we don’t have that method (b.printMsg1()) in our child class? Simple. If you consider the line this.printMsg2() as a call to our initial caller, the B class, you know that in case the method is not present we will look for it in the super class.

Why the name late self binding then? Well, this stems from the fact that at compile-time you don’t know which method of which class will be at the beginning of the call stack. So therefore the compiler cannot tell then which method from which class needs to be invoked.

The super can always be determined at compile-time. This is because a class has a unique super (with multiple inheritance this is no longer true) and can therefore be identified. The reason for this is that you can not change your super class at run-time. Say a class B extending class A can never be modified during run-time to extend class C instead of class A.

Hope this helps somebody,

Christophe

PS: I know that I’ve used some terminology that might be incorrect, but I do think the concept is clear and outlined correctly. If you have any remarks I’d be glad to hear them in the comments.

Advertisements

Pattern Matching Algorithms

Hi there!

I’ve implemented some pattern matching algorithms in C#. They were part of a course I took at the university I study at. They are therefore hardly optimized for real life usage. They do represent the conceptual idea of the algorithms.

The algorithms I implemented are Knuth-Morris-Pratt, Quicksearch and the brute force method.

Brute force method

The brute force method is quite simple. We align our pattern with the text and every time we have a mismatch we shift our pattern one step to the right. This is very bad performance wise. Worst case we will match every letter of our text with every letter of our pattern, which equals to 0(np*nt). Imagine this for a text of 1m characters and pattern of 100.000 characters. This method is good enough for very small text and patterns but as you’ll see, the complexity of other algorithms is hardly bigger. The code is quite straight forward and given below.

Code:


        public static int BruteForce(string text, string pattern)
        {
            var nt = text.Length;
            var np = pattern.Length;

            var it = 0;
            var ip = 0;

            while (it < (nt - np))             {                 if (ip > (np - 1))
                {
                    return it;
                }
                if (it > (nt - np))
                {
                    return -1;
                }
                if (text[it + ip] == pattern[ip])
                {
                    ip++;
                }
                else
                {
                    it++;
                    ip = 0;
                }
            }
            return -1;
        }

Knuth Morris Pratt

This algorithm is rather difficult to explain in a simple blog post. So bear with me as I try. The algorithm fully depends on the sigma-function. This function will return the largest prefix of the pattern, that is also a suffix of the part of the pattern that we have already matched. This is quite crucial to understand. Take a look at the image below.

KMP

As you can see we have mismatch at the character ‘x’ and ‘a’. Using the bruteforce method we would just shift our pattern one step to the right. So the first ‘a’ would align with the second ‘b’ of our text. What KMP will do, is calculate the sigmafunction. You can see in the image, a suffix of the last ‘a’ in our pattern is ‘ab’ for example. ‘aab’ is one too but you’ll get it in a minute. This ‘ab’ is also a prefix of our pattern. Meaning, if we shift our pattern just the length of that pre- and suffix, we don’t have to match those characters anymore. Because we know for certain that all the characters before our mismatch matched. So we will shift the pattern as you can see in the image. And this is the idea behind the sigma function. Once you got it, it’s a very simple algorithm.

The code:

        private static int[] SigmaTable(string pattern)
        {
            //We will calculate the longest prefix of 0->ip that is also a prefix of the pattern

            var np = pattern.Length;
            var sigmatable = new int[np];

            var ip = 2;
            var k = 0;

            while (true)
            {
                if (ip >= np)  //We have shifted out of our pattern, we are done.
                    break;
                if (pattern[k] == pattern[ip - 1])//Our prefix extends
                {
                    sigmatable[ip] = (k + 1);
                    ip++;
                    k++;
                    continue;
                }
                if (k > 0)//We have a mismatch, so we need to see if we have a smaller prefix that we can use
                          //We already calculated this, so just get it.
                {
                    k = sigmatable[ip];
                    continue;
                }
                //We have a mismatch, but K = 0, so we dont have a prefix.
                sigmatable[ip] = 0;
                ip++;
            }
            sigmatable[np - 1] = k;
            sigmatable[0] = -1;
            return sigmatable;
        }

The actual algorithm

 public static int Kmp(string text, string pattern)
        {
            var nt = text.Length;
            var np = pattern.Length;
            var sigmatable = SigmaTable(pattern);

            var it = 0;
            var ip = 0;

            //We will keep looping, untill
            //-->We have found a match (ip > (np -1)
            //--> We have shifted out of our text, not found (it > (nt -np))
            //We have match, step forward in the pattern!
            //We have a mismatch, jump back as much as the sigmatable tells us.

            while (true)
            {
                if (ip > (np - 1))
                {
                    return it;
                }
                if (it > (nt - np))
                {
                    return -1;
                }
                if (text[it + ip] == pattern[ip])
                {
                    ip++;
                    continue;
                }
                it = (it + (ip - sigmatable[ip]));
                ip = ip > 0 ? sigmatable[ip] : 0;
            }
        }

Quick Search

This is my personal favorite for two reasons: 1) it’s fast and 2) it’s everything but difficult!

This algorithm too, will do some preprocessing, although not as difficult to explain as KMP. What we will do is, for every character in the pattern, store the left most location in the pattern. E.g “abcabcd” will have 1 for a, 2 for b, 3 for c and 7 for d. Now, when we have a mismatch, what we will do is take the first character that comes after the entire pattern in the text. So in our picture above, we would take ‘y’. We then check to see if this character exists in our pattern. If it does not, we don’t even have to try matching, we skip it entirely, i.e shift the beginning of our pattern past that character (see the performance?). If it does exist, we shift our pattern the value of our table to the right and start matching from scratch.
We don’t know if our first characters will match, but we do know there is going to match one. The performance boost this algorithm gets is from the parts where we can skip characters that don’t exist in our pattern.

The code:

        private static ShiftTable ComputeShiftTable(string pattern)
        {
            //We will just calculate the position of every character in the pattern. We then store the most right location.

            int[] shift_table;

            int minAscii = pattern[0];
            int maxAscii = minAscii;

            var idx = 0;
            //First we calculate the lowest and highest ascii value to create our array
            while (idx < pattern.Length)
            {
                minAscii = Math.Min(minAscii, pattern[idx]);
                maxAscii = Math.Max(maxAscii, pattern[idx]);
                idx++;
            }

            shift_table = new int[maxAscii - minAscii + 1];

            //Loop over the table and store our patternlength in it. (If we dont find the character, we have to shift the entire pattern!).
            for (int i = 0; i < shift_table.Length; i++)
                shift_table[i] = pattern.Length + 1;

            //Loop over the pattern again. For every char we update it's position in the array.
            //This way, we always have the leftmost character in the array.
            idx = 0;
            while (idx < pattern.Length)
            {                 
                 int currentascii = pattern[idx];                 
                 shift_table[currentascii - minAscii] = pattern.Length - idx;                 
                 idx++;             
            }             
            return new ShiftTable(shift_table, maxAscii, minAscii, pattern.Length);         
} 

The algorithm

         
public static int QuickSearch(string text, string pattern)         {             
       var nt = text.Length;             
       var np = pattern.Length;             
       var shift = ComputeShiftTable(pattern);             
       var it = 0;            
       var ip = 0;             
       while (true)             {                 
                if (ip > (np - 1))//We shifted out of our pattern, found!
                {
                    return it;
                }
                if (it > (nt - np))//We reached the end of our text with our pattern, not found!
                {
                    return -1;
                }
                if (text[it + ip] == pattern[ip])//We have a match, continue.
                {
                    ip++;
                    continue;
                }
                //We didnt have match, get the first character next to our pattern, and calculate the shift back.
                var ct = text[(it + np)%nt];

                it = (it + shift.Shift(ct));
                ip = 0;
            }
        }

That’s it!

Christophe,