Enums turmoil

May 3, 2009 20:12

Do you think enums are simple? Well, they are not. Look at the following code

public enum Price
{
    None,
    Cheap,
    Inexpensive = Cheap
}
 
public static void Main()
{
    Console.WriteLine(Price.Inexpensive);
}


Ok so I hear you saying, "It will print Cheap". Cool stuff. You're right! If multiple enum members share the same value, the "main" member acts as a source for ToString. Now we move on - I will just add a few more members.

public enum Price
{
    None,
    Cheap,
    Inexpensive = Cheap,
    Moderate,
    Costly
}
 
public static void Main()
{
    Console.WriteLine(Price.Inexpensive);
}


Hmm, now you're probably reluctant to say it will print Cheap again, since your guts tell it's too easy to be true. Absolutely.

Because now it will print Inexpensive.

How come?!

At this point you might want to open up Visual Studio and test this. No problem, this post will be here until you're done. You can take solace in the fact that I was flummoxed as well, and had to debug .NET source code to understand the magic.

It turns out that internally an enum stores its member names and their correspondent values in a special hashtable called HashEntry.


When you call ToString on an enum member, it all comes down to a call to InternalGetValueAsString in Enum.cs. Here's the most interesting part of the method:

ulong val = ToUInt64(value);
int index = BinarySearch(hashEntry.values, val); 
 
if (index >= 0)
    return hashEntry.names[index];

 
It uses binary search to perform a lookup (
the array should be sorted but that's always true for enums).

Technically, that one is a bit different from what you use when you call Array.BinarySearch but the result is the same. If you binary search for "B" in ["A", "B", "B", "C", "D"] array, resulting index will be 2. The second "B".

Binary search trap.

Neither Enum's nor Array's binary search implementation actually handles duplicates. Each starts from the middle of the interval, decides whether the value in question should be in the left or right sub-interval, takes that span and repeats until the value is found (or not found).
 
That is why we get Inexpensive: we have a dupe and binary search returns the wrong tween member.

How the algorithm could have been fixed to return the leftmost dupe? Here's my take:

internal static int BinarySearch(ulong[] array, ulong value)
{
    int lo = 0;
    int hi = array.Length - 1;
 
    while (lo <= hi) {
        int i = (lo + hi) >> 1;
        ulong temp = array[i];
 
        if (value == temp) {
            if (i > 0) {
                ulong leftTemp = array[i - 1];
                if (value == leftTemp) {
                    temp++;
                }
                else {
                    return i;
                }   
            }
            else {
                return i;
            }
        }
 
        if (temp < value) {
            lo = i + 1;
        }
        else {
            hi = i - 1;
        }
    }
 
    return ~lo;
}


Bottom Line.

The bottom line is simple. Now, as you know what's going on behind the scenes, avoid member value sharing in enums - unless you really really need that. And if you need that, a never ending trail of subtle bugs is waiting for you.

So think again :)


Comments

Comments are closed