## Tuesday, May 31, 2011

### Dealing with unexpected complexity

I sometimes use this as an interview question for senior programmers, and the real part of the interview question isn't about the code at all.  It's about how people deal with unexpected complexity.  I start the problem off with this seemingly absurdly simple programming test: Please write code to accept two integers, and return the numbers in an array in sorted order.

Sometimes I see code like this:

int[] SortTwo(int i, int j) {
int[] result = new int[] {i, j};
result.sort();
return result;
}

My comment about this is to ask them to remove the sort() call and code everything themselves, after which I'll get something like this:

int[] SortTwo(int i, int j) {
if (i < j)
return new int[] {i, j};

else
return new int[] {j, i};
}

And here's where things get interesting.  I say I'm going to change the problem now to SortThree(), and still insist they not use any external function calls.  So I change the signature to look like this:

int[] SortThree(int i, int j, int k);

and them ask them to write the code to return three numbers sorted in an array.  The interesting thing here is that the code to sort three numbers is "unexpectedly" more complex than the code to sort two numbers.  The real point of this question is to see how a person deals with unexpected complexity.  Here's a solution in Java.

1. Are you proposing your solution for SortThree as a "good" solution? You are scaring the hell out of me, D.

I'll assume that you are MUCH better than that and this was just a five minute post to exemplify the exploding complexity. I'll give you that leeway, but for the information you have provided it appears that you are *at peace* with that inelegant solution.

I'm compelled to either help you or respectfully ask you to stop calling yourself a developer. A mathematician.... you seem to have some aptitude, but if someone is paying you to write code, then I fear that you are of morally questionable character taking money for logic like the few samples I stumbled across on your blog.

"It works" is not an acceptable metric to earn your "I'm a professional developer" badge.

Writing a function with multiple exit points / return statements is a huge faux paux.

There are a dozen ways to sort an array of arbitrary length. Presumably, you are quizzing a entry-level candidate on some of their grasp of theory (i.e. sorting algorithms), but your post is frighteningly devoid of any mention of that class of algorithms. QuickSort, BubbleSort, Factoradics, SwapElements, and on and on...

1. Hi Scott, please allow me to educate you and any other readers who come across this.

Yes, the point of this whole exercise, as I explained in the opening sentence, is to explore unexpected complexity. It's not to find the optimal implementation, which you seemed to get lost in.

You asked about algorithms like QuickSort and BubbleSort, but maybe you missed the part of the question where I said you must write all the code yourself. If you are going to write by yourself a generic QuickSort algorithm to sort exactly 3 numbers, I would give you a fail on my test because that's way overkill, and would run slower than what I showed.

Regarding multiple exit points, yes I agree that generally speaking that should be avoided. However in this case, I felt it clarified the code, and I would challenge you to write more clear code that runs as efficiently as what I presented.

2. Another advantage of this simple algorithm is that you expect to have 3! = 6 permutations and there are 6 branches of code. So that gives you a simple sanity check on the implementation.

I would be tempted to try to "halve" the 6! down to the first 3 branches and a recursive call in the second outer branch: "else return SortThree(j, i, k);"

But does this break the restriction on *external* function calls?

Regardless, even in this simple case it increases the risk of a careless bug - one would need to change the first condition to be "if (i <= j)" otherwise it would lead to a stack overflow when i == j.

I would be tempted to use "<=" instead of "<" to reduce the risk of accidental exclusions... the principle being that the deliberate conditions should be as wide as possible and the default case as narrow as possible - but since each of the branches has a catch-all else clause, this is being a bit too anal-ytical for such a simple example).

I like how you have commented each branch with its condition, making it easier not to get confused.

Another structural sanity check which is easy with this code layout, is to make sure that each symbol is repeated twice in each position in the 6 arrays, and that each array has all 3 symbols.

With the recursive call, there is less code to get wrong, but you also lose these extra double-checks for correctness.

1. Correction: 6 return statements, not 6 branches of code (first sentence above).

2. Thanks Andrew for your comments and analysis. As I've asked this question over the years, most people (not you) have generally rejected this solution. Instead, they talk about quicksort or other sorting algorithms, or they write code that contains one or more temporary heap allocations which would obviously be significantly slower.

In short, most people can't seem to fully appreciate what their language actually compiles to, or wrap their head around the idea of writing simple, elegant, fast code that is not over kill. I've challenged people to write more efficient implementations than this and have yet to find one.

Thanks again!

3. Fully agreed.. sometimes pre built classes or functions are performance over head, in such cases our own problem specific class or function is much better. ;)