Estimation, The second kind of

July 19, 2009 19:46
We developers are often asked how long it would take to finish a piece of code.  We've become quite skilled at that. We look at the scope, get a rough estimation, adjust it to have time on any issues we might face, add up time for comunication, add coffee breaks, multiply the result by 3 and let business to decide whether they still want the feature.

What we are asked much less often is
how long it would take to run a piece of code.

Why is that important?

Your code might take a second to process 100 items but will run for hours if you feed it with 500. It's a lot easier to address the issue while implementing the algorithm, rather than fixing a critical performance bug for the already deployed solution.

It's difficult to predict how big the input can be.  But quite often all you need is a quick confirmation that the your solution is fast enough, without plodding away at writing performance tests. It's not that performance tests are bad - indeed you must have them. I'm just saying there are performance problems that can be easily noticed without any tests.

The Big O notation[1] is your friend here.

The O() notation.

We think about it as an "order of". Imagine your method processes 1 item in 1 second. How fast will it process 50 items?

If you're lucky to have a O(1) algorithm, it will still take 1 second. If the code is
O(n) then have a quick chat with a friend but get back in 50 seconds. If your method is O(n2) you might want to go to lunch - the results will be available in some 40 minutes. But if it's O(2n) make sure you have something to do for the next 35702051 years.

Below is a picture with several most common complexities.

Common Sense Estimation.

A nice part of all this is that you don't need to use formal mathematic methods to find out the complexity of your algorithm
. Rely on common sense[2].

Simple access: O(1).
If you access an element in an array or hashtable, your algorithm is
O(1). Why? Arrays are stored contiguously in memory, so accessing an element by index is a matter of putting together index number and base address of the array. Roughly the same is for hash tables - the hash you specify identifies the busket where you'll find the value.

Simple loops: O(n). 
If you simply iterate through a list
, your algorithm is O(n). Why? Because the more elements you add to the list, the more iterations you have to do.

Nested loops: O(n2). If  you iterate through a list and within each iteration go through another list, you get a  O(n2) algorithm. Why? Each nested loop adds up one O(n) per each iteration of the outer loop. Thus, you have O(n1)*O(n2) where n1 and n2 are amounts of elements in the lists but as we're talking about very big values of n, we can say n1==n2.

Binary chop: O(lg(n)). 
If you chop the list in two halves and work with one half only - which you will chop again and so on - you'd get a
O(lg(n)) algorithm. Why? Each iteration halves the size of the array. Thus, the formula for array size is n = 2iterationCount and if you swap the equation and take logs of each side of this equation you'll get lg(2iterationCount) = lg(n). By the definition of logs it's iterationCount = lg(n).

Divide and conquer: O(n*lg(n)).
If you use a multi branched recursion (divide the list in two or more parts, work on each independently and then combine the results), you'll probably get
O(n*lg(n)) algorithm. Why? Each time you divide the list into two pieces. Thus, you can make O(lg(n)) nested calls on each piece before reaching a sublist with only one element. On each step, you need O(n) calls to merge the results. To get the overall complexity, multiply both. [Update: better explanation here]



Footnotes.
  1. Here go wikipedia articles -  The Big O and computational complexity.
  2. Just found that Algorithm Speed chapter from The Pragmatic Programmer is available online.

Comments are closed