Wednesday, January 27, 2010

To Optimize, Or Not...

A recent code review spawned a spirited discussion about list creation and whether or not to set an initial size when the number of elements is known.

In Java an array that holds 3 widgets could be create a couple of ways:

List list = new ArrayList();

list.add(new Widget("alpha"));
list.add(new Widget("beta"));
list.add(new Widget("gamma"));

or

List list = new ArrayList(3);

list.add(new Widget("alpha"));
....

The spirited part of the discussing boiled down to the question, "Which way is better?"

The second option certainly appears better from a performance standpoint. It's logical: Letting the code create the list at the exact size needed is faster than a list that has to grow as it is populated.

On the other hand, as the "premature optimization" camp believes, writing code that is easy to understand/extend/modify should be the overriding goal - you worry about optimization when you a have a specific, quantified problem - and the second option creates one more thing for a dev to keep in mind as he works the code. That is, six months from now when a new entry is added to the list the initial size will have to be updated. Any bets on the chance that the update will be missed?

And then there's the unknown: What does the specific compiler and/or the underlying execution do with the statement
"List list = new ArrayList(3)?"
Could there be some built in optimization that would be overridden by setting an initial list size?

I was pretty vocal about the need to avoid premature optimization but I didn't want to just pontificate. I decided to get some metrics on the specific case we reviewed and the result was that using an unsized list took just a fraction of a second (6/100ths) longer.

I then ran 4 sets of additional tests comparing how long it takes to add simple objects to an unsized list and a sized list. The tests used 1 thousand, 10 thousand, 100 thousand and 1 million elements.

The results showed that until the element count reached 100 thousand there was no time difference. And at 1 million elements the typical time difference was only 120 milliseconds:

140 milliseconds to add to an unsized list
20 milliseconds to add to a list with an initial size of 1000000

In other words, there was only about 1/10th of a second difference when dealing with 1 million elements. An interesting discovery is that it took 2/100ths of a second longer to create the sized list.

So, which way is better? As always, the answer is "it depends!" This example is just a nit, but design/coding decisions have a cumulative effect so it's important to be conscious of the trade offs.