Data structures and the principles behind them are something that every developer understands to some degree. However, they are often forgotten. I was guilty of exactly that recently while writing an import for some data from a CSV. The premise was fairly simple and the functions were not complicated, yet running it took 15-20 minutes to import 70,000 rows (with relationships). I was using EF, which carries a bit of baggage around with it regarding performance, so it was tempting to write the issue off as an EF one. However, I persevered.
I tried many different things: async programming, raw ADO.NET SQL commands, Dapper, batched SQL inserts; all with little benefit.
Taking a step back is often the best thing to do in these situations, and that proved to be the case here. I was fairly happy with my code: it was nice, concise and SOLID. Reading through it, the algorithm and design were clear.
I read through the code (in pseudo form) with the help of my Rubber Duck 🦆: do this, do that, search the list, get that, search another list, add that, save. Nothing wrong with that. One down, 69,999 to go. Do this, do that, search the list… Now, I am lazy, as all good programmers should aspire to be. Just reading that “search the list” a second time, I really couldn’t be bothered. So I set to work to ease my mental burden.
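In rough C# terms, the shape of the problem looked something like this (the row and entity names here are hypothetical stand-ins, not the real import code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the existing entities and the imported CSV rows
var people = new List<(int Id, string Name)> { (1, "Ann"), (2, "Ben") };
var rows = new List<(int PersonId, string Item)>
{
    (1, "Book"), (2, "Mug"), (1, "Pen")
};

foreach (var row in rows)
{
    // "search the list": a linear O(n) scan, repeated once per row,
    // so the whole import is O(rows * entities)
    var person = people.First(p => p.Id == row.PersonId);
    Console.WriteLine($"{person.Name} bought {row.Item}");
}
```

With 70,000 rows, that inner scan runs 70,000 times, and each pass walks a list of comparable size.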
I refactored some of the EF queries into Dictionaries instead of Lists, created some Lookups (an oft-forgotten data structure in C#) and reworked a bit of the logic so that I would be querying the dictionaries by key, sometimes with the value from a different Dictionary or Lookup.
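A sketch of the shape that refactor took, again with hypothetical names: build the lookup structures once up front, then query by key inside the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<(int Id, string Name)> { (1, "Ann"), (2, "Ben") };
var rows = new List<(int PersonId, string Item)>
{
    (1, "Book"), (2, "Mug"), (1, "Pen")
};

// Build the lookup structures once, up front
Dictionary<int, string> nameById = people.ToDictionary(p => p.Id, p => p.Name);

// A Lookup groups multiple values under one key, handy for one-to-many relationships
ILookup<int, string> itemsByPersonId = rows.ToLookup(r => r.PersonId, r => r.Item);

foreach (var row in rows)
{
    // "get the value": an O(1) hash lookup instead of an O(n) scan
    var name = nameById[row.PersonId];
    Console.WriteLine($"{name} bought {row.Item}");
}

// Querying one structure with a key, then another with the same key
foreach (var item in itemsByPersonId[1])
    Console.WriteLine($"{nameById[1]} also has {item}");
```

The up-front `ToDictionary`/`ToLookup` cost is a single O(n) pass; every lookup after that is constant time.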
I ran it through in my head, and all of a sudden it seemed a lot less exhausting. The “search the list” was now just another “get the value”. This proved to be the case in practice too: the whole import now runs in less than a minute.
Sometimes it’s easy to forget the principles behind all these different tools we are given. Enumerable becomes the go-to, as typing
var result = list.Where(x => x == 1) is easy to understand and reads nicely. It’s also only a tab away thanks to IntelliSense. The fact that most list searches have a complexity of O(n) goes completely forgotten. Having a Dictionary letterByNumber is a slightly less eloquent option and requires a bit more set up. However,
var result = letterByNumber[1] has a complexity of O(1). For a list of 70,000 items, that means a single dictionary lookup does, in the worst case, roughly 70,000 times fewer comparisons than a linear search.
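A rough, unscientific way to see the difference for yourself (exact timings will vary by machine; n is kept modest here so the quadratic version finishes quickly):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int n = 10_000; // modest size so the O(n^2) version still finishes fast
var list = Enumerable.Range(0, n).ToList();
var dict = list.ToDictionary(x => x); // key and value are the same int here

var sw = Stopwatch.StartNew();
foreach (var key in list)
    _ = list.First(x => x == key); // O(n) scan per lookup: O(n^2) total
sw.Stop();
Console.WriteLine($"List.First: {sw.ElapsedMilliseconds} ms");

sw.Restart();
foreach (var key in list)
    _ = dict[key]; // O(1) hash lookup per item: O(n) total
sw.Stop();
Console.WriteLine($"Dictionary: {sw.ElapsedMilliseconds} ms");
```

Even at 10,000 items the gap is dramatic; at 70,000 it is the difference between minutes and seconds.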
All of this normally goes unnoticed when dealing with smaller amounts of data day to day, which is probably why the habit develops. Processors are so fast these days that searching lists of even thousands of items can be quick.
Big O notation is nice, but I actually found that reading the operation out loud is what made it click for me. If you say “search the list” every time you see a
Where(), I am sure you will be surprised at how exhausting some of your code is, even if it’s DRY, SOLID or any other fancy acronym in vogue.