LINQ and the AutoCAD .NET API (Part 10)
Optimizations
This is the tenth in a series of posts on LINQ an the AutoCAD .NET API. Here's a complete list of posts in this series.
Introduction
Welcome to part 10 of our series on LINQ and the AutoCAD .NET API. Today we want to look at some performace improvements that we can make.
Optimizing Count()
Let's start with a simple example. Say we want to count the number of entities in the model space. Something like this:
The line that interests us most is line 8. Since the ModelSpace
property is of type IEnumerable<Entity>
and is evaluated in a lazy manner (execution of the enumeration is deferred), the next method in the chain gets called, which is Count()
. This method is a LINQ extension method, the implementation is System.Linq.Enumerable.Count(this IEnumerable source)
in the .NET framework. Let's look at the implementation to get a better understanding what happens:
At first, the enumerable that is passed as an argument (in our case it's the ModelSpace
property which is of type EntityContainer
) is tested for it's type: if it is of type ICollection<TSource>
or ICollection
, the return value is taken from the Count
property.
But this is not the case for us, we have our own implementation of IEnumerable<T>
through our container classes, in this case IEnumerable<T>
. This means that we fall through to line 8, where the single elements get enumerated. This means, that GetEnumerator()
of EntityContainer
's base class ContainerBase
gets called. And this means, that each entity is pulled out of the database and casted to Entity
. And this is a quite expensive procedure, if the only thing we do is to execute count++
(in line 10) for each element.
A better approach would be to replace the implementation of Enumerable.Count()
with our own implementation, which only iterates the ObjectId
s in the table. Something like this:
If we add this method to our ContainerBase
class, we are able to use it as the Count()
method in listing 1. The benefit of this implementation is that we only iterate the table elements (the ObjectId
s), without pulling the actual elements out of the database and casting them.
That's for the TableContainerBase
. But for the DictionaryContainerBase
there's an even better solution:
DBDictionary
has a Count
property, which gives us the number of elements in the dictionary. So if we add this version of Count()
to our DictionaryContainerBase
, we get counting the elements for free! No iteration needed, just returning a value.
Optimizing Contains()
We can use a similar procedure when testing for the existence of an element. System.Linq.Enumerable
's version of Contains(T)
takes an element of type T
as an arguments and tests, if the collection contains the element. But again, this means iterating the table or dictionary and comparing elements. Not very efficient in our case.
We know that an element is in the table or dictionary, if the table or dictionary contains the ObjectId
of the element in question. So again, there's actually no need to pull each element we want to compare out of the database and cast it, it's enough to test for the existence ObjectId.
The good thing is, that SymbolTable
and DBDictionary
already provide methods for testing the existence of an element: SymbolTable
provides Has(ObjectId)
and Has(string)
, DBDictionary
Contains(ObjectId)
and Contains(string)
respectively.
So we can add three very efficient methods to query for elements.
At first, let's look at the implementation for TableContainerBase
:
Nothing special that happens here: we simply get the SymbolTable
and call the apropriate Has()
method.
And the implementation for DictionaryContainerBase
is almost the same:
Wrapping up
We found a nice way to use the count and query mechanisms that are provided by SymbolTable
and DBDictionary
for implementing Count()
and Contains(T)
. This saves a lot of iterating over elements that is actually not needed.
But in general, optimizations can be done for a number of LINQ methods, in particular, all methods that don't need a function of T as an argument. For instance, LINQ's Where()
takes a Func<TSource, bool>
as an argument, which means in our case, that we want to test all elements of a underlying SymbolTable
or DBDictionay
. So all of them have to be pulled out of the database.
In contrast, Skip(int)
takes an integer as an argument, the number of elements that should be skipped. In this case, we don't operate on single elements, we simply change the structure of the collection. Hence, in case of Skip(int)
, we simply skip some ObjectId
s, that's it. No need to get elements from the database.
As an overview, these LINQ methods can be optimized:
Testing/Comparing
bool Contains(T)
bool SequenceEqual(IEnumerable<T>)
Counting elements
int Count()
long LongCount()
Getting single elements
T ElementAt(int)
T ElementAtOrDefault(int)
T Last()
T LastOrDefault()
Manipulating the structure of the collection
IEnumerable<T> Concat(IEnumerable<T>)
IEnumerable<T> Distinct()
IEnumerable<T> Except(IEnumerable<T>)
IEnumerable<T> Intersect(IEnumerable<T>)
IEnumerable<TResult> OfType<TResult>()
IEnumerable<T> Reverse()
IEnumerable<T> Skip(int)
IEnumerable<T> Take(int)
IEnumerable<T> Union(IEnumerable<T>)
But diving deeper into these optimizations is beyond the scope of today's post and a nice topic for another post.