LINQ and the AutoCAD .NET API (Part 10)

Optimizations

This is the tenth in a series of posts on LINQ an the AutoCAD .NET API. Here's a complete list of posts in this series.

Introduction

Welcome to part 10 of our series on LINQ and the AutoCAD .NET API. Today we want to look at some performace improvements that we can make.


Optimizing Count()

Let's start with a simple example. Say we want to count the number of entities in the model space. Something like this:


[CommandMethod("CountEntities")]
public static void CountEntities()
{ 
  var editor = Application.DocumentManager.MdiActiveDocument.Editor;
   
  using (var db = AcadDatabase.FromActiveDocument())
  {
    int count = db.ModelSpace.Count();
    editor.WriteMessage($"{Environment.NewLine}There are {count } entities in the model space");
  }
}

Listing 1: CountEntities

The line that interests us most is line 8. Since the ModelSpace property is of type IEnumerable<Entity> and is evaluated in a lazy manner (execution of the enumeration is deferred), the next method in the chain gets called, which is Count(). This method is a LINQ extension method, the implementation is System.Linq.Enumerable.Count(this IEnumerable source) in the .NET framework. Let's look at the implementation to get a better understanding what happens:


public static int Count<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    ICollection<TSource> collectionoft = source as ICollection<TSource>;
    if (collectionoft != null) return collectionoft.Count;
    ICollection collection = source as ICollection;
    if (collection != null) return collection.Count;
    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        checked {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}

Listing 2: Enumerable.Count()

At first, the enumerable that is passed as an argument (in our case it's the ModelSpace property which is of type EntityContainer) is tested for it's type: if it is of type ICollection<TSource> or ICollection, the return value is taken from the Count property.

But this is not the case for us, we have our own implementation of IEnumerable<T> through our container classes, in this case IEnumerable<T>. This means that we fall through to line 8, where the single elements get enumerated. This means, that GetEnumerator() of EntityContainer's base class ContainerBase gets called. And this means, that each entity is pulled out of the database and casted to Entity. And this is a quite expensive procedure, if the only thing we do is to execute count++ (in line 10) for each element.

A better approach would be to replace the implementation of Enumerable.Count() with our own implementation, which only iterates the ObjectIds in the table. Something like this:


public virtual int Count()
{
  var enumerator = ((IEnumerable)transaction.GetObject(containerID, OpenMode.ForRead)).GetEnumerator();
  int count = 0;
 
  while (enumerator.MoveNext())
  {
    count++;
  }
 
  return count;
}

Listing 3: Count

If we add this method to our ContainerBase class, we are able to use it as the Count() method in listing 1. The benefit of this implementation is that we only iterate the table elements (the ObjectIds), without pulling the actual elements out of the database and casting them.

That's for the TableContainerBase. But for the DictionaryContainerBase there's an even better solution:


public override int Count()
{
  return ((DBDictionary)transaction.GetObject(ID, OpenMode.ForRead)).Count;
}

Listing 4: Count

DBDictionary has a Count property, which gives us the number of elements in the dictionary. So if we add this version of Count() to our DictionaryContainerBase, we get counting the elements for free! No iteration needed, just returning a value.


Optimizing Contains()

We can use a similar procedure when testing for the existence of an element. System.Linq.Enumerable's version of Contains(T) takes an element of type T as an arguments and tests, if the collection contains the element. But again, this means iterating the table or dictionary and comparing elements. Not very efficient in our case.

We know that an element is in the table or dictionary, if the table or dictionary contains the ObjectId of the element in question. So again, there's actually no need to pull each element we want to compare out of the database and cast it, it's enough to test for the existence ObjectId.

The good thing is, that SymbolTable and DBDictionary already provide methods for testing the existence of an element: SymbolTable provides Has(ObjectId) and Has(string), DBDictionary Contains(ObjectId) and Contains(string) respectively.

So we can add three very efficient methods to query for elements.

At first, let's look at the implementation for TableContainerBase:


public bool Contains(T value)
{
  return Contains(value.ObjectId);
}
 
public bool Contains(ObjectId id)
{
  return ((SymbolTable)transaction.GetObject(ID, OpenMode.ForRead)).Has(id);
}
 
public bool Contains(string name)
{
  return ((SymbolTable)transaction.GetObject(ID, OpenMode.ForRead)).Has(name);
}

Listing 5: Contains

Nothing special that happens here: we simply get the SymbolTable and call the apropriate Has() method.

And the implementation for DictionaryContainerBase is almost the same:


public bool Contains(T value)
{
  return Contains(value.ObjectId);
}

public bool Contains(ObjectId id)
{
  return ((DBDictionary)transaction.GetObject(ID, OpenMode.ForRead)).Contains(id);
}

public bool Contains(string name)
{
  return ((DBDictionary)transaction.GetObject(ID, OpenMode.ForRead)).Contains(name);
}

Listing 6: Contains

Wrapping up

We found a nice way to use the count and query mechanisms that are provided by SymbolTable and DBDictionary for implementing Count() and Contains(T). This saves a lot of iterating over elements that is actually not needed.

But in general, optimizations can be done for a number of LINQ methods, in particular, all methods that don't need a function of T as an argument. For instance, LINQ's Where() takes a Func<TSource, bool> as an argument, which means in our case, that we want to test all elements of a underlying SymbolTable or DBDictionay. So all of them have to be pulled out of the database.

In contrast, Skip(int) takes an integer as an argument, the number of elements that should be skipped. In this case, we don't operate on single elements, we simply change the structure of the collection. Hence, in case of Skip(int), we simply skip some ObjectIds, that's it. No need to get elements from the database.


As an overview, these LINQ methods can be optimized:


Testing/Comparing

  • bool Contains(T)
  • bool SequenceEqual(IEnumerable<T>)

Counting elements

  • int Count()
  • long LongCount()

Getting single elements

  • T ElementAt(int)
  • T ElementAtOrDefault(int)
  • T Last()
  • T LastOrDefault()

Manipulating the structure of the collection

  • IEnumerable<T> Concat(IEnumerable<T>)
  • IEnumerable<T> Distinct()
  • IEnumerable<T> Except(IEnumerable<T>)
  • IEnumerable<T> Intersect(IEnumerable<T>)
  • IEnumerable<TResult> OfType<TResult>()
  • IEnumerable<T> Reverse()
  • IEnumerable<T> Skip(int)
  • IEnumerable<T> Take(int)
  • IEnumerable<T> Union(IEnumerable<T>)

But diving deeper into these optimizations is beyond the scope of today's post and a nice topic for another post.