Optimization – Profiling & Cache Coherence

It’s far more important that a thing works than that it works quickly.

Software optimization is a slippery slope and should be embarked upon with an ample supply of caution, focus, and above all else, metrics; always metrics.

An established mantra is that you should not optimize anything until you see a problem. The reason for this is that in most cases, optimizing software results in less understandable (and thus less maintainable) code, which carries its own risk of introducing new bugs as well as becoming a source of future bugs due to the added complexity.

Once you identify that you have a performance problem, your next step should be to profile your code. You need to identify exactly where the bottleneck(s) are. You might be inclined to convince yourself that you know where the issue is by looking at the code. However, you aren’t a computer and you aren’t likely to know where the issue really is; even an educated guess isn’t always correct. Rather than wasting your time optimizing the wrong section of code, I recommend you profile and gather metrics. That way you will find out exactly where the problem lies and can then attack the issue with surgical precision.

How Do I Profile My Code?
You have a few options, and I’m sure there are others I’m not aware of as well. If possible, get a hold of some profiling software: software built to analyze running code and report how long various methods and sections take to execute. I recommend the stellar profiling capabilities of Visual Studio 2017 if you are developing in a Windows environment. This is the first Visual Studio I’m aware of where the free Community edition ships with profiling tools built in! With these tools you can simply run your program and let the profiler gather the metrics that will help you make an informed decision about which parts of your code to address.

If you aren’t aware of this great feature, you can access it via Debug > Performance Profiler > start/attach/etc.

If you do not have access to a formal profiling software suite, then you can always resort to a simple stopwatch measuring system where you take the time at the beginning of an operation, method call, process, etc. and then take another time measurement at the end. This requires some educated guessing or broad coverage, but it nets you a poor man’s version of the same thing. The idea is to identify where most of the processing time is spent and then work to reduce that time, improving the performance and throughput of your software.
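A minimal sketch of that stopwatch approach, using the framework’s Stopwatch class (the DoWork method here is a made-up placeholder for whatever operation you suspect is slow):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    // Placeholder for the operation you suspect is slow
    public static void DoWork() => Thread.Sleep(50);

    static void Main()
    {
        // Take a time measurement before and after the suspect operation
        Stopwatch oWatch = Stopwatch.StartNew();
        DoWork();
        oWatch.Stop();
        Console.WriteLine($"DoWork took {oWatch.ElapsedMilliseconds}ms");
    }
}
```

Stopwatch is preferable to subtracting DateTime.Now values because it uses a high-resolution timer and isn’t affected by system clock adjustments.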

Optimization takes many forms. A common technique is caching: calculating data that isn’t likely to change often and storing it for quick retrieval the next time you need it. The caveat to caching is that you are then responsible for updating the cache appropriately when necessary. Sadly, I’ve seen too many attempts at caching lead to further problems (sometimes more dangerous than the efficiency gained!) because the cached information wasn’t updated where it needed to be. Only implement caching when and where it makes sense, as you are increasing complexity by doing so; sometimes it simply isn’t worth it for a few saved CPU cycles. In other cases, caching is going to be the difference between a stuttering frame-rate and a ‘smooth-as-butter’ consistent frame-rate.
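As a sketch of the idea and its caveat (the StatsCache class, its ComputeStats calculation, and the player-id key are all made up for illustration):

```csharp
using System;
using System.Collections.Generic;

class StatsCache
{
    private readonly Dictionary<int, double> _oCache = new Dictionary<int, double>();

    // Stand-in for an expensive calculation whose inputs rarely change
    public double ComputeStats(int iPlayerId)
    {
        if (_oCache.TryGetValue(iPlayerId, out double dCached))
            return dCached; // Fast path: return the previously calculated value

        double dResult = iPlayerId * 3.14; // Imagine heavy work happening here
        _oCache[iPlayerId] = dResult;      // Store it for quick retrieval next time
        return dResult;
    }

    // The caveat: YOU are responsible for calling this whenever the underlying
    // data changes, or callers will keep seeing stale results
    public void Invalidate(int iPlayerId) => _oCache.Remove(iPlayerId);
}
```

The Invalidate call is the part that tends to get missed in practice, and a missed invalidation is exactly the kind of stale-data bug described above.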

Cache Coherence
Caching is great when implemented properly, but it can be taken further than simply storing calculated data. Consider a program that needs to loop through a large amount of data as fast as possible. The CPU has its own concept of caching: when it fetches data for processing, it grabs more than it strictly needs, in the hope that the nearby data will be processed next (as in a contiguous block of data; an array). These days we are very used to objects, which are reference types, and, in C# and other languages, to iterating through a collection of objects and processing them in succession. The thing is, those references are very likely to point to locations scattered all over memory (not contiguous). You can speed this up if you structure ONLY the data you need for processing in a contiguous way; generally speaking, I’m talking about the array. It is also smart to use value types in that array in order to eliminate as many references as possible, perhaps all of them (think of an array of structs). This prefetching is an optimization done at the CPU level, and you can and should take advantage of it.

With respect to C#, rather than looping over a List<Type> for example, you’d see increased performance by looping over an array Type[] or better yet, an array of only the data you need rather than a formal class.

Let’s take a tangible look at this.

using System;
using System.Collections.Generic;
using System.Diagnostics;

Random oRandom = new Random();
int iTotal = 20000000;

List<Player> oPlayersCollection = new List<Player>();
for (int i = 0; i < iTotal; i++)
     oPlayersCollection.Add(new Player() { Name = $"Player {i}", Age = oRandom.Next(1, 99) });

Player[] oPlayersInArray = new Player[iTotal];
for (int i = 0; i < iTotal; i++)
     oPlayersInArray[i] = new Player() { Name = $"Player {i}", Age = oRandom.Next(1, 99) };

// Stopwatch is more accurate for this than subtracting DateTime.Now values
Stopwatch oWatch = Stopwatch.StartNew();
long lSum = 0;
foreach (Player oPlayer in oPlayersCollection)
     lSum += oPlayer.Age; // Touch the data so the loop isn't optimized away
oWatch.Stop();
Console.WriteLine($"Processing {iTotal} Player objects in collection: {oWatch.ElapsedMilliseconds}ms");

oWatch.Restart();
lSum = 0;
foreach (Player oPlayer in oPlayersInArray)
     lSum += oPlayer.Age;
oWatch.Stop();
Console.WriteLine($"Processing {iTotal} Player objects in array: {oWatch.ElapsedMilliseconds}ms");

class Player
{
     public string Name;
     public int Age;
}

In this example of looping through two identical groups of information held in two very different collections, the array version came out roughly twice as fast in my measurements, a performance gain of 100% over the standard generic collection.

This makes sense: an array is a collection of contiguous memory locations, whereas other higher-level collections are not likely to be contiguous at all, which allows arrays to benefit from the caching that takes place at the CPU level.
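To push the idea to its conclusion, here is a sketch of the array-of-structs approach mentioned above: a small value type holding only the field we actually process, so the data itself (not just references to it) sits in one contiguous block. The PlayerData struct is my own invention for the example:

```csharp
using System;
using System.Diagnostics;

// Value type: an array of these is a single contiguous block of data
struct PlayerData
{
    public int Age;
}

class Program
{
    static void Main()
    {
        int iTotal = 20000000;
        Random oRandom = new Random();

        PlayerData[] oData = new PlayerData[iTotal];
        for (int i = 0; i < iTotal; i++)
            oData[i].Age = oRandom.Next(1, 99);

        Stopwatch oWatch = Stopwatch.StartNew();
        long lSum = 0;
        foreach (PlayerData oPlayer in oData)
            lSum += oPlayer.Age; // Sequential reads over contiguous memory
        oWatch.Stop();

        Console.WriteLine($"Summed {iTotal} ages in {oWatch.ElapsedMilliseconds}ms (sum {lSum})");
    }
}
```

Because each iteration reads the next few bytes of the same block, nearly every access is served from data the CPU has already prefetched; with an array of class instances, each iteration may instead chase a reference to a different region of the heap.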

Optimization is a topic that cannot be tackled in a single post and it isn’t learned in a single project. To that end I intend to dive further into optimization techniques and discuss some of the bottlenecks I’ve encountered while developing Eden as well as their solutions.


2 thoughts on “Optimization – Profiling & Cache Coherence”

  1. > An established mantra is that you should not optimize anything until you see a problem. The reason for this is that in most cases, optimizing software results in less understandable (and thus maintainable) code

    There is also another big issue with optimizing too early: optimizing things that are meaningless. Sure, it’s cool if you have sped up your monster AI tenfold, but what’s the point if it only sped everything up by a fraction of a percent? (Almost) always go for the low-hanging fruit and don’t waste time and resources on optimizing code that, in the grand scheme of things, gives no one any benefit :).


    1. I completely agree. Unless you REALLY know what you are doing due to past experience, it’s best not to optimize anything until you see or anticipate a problem. Always metrics! Slow code is better than broken code!

