This is gonna be a long one, but it will also be highly informative; hold onto your butts!
I’ve created and made available a fully working Monogame examples project over on GitHub as a reference for the topics discussed in this article; you are encouraged to check it out. It is my goal to curate and expand this project over time to help others out and to illustrate various issues I’ve run across in the development of Eden.
Over the course of developing Eden, being relatively new to making games and to working with graphics, XNA and Monogame, I made exclusive use of SpriteBatch; after all, the majority of the tutorials and information sites I was able to find also focused mostly on SpriteBatch for 2D games. SpriteBatch makes it incredibly easy to get a sprite up on the screen, with the ability to do all sorts of interesting things such as animating, rotating, scaling, tinting and even applying shader effects!
Behind the scenes, SpriteBatch isn’t just putting things up on the screen; it’s also ‘batching’ those things in the most efficient way it can, resulting in as few draw calls as possible (draw calls are expensive time-wise). The developer can make many calls to SpriteBatch.Draw() between SpriteBatch.Begin() and SpriteBatch.End() and have a lot of individual sprites drawn in quick succession. This actually works really well on relatively modern hardware: in a test on my hardware, I was able to make around 50,000 SpriteBatch.Draw() calls PER FRAME without any noticeable loss in performance (it held 60 FPS with no issues). This was using SpriteSortMode.BackToFront; many more calls could be sustained with SpriteSortMode.Deferred, which avoids the sorting bottleneck, but for various reasons I couldn’t use Deferred. Going past that, the system could no longer support 60 FPS consistently and performance would begin to degrade.
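To ground this, here’s a minimal sketch of the kind of SpriteBatch loop described above. The `sprites` collection and its fields are hypothetical stand-ins, and the snippet assumes the usual MonoGame `Game` context and usings:

```csharp
protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.Clear(Color.CornflowerBlue);

    // BackToFront sorts every queued sprite by layer depth before drawing;
    // Deferred skips the sort (draws in submission order) and is faster.
    spriteBatch.Begin(SpriteSortMode.BackToFront, BlendState.AlphaBlend);

    foreach (var sprite in sprites) // hypothetical sprite list
    {
        spriteBatch.Draw(sprite.Texture, sprite.Position, null, Color.White,
            0f, Vector2.Zero, 1f, SpriteEffects.None, sprite.Depth);
    }

    // End() is where the batching happens: sprites are sorted, grouped by
    // texture where possible, and flushed in as few draw calls as it can manage.
    spriteBatch.End();

    base.Draw(gameTime);
}
```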
50,000 individual sprites being drawn at the same time while sustaining 60 FPS is pretty outstanding when you think about it! That’s a lot of stuff! One might wonder why you could possibly need more than that. Well, in Eden I am representing a logical 3D world in a 2D fashion, a design choice borrowed from Dwarf Fortress, which also supports a z-axis. In doing so, Eden is no longer representing the world via just a simple background (terrain) and foreground (characters, items, etc.). Instead, Eden displays layers of elevation on top of one another, from the bottom of the world up to the elevation the camera is at, in order to form the full display of the world. Furthermore, the size of each layer varies depending on world size and can become reasonably large. The point I’m trying to make is that while 50,000 graphical elements sounds like a lot, for a game like Eden it can be maxed out very quickly by one very important feature supported in the game: the ability to zoom the camera.
Eden supports the ability to zoom the display in and out to show less or more of the world; this is useful for quick navigation and for looking at the bigger picture at a glance. The caveat is that we can potentially be rendering multiple layers of elevation and huge sections of the world at one time; this is where SpriteBatch becomes inefficient and performance dramatically drops to an unplayable <10 FPS. This is made worse by displaying the game at full resolution, where even more of the world fits on the screen. I made a critical error in exclusively using SpriteBatch to render so many on-screen elements, but like so many mistakes I’ve made along the way, I’ve learned a lot as well! 🙂
The solution to this problem is to look beyond SpriteBatch for rendering things on the screen. At this point it became very clear that I had to think of a different way to render the world (which accounts for the majority of the draw calls in the game, as it contains the most graphical elements). The solution was to dive deeper into the graphics pipeline and design, build and use my own vertex buffers.
Vertex Buffers vs SpriteBatch
The SpriteBatch is a very powerful construct within XNA/Monogame, but it also has its limits, as explained above. Under the hood, SpriteBatch is actually rendering a quad, that is, two triangles (primitives) forming a rectangle (or square) on the screen. Even in 2D games these days we’re still ultimately rendering 3D objects; in this case a quad positioned in world space, with a lot of that hidden from the developer to make it easy to use. SpriteBatch is outstanding for putting individual elements on the screen and supports a variety of features right out of the box; it’s perfect for actors, items, anything that is unique unto itself and changes a lot from frame to frame (creatures are usually moving about frame-to-frame, a ship might be rotating a lot, etc.). The key, though, is uniqueness: the entity may have very distinct display properties that separate it from everything else, and you want very granular control over those properties. In that case the use of SpriteBatch makes perfect sense. Note: Even in smaller games where the number of objects on screen is relatively low (say a couple thousand), you can still get away with exclusively using SpriteBatch, as you aren’t likely to run into performance issues if you keep the count of SpriteBatch.Draw() calls reasonable.
The VertexBuffer is ultimately what SpriteBatch uses to describe the quad in a format the GPU understands: vertices, which make up triangles (primitives), which make up the full object you are trying to render. The VertexBuffer is outstanding for rendering lots of triangles very quickly on the GPU. The amazing 3D worlds of the latest games make intensive use of vertex buffers to display extremely detailed and varied terrain and character models, pushing and buffering them on the GPU for the slick 30 or 60 frames per second we are used to in our AAA games. Now, I don’t claim to be an expert in any of this, but I can assuredly say that using VertexBuffers is vastly more efficient than using SpriteBatch when it comes to rendering LOTS of different entities to the screen. The use of VertexBuffers was the solution to the issue in Eden with the terrain elements. Instead of treating each ‘block’, as I call them in Eden, as a call to SpriteBatch.Draw(), I instead built a VertexBuffer for each layer of terrain and set the position in world space, the color and the texture coordinates for every vertex. This allowed me to build large arrays of vertex information and send them all to the GPU to be rendered not as individual quads, but as one single structure of thousands of triangles in ONE draw call. Instead of making 50,000 SpriteBatch.Draw() calls, I now make ONE DrawPrimitives() call. Needless to say, that is an astounding performance gain. It is true that SpriteBatch will batch its work into as few draw calls as it can, so it isn’t likely to be 50,000 actual draw calls, but it’s going to be significantly more than a single DrawPrimitives() call, simply due to the number of graphics device changes needed to service those 50,000 draws; this will vary from game to game, of course.
I mentioned that I use DrawPrimitives(), and the reason is that in doing so the data is actually sent to and stored on the GPU (cached) for the next time it’s needed for drawing; this heavily reduces the amount of data being sent to the GPU each frame. The alternative is DrawUserPrimitives(), which causes the data to be sent EACH FRAME and isn’t nearly as performant (though still more performant than lots of SpriteBatch.Draw() calls). It’s a little more work to get up and running, but the speed savings are substantial.
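Here is a sketch of the approach, not Eden’s actual code: building one static VertexBuffer for a whole layer of terrain (six unshared vertices per block) and drawing the entire layer with a single DrawPrimitives() call. The method name, block size and texture coordinates are my own illustrative choices, and `basicEffect` is assumed to be a configured BasicEffect:

```csharp
VertexPositionColorTexture[] BuildLayerVertices(int layerWidth, int layerHeight, float blockSize)
{
    // 6 vertices (2 triangles) per block; no vertices shared between blocks.
    var vertices = new VertexPositionColorTexture[layerWidth * layerHeight * 6];
    int v = 0;
    for (int y = 0; y < layerHeight; y++)
    {
        for (int x = 0; x < layerWidth; x++)
        {
            float left = x * blockSize, top = y * blockSize;
            float right = left + blockSize, bottom = top + blockSize;
            // Triangle 1
            vertices[v++] = new VertexPositionColorTexture(new Vector3(left, top, 0), Color.White, new Vector2(0, 0));
            vertices[v++] = new VertexPositionColorTexture(new Vector3(right, top, 0), Color.White, new Vector2(1, 0));
            vertices[v++] = new VertexPositionColorTexture(new Vector3(left, bottom, 0), Color.White, new Vector2(0, 1));
            // Triangle 2
            vertices[v++] = new VertexPositionColorTexture(new Vector3(right, top, 0), Color.White, new Vector2(1, 0));
            vertices[v++] = new VertexPositionColorTexture(new Vector3(right, bottom, 0), Color.White, new Vector2(1, 1));
            vertices[v++] = new VertexPositionColorTexture(new Vector3(left, bottom, 0), Color.White, new Vector2(0, 1));
        }
    }
    return vertices;
}

// Create once and fill; after SetData() the vertex data lives on the GPU.
var data = BuildLayerVertices(layerWidth, layerHeight, 16f);
var vertexBuffer = new VertexBuffer(GraphicsDevice,
    typeof(VertexPositionColorTexture), data.Length, BufferUsage.WriteOnly);
vertexBuffer.SetData(data);

// Each frame: one draw call for the entire layer.
GraphicsDevice.SetVertexBuffer(vertexBuffer);
foreach (var pass in basicEffect.CurrentTechnique.Passes)
{
    pass.Apply();
    GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, data.Length / 3);
}
```

Compare this with DrawUserPrimitives(), which would take the vertex array directly and re-upload it every frame; with a static VertexBuffer the upload happens once.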
Side Discussion: Why wouldn’t we always use VertexBuffers if they are so much faster than SpriteBatch? Shouldn’t we render everything using VertexBuffers? The answer lies in that uniqueness of objects in your world that I alluded to above. While you could render all of your characters via a single VertexBuffer, and subsequently via a single draw call, the problem is that all of those characters are now intrinsically linked together; they will no longer move about individually but as a single structure, which doesn’t make sense for individuals. It’s not that this is impossible, but they become linked in ways you probably don’t intend or desire. Additionally, as they are probably moving or changing in other ways frame-to-frame, you’ll be forced to update the vertex information in the buffer every frame, constantly sending that changed data to the GPU, which is going to be a performance bottleneck. This is why, for objects that don’t vary much, that can safely be linked together and that aren’t changing much frame-to-frame (like terrain in my case), it makes a lot more sense to use VertexBuffers, while characters and other individual objects are better served by SpriteBatch, treated as individual quads disconnected from one another. This is NOT to say that you cannot change VertexBuffer information; just that if you do, you shouldn’t be doing it every frame. Eden supports deformation of its terrain, but that won’t be happening frame to frame, more likely here and there over time; updating the vertex information for the buffer now and then (over seconds or minutes) is completely reasonable. If you do need a vertex buffer whose contents change from frame to frame, you can make use of the DynamicVertexBuffer; I imagine it makes more sense to use those for modestly sized buffers, as that data has to be sent to the GPU each frame and there is, of course, a bandwidth limit between the CPU and GPU.
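For the frame-to-frame case, a DynamicVertexBuffer looks much the same as a regular one; a minimal sketch (buffer size and variable names are my own):

```csharp
// Created once; sized for the maximum number of vertices you expect.
var dynamicBuffer = new DynamicVertexBuffer(GraphicsDevice,
    typeof(VertexPositionColorTexture), maxVertices, BufferUsage.WriteOnly);

// Each frame, after rebuilding the 'vertices' array on the CPU:
// SetDataOptions.Discard hands the GPU a fresh region instead of overwriting
// one it may still be reading from, avoiding a pipeline stall.
dynamicBuffer.SetData(vertices, 0, vertexCount, SetDataOptions.Discard);
GraphicsDevice.SetVertexBuffer(dynamicBuffer);
```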
Additional Note: While there are a LOT of interesting and sophisticated ways to optimize rendering and general game performance, this article really isn’t about optimization in general, but rather about when to use SpriteBatch and when to configure your own VertexBuffer. This can still be taken further through the use of ‘instancing’ on the GPU (rendering LOTS of similar things VERY QUICKLY) or ‘chunking’ on the CPU (breaking the world up into smaller chunks instead of processing the whole thing, as in Minecraft and many other games). I will not be going into these techniques here, as they are beyond the scope of what I am trying to pass along; just realize that this single article is by no means the be-all-end-all of optimization; that whole topic is very interesting unto itself and knows no bounds 😉
The result of all this work within Eden was an outstanding improvement in rendering performance. In the worst-case situation, at high resolution (1920×1080), with the largest world supported, fully zoomed out to display nearly the entire world at the highest elevation, the frame rate never dropped below the desired fixed 60 frames per second, with a lot of performance to spare. Without these changes, with the exclusive use of SpriteBatch.Draw() to draw everything, it never got above about 2-5 frames per second.
The intended takeaway from this article is to make others aware that SpriteBatch has its place and is very useful, but it is not always the best choice and certainly isn’t the only choice. For smaller, simpler games it probably IS the best, or at least the easiest, choice, but for rendering lots of on-screen entities at once it is not the way to go. Particle systems, for example, will not perform well using SpriteBatch unless they are very small and very limited in scope. The use of custom vertex and index buffers is surely the way to go.
The Quad (Square/Cell/Block/Etc.)
The above image is just one way you might define a quad. You can do it however you like; what will change is the order in which the vertices are defined (clockwise or counter-clockwise), and this interacts with the back-face culling mode you are using: if you don’t see anything, you may have the order backwards and the triangles are being culled because you are effectively viewing them from behind. The quadrant in which you are drawing will also affect the positioning coordinates; you may even be drawing at the origin and translating to the correct location in world space. These are all things you will need to determine for your game, and some of this is handled for you in higher-level engines. For the rest of this article I’ll be referencing the above image as defined.
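When nothing shows up, a quick way to rule winding order in or out is to temporarily disable culling; a small sketch of that debugging step:

```csharp
// MonoGame's default is CullCounterClockwise: triangles whose vertices wind
// counter-clockwise (from the camera's point of view) are treated as
// back faces and discarded. Disabling culling entirely will show the quad
// regardless of winding:
GraphicsDevice.RasterizerState = RasterizerState.CullNone;

// Once it appears, fix the vertex order and restore a culling state so the
// GPU can skip back faces again:
GraphicsDevice.RasterizerState = RasterizerState.CullCounterClockwise;
```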
Side Note: Please be aware that there are a variety of other articles out there discussing the use of indices, but they almost always talk about using indices to reduce the number of required vertices, thereby reducing the amount of data that needs to be sent to the GPU. This is true: you can combine vertices 1 and 3 above, as well as 2 and 5, into single vertices, which would reduce the total for the quad above from 6 vertices down to just 4. A savings of two vertices may not seem like much for a single quad, but imagine if you had thousands; that savings adds up substantially, so it is worth considering and makes sense in lots of situations, including terrain (think of the large terrains in World of Warcraft or Fortnite or whatever). For Eden, however, I chose to make use of DrawPrimitives() and cache the data on the GPU for subsequent calls, and not to save on the number of needed vertices, for a very important reason: in Eden each ‘block’ of terrain is disconnected from those next to it. Even if two blocks are grasslands, I want them treated as two different grasslands for purposes of deforming, and to reduce the complexity of managing the vertex buffer in general; the world in Eden is NOT static like it is in, say, World of Warcraft (I understand this has changed a little, but WoW doesn’t generally support deformable terrain, is my point), so I need this granular control over it. By not sharing vertices across quads, I can continue to treat each quad apart from the others and assign them different positions, colors, textures, normals, etc. (anything I’d like to ascribe to each vertex). Just be aware that you CAN combine vertices through the use of indices, but you do not have to.
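For reference, the indexed alternative mentioned above looks like this: four shared corner vertices plus six indices instead of six standalone vertices. Positions, winding and unit size here are my own illustrative choices:

```csharp
var corners = new VertexPositionColorTexture[4]
{
    new VertexPositionColorTexture(new Vector3(0, 0, 0), Color.White, new Vector2(0, 0)), // top-left
    new VertexPositionColorTexture(new Vector3(1, 0, 0), Color.White, new Vector2(1, 0)), // top-right
    new VertexPositionColorTexture(new Vector3(0, 1, 0), Color.White, new Vector2(0, 1)), // bottom-left
    new VertexPositionColorTexture(new Vector3(1, 1, 0), Color.White, new Vector2(1, 1)), // bottom-right
};

// Two triangles referencing the shared corners by index.
short[] indices = { 0, 1, 2,  1, 3, 2 };

var indexBuffer = new IndexBuffer(GraphicsDevice, IndexElementSize.SixteenBits,
    indices.Length, BufferUsage.WriteOnly);
indexBuffer.SetData(indices);

// Drawn with DrawIndexedPrimitives() instead of DrawPrimitives()
// (assumes the vertex buffer holding 'corners' is already set):
GraphicsDevice.Indices = indexBuffer;
GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, 2);
```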
The steps required to draw your own quad to the screen are to define your vertex structure, create a vertex buffer and fill it with your vertex information, and optionally create an index buffer and fill it with your index information. Once this is all done, you simply send the data to the GPU and make your call to DrawPrimitives(). As long as your camera is looking at the correct location and you are NOT culling your triangles (due to using the wrong vertex winding order), you should see your quad on the screen. This will also depend on whether you are using lighting, textures, colors, etc.
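The steps above also assume an effect carrying the camera matrices; a sketch of that surrounding setup using BasicEffect (the matrix values, `terrainTexture` and `vertexBuffer` are illustrative assumptions):

```csharp
var effect = new BasicEffect(GraphicsDevice)
{
    World = Matrix.Identity,
    // Camera looking down the negative z-axis at the origin.
    View = Matrix.CreateLookAt(new Vector3(0, 0, 10), Vector3.Zero, Vector3.Up),
    // An orthographic projection is the usual fit for a 2D-style view.
    Projection = Matrix.CreateOrthographic(
        GraphicsDevice.Viewport.Width, GraphicsDevice.Viewport.Height, 0.1f, 100f),
    TextureEnabled = true,
    Texture = terrainTexture,    // assumed Texture2D
    VertexColorEnabled = true,
    LightingEnabled = false,
};

GraphicsDevice.SetVertexBuffer(vertexBuffer); // assumed from the earlier steps
foreach (var pass in effect.CurrentTechnique.Passes)
{
    pass.Apply();
    GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, primitiveCount);
}
```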
You can check out my Monogame Examples project on GitHub, which I’ll continue to curate and develop to help newer game developers gain a foothold in XNA & Monogame development. I encourage you to download, compile and run the code provided, but also to modify and tinker with it to get a better understanding of everything (a decent frame rate counter and camera component are included as well). I personally found diving into vertex and index buffers a little daunting at first, as I was really only familiar and comfortable with SpriteBatch, but it turns out it isn’t that bad once you understand how it all works. I’m providing this to help others in a similar situation get up and running a little quicker, and hopefully with a much better initial understanding than I had with respect to the SpriteBatch and VertexBuffer. The project is not meant as a tutorial, but rather as a reference for the Monogame developer who is familiar with the Monogame/XNA world but might be relatively new to development, or who simply understands the value of reviewing another’s code. If you are in need of some good XNA/Monogame tutorials to get up to speed, I would recommend RB Whitaker’s Wiki site, specifically the tutorials for XNA and Monogame.
One final note: this whole post has been about optimizing drawing performance by converting the use of SpriteBatch over to specifically defined VertexBuffers, to take advantage of what the GPU is really good at: drawing lots of triangles very quickly, rather than drawing lots of individual quads less quickly. What I haven’t mentioned up to this point is that for each frame in your game, not only does the game itself need to be rendered, but any logic in your game also needs to be performed. This will vary from game to game; since Eden is a heavy simulation game, there is a pretty intense amount of internal logic occurring at any given time, and so there is a need to balance graphical rendering with logical processing; the more time saved in rendering, the more can be applied to the simulation logic and calculations. To maintain 60 FPS, the draw and logic processing must complete in less than 16.66ms. For a game like Eden, 30 FPS isn’t unplayable, just not nearly as smooth an experience; to maintain it, each frame would need to be processed in less than 33.33ms. Many games are unaffected by this, as their drawing is relatively simple and their calculations are quite light, so optimizing everything to occur within 16.66ms isn’t really a concern. Many others ARE affected by it, which is why some console games can only be played at 30 FPS: either the hardware simply cannot sustain 60 FPS, or the drawing and logic cannot be optimized to take less than 16.66ms per frame (I’m thinking of Doom on the Switch here; from what I’ve read it was quite an undertaking to get that game running at a solid 30 FPS, and I can now appreciate why, considering the fidelity of the game and the relatively weaker power of the Switch hardware).
Universal Advice: Optimization is a difficult task and should never be attempted without profiling and analyzing code to determine where the bottleneck is. My advice is to never optimize anything until you know there is an issue and what that issue is. In many cases optimization isn’t even required, so don’t worry about it until you notice an issue, then measure, measure, measure!