Wednesday, March 4, 2009

Shared Vertex Buffer vs. Multiple Vertex Buffers

I went ahead and implemented a shared vertex buffer for all the terrain patches and did some profiling. Updating the shared vertex buffer takes about 6 times longer than creating a new buffer and filling it. Also, when using the shared vertex buffer the fps drops from about 240 to 200 - I'm guessing this has to do with using a dynamic buffer versus an HBU_STATIC_WRITE_ONLY buffer. I tried both HBU_DYNAMIC and HBU_DYNAMIC_WRITE_ONLY as the usage for the shared buffer, and used HBL_NO_OVERWRITE as my lock option when writing vertex data. I had 16 unique index buffers (for the 16 different index orders) and each patch used one of those 16, so the index buffers never changed - I just had to update the vertex buffer every time I wanted to add a new patch.
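To make the shared-buffer bookkeeping concrete, here is a rough sketch of how patches could be mapped to regions of one big buffer. It assumes every patch has the same vertex count, so each patch gets a fixed-size slot; the class name SharedBufferSlots and the constant kNoSlot are made up for illustration and are not from my actual code. The byte offset and slot size it hands back are the kind of values you would pass as the offset and length of a partial lock on the shared buffer.

```cpp
#include <cstddef>
#include <vector>

// Returned by allocate() when every slot is in use (hypothetical sentinel).
constexpr std::size_t kNoSlot = static_cast<std::size_t>(-1);

// Tracks which fixed-size regions ("slots") of one shared vertex buffer are
// free. Assumes all patches need the same number of bytes.
class SharedBufferSlots {
public:
    SharedBufferSlots(std::size_t slotCount, std::size_t bytesPerSlot)
        : bytesPerSlot_(bytesPerSlot) {
        // Push in reverse order so slot 0 is handed out first.
        for (std::size_t i = slotCount; i > 0; --i) {
            free_.push_back(i - 1);
        }
    }

    // Reserves a slot for a new patch, or returns kNoSlot if the buffer is full.
    std::size_t allocate() {
        if (free_.empty()) return kNoSlot;
        const std::size_t slot = free_.back();
        free_.pop_back();
        return slot;
    }

    // Gives a slot back when its patch is destroyed.
    void release(std::size_t slot) { free_.push_back(slot); }

    // Start of the slot inside the shared buffer, in bytes.
    std::size_t byteOffset(std::size_t slot) const { return slot * bytesPerSlot_; }

    std::size_t bytesPerSlot() const { return bytesPerSlot_; }

private:
    std::size_t bytesPerSlot_;
    std::vector<std::size_t> free_;
};
```

With this kind of layout, adding a patch only touches that patch's slot, which is what makes a no-overwrite style lock reasonable: the GPU may still be reading the other slots, but nothing in them is being changed.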

To put numbers on it: my program samples the time it takes to create a vertex buffer and fill it, and that took about 1 ms on average in my tests. Locking part of the existing shared buffer and updating it took about 6 ms on average.
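For anyone wanting to take the same kind of measurements, this is a minimal sketch of a timing helper that tracks average and worst-case milliseconds. The names TimingStats and timeInto are hypothetical, and it uses std::chrono rather than whatever timer the engine provides, so treat it as one possible way to do it and not what my profiler actually looks like.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Accumulates average and worst-case duration (in milliseconds) for one kind
// of operation, e.g. "create + fill buffer" or "lock + update shared buffer".
struct TimingStats {
    double totalMs = 0.0;
    double worstMs = 0.0;
    std::size_t samples = 0;

    void add(double ms) {
        totalMs += ms;
        worstMs = std::max(worstMs, ms);
        ++samples;
    }

    double averageMs() const {
        return samples ? totalMs / static_cast<double>(samples) : 0.0;
    }
};

// Runs `work` once, records how long it took, and returns the elapsed time in ms.
template <typename Fn>
double timeInto(TimingStats& stats, Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    stats.add(ms);
    return ms;
}
```

You would keep one TimingStats per operation and wrap each buffer create/fill or lock/update call in timeInto, then compare the averages and worst cases at the end of a run.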

The funny thing about it all is that when NOT using the shared vertex buffer the program seems to "stutter" more. I haven't nailed down why, but since I've profiled most of my own code, my current theory is some kind of buffer management overhead in Ogre when I create/delete all these buffers. When I use the shared buffer the program seems to run smoother with less stuttering, but the fps drops. The stuttering only occurs when flying over the terrain while it is changing LOD often.

My (final?) idea is to go with creating vertex buffers per patch - however, instead of deleting a vertex buffer when its patch gets deleted, I'm going to try to recycle that buffer by giving it to another patch to use. My hope is that locking the existing buffer with HBL_DISCARD and refilling it will be faster than creating a new buffer for the new patch and filling it - and I know I will at least save however much time it takes to delete a vertex buffer.
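The recycling idea could look something like the sketch below. It is not Ogre code: FakeVertexBuffer is a stand-in for a real hardware vertex buffer, VertexBufferPool is a hypothetical name, and it assumes every patch needs a buffer of the same size so that any free buffer can serve any new patch.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a hardware vertex buffer (hypothetical; the real thing would be
// an engine buffer object that gets refilled with a discard-style lock).
struct FakeVertexBuffer {
    explicit FakeVertexBuffer(std::size_t bytes) : sizeBytes(bytes) {}
    std::size_t sizeBytes;
};

// Keeps buffers from deleted patches around so new patches can reuse them.
class VertexBufferPool {
public:
    explicit VertexBufferPool(std::size_t bytesPerPatch)
        : bytesPerPatch_(bytesPerPatch) {}

    // Hands out a recycled buffer when one is idle, otherwise creates a new one.
    std::unique_ptr<FakeVertexBuffer> acquire() {
        if (!free_.empty()) {
            std::unique_ptr<FakeVertexBuffer> buf = std::move(free_.back());
            free_.pop_back();
            ++reused_;
            return buf;
        }
        ++created_;
        return std::make_unique<FakeVertexBuffer>(bytesPerPatch_);
    }

    // Called when a patch is destroyed: keep the buffer instead of deleting it.
    void release(std::unique_ptr<FakeVertexBuffer> buf) {
        if (buf) free_.push_back(std::move(buf));
    }

    std::size_t created() const { return created_; }
    std::size_t reused() const { return reused_; }
    std::size_t idle() const { return free_.size(); }

private:
    std::size_t bytesPerPatch_;
    std::size_t created_ = 0;
    std::size_t reused_ = 0;
    std::vector<std::unique_ptr<FakeVertexBuffer>> free_;
};
```

The created/reused counters are there so the pool can be profiled: if reused stays near zero while flying around, the pool isn't buying anything and the create/delete overhead is still being paid.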

Not that the program stuttering is bad - it is MUCH improved since moving the build process to a separate thread, and since throttling the number of buffers created/deleted/filled per frame.
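The throttling mentioned above can be sketched as a simple queue that only runs a fixed number of buffer operations per frame. FrameThrottledQueue and its methods are made-up names for illustration, and this version is not thread-safe: if a separate build thread pushes work, the queue would need a mutex around push and runFrame.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Queues buffer operations (create / delete / fill) and runs at most
// maxPerFrame of them each frame, leaving the rest for later frames.
class FrameThrottledQueue {
public:
    explicit FrameThrottledQueue(std::size_t maxPerFrame)
        : maxPerFrame_(maxPerFrame) {}

    // Adds one pending operation to the back of the queue.
    void push(std::function<void()> op) { pending_.push_back(std::move(op)); }

    // Call once per frame. Runs operations in FIFO order until the per-frame
    // budget is used up or the queue is empty; returns how many were run.
    std::size_t runFrame() {
        std::size_t ran = 0;
        while (ran < maxPerFrame_ && !pending_.empty()) {
            std::function<void()> op = std::move(pending_.front());
            pending_.pop_front();
            op();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t maxPerFrame_;
    std::deque<std::function<void()>> pending_;
};
```

A count-based budget is the simplest version; a time-based budget (stop once some number of milliseconds has been spent this frame) would be another option, at the cost of a timer call per operation.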


Just did a quick profiling test and can confirm that when using the single shared vertex buffer the time taken by _renderOneFrame() is about 6 ms on average - 12 ms worst case - and most of that time is reflected in the updating of the shared vertex buffer. However, _renderOneFrame() averages 12 ms - 80 ms worst case - when using static buffers, and most of that time is NOT reflected in creating/deleting/filling the vertex buffers. Now if Xcode would display Ogre source code I could better nail down where in Ogre the overhead is coming from, but at least now I can see why the app is indeed smoother and less CPU intensive when using the shared buffer - though fps is lowered a bit.

Tradeoffs *sigh*(tm)

Hopefully my idea of recycling the per-patch vertex buffers will take care of both issues.
