Saturday, February 14, 2009

Multithreaded Graphics Programming Caveats

You can't make graphics calls from any thread but the main thread without sacrificing platform portability and graphics library portability. In my case I want to be able to run this application on Windows with DirectX 9+ and on OS X and Linux with OpenGL. OpenGL allows you to make graphics calls in a secondary thread if you set up and initialize a graphics context for that other thread, but even if I did that I would have to write separate procedures to initialize the secondary threads for OpenGL and DirectX. None of it would be pretty, and from what I've read the gains wouldn't be that significant.

So my compromise is to use POSIX threads and OpenMP for everything that doesn't involve graphics calls - like the heavy-lifting math in the terrain generation, normal maps, etc. All the graphics-related calls (creating vertex/index buffers, populating them with data, and releasing them) will live in the main thread.
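Here is a minimal sketch of that split, not my actual engine code. It uses std::thread instead of raw POSIX threads or OpenMP to keep the example short, and PatchJob, fillHeights, uploadFinished and runDemo are hypothetical names made up for illustration. The main thread allocates the storage, the workers only do math, and the main thread is the only place where a graphics call would go.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unit of CPU-side work for one terrain patch.
struct PatchJob {
    std::vector<float> heights;     // storage allocated by the main thread
    std::size_t size = 0;           // patch is size x size samples
    std::atomic<bool> done{false};  // set by the worker when the math is finished
};

// Pure math: no graphics calls and no allocation, so it can run off the main thread.
void fillHeights(PatchJob& job) {
    assert(job.heights.size() == job.size * job.size);
    for (std::size_t y = 0; y < job.size; ++y)
        for (std::size_t x = 0; x < job.size; ++x)
            job.heights[y * job.size + x] = std::sin(0.1f * x) * std::cos(0.1f * y);
    job.done.store(true, std::memory_order_release);
}

// Main thread only: hands finished jobs to the (placeholder) upload step.
// Returns how many jobs were finished and consumed.
std::size_t uploadFinished(std::vector<std::unique_ptr<PatchJob>>& jobs) {
    std::size_t uploaded = 0;
    for (auto& job : jobs) {
        if (job && job->done.load(std::memory_order_acquire)) {
            // Creating and filling the vertex buffer would happen here.
            job.reset();  // storage is freed on the main thread as well
            ++uploaded;
        }
    }
    return uploaded;
}

// Drives the sketch: allocate on main, compute on workers, upload on main.
std::size_t runDemo(std::size_t patchCount, std::size_t size) {
    std::vector<std::unique_ptr<PatchJob>> jobs;
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < patchCount; ++i) {
        std::unique_ptr<PatchJob> job(new PatchJob);
        job->size = size;
        job->heights.resize(size * size);
        PatchJob* raw = job.get();  // heap address stays valid when jobs grows
        workers.emplace_back([raw] { fillHeights(*raw); });
        jobs.push_back(std::move(job));
    }
    for (auto& w : workers) w.join();
    return uploadFinished(jobs);
}
```

In a real render loop the workers would not be joined right away. Instead uploadFinished would be called once per frame and would simply skip jobs whose done flag is not set yet.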

I've noticed that creating an Ogre VBO and populating it with data can take up to 700 microseconds (but averages around 165 microseconds), while deleting an Ogre VBO takes far longer: about 1 millisecond on average (roughly 6 times as long) and up to 8 milliseconds in the worst case (more than 10 times as long). By only creating or deleting one VBO per frame, and after moving all the vertex calculations to separate threads, I'm able to keep things pretty smooth. Did I mention I'm using my MacBook Pro? It has a 2.4 GHz Intel Core 2 Duo with 4 GB of RAM and an NVIDIA GeForce 8600M GT with 256 MB of VRAM.
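The "one VBO operation per frame" rule can be sketched as a small deferred-work queue. This is an illustration under my own assumptions, not Ogre API: FrameBudgetQueue is a hypothetical class, and each queued operation is just a callable that would wrap the real buffer create or delete call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Defers expensive buffer work so that at most `budget` operations run per frame.
class FrameBudgetQueue {
public:
    explicit FrameBudgetQueue(std::size_t budget = 1) : budget_(budget) {
        assert(budget > 0);
    }

    // Queue a create or delete operation; nothing runs until processFrame().
    void enqueue(std::function<void()> op) { pending_.push_back(std::move(op)); }

    // Call once per frame from the main thread.
    // Returns the number of operations that actually ran this frame.
    std::size_t processFrame() {
        std::size_t ran = 0;
        while (ran < budget_ && !pending_.empty()) {
            std::function<void()> op = std::move(pending_.front());
            pending_.pop_front();
            op();  // the real VBO create/delete call would live inside this callable
            ++ran;
        }
        return ran;
    }

    // Number of operations still waiting for a future frame.
    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::size_t budget_;
    std::deque<std::function<void()>> pending_;
};
```

With the default budget of 1, a burst of new or expired meshes gets spread over as many frames as there are operations, so the worst single-frame cost is one delete (about 8 milliseconds in my measurements) instead of several stacked together.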

At some point I will need to identify why freeing the buffer is taking so much more time, and try to optimize that.

Here's the biggest caveat with threads that I've run into: I cannot reliably allocate memory inside a child thread. Ogre uses nedmalloc now, and it is a fantastic allocator/deallocator. Unfortunately, when I try to allocate memory from within a thread other than the main thread, I get an EXC_BAD_ACCESS signal from within nedalloc::GetThreadCache(). The annoying thing is that it doesn't happen right away - it only happens after the program has generated enough meshes. So for now I'm steering clear of any allocation/deallocation inside threads and keeping that in the main thread. It doesn't look like keeping it in the main thread is causing any performance hit anyway.
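One way to keep every allocation and deallocation on the main thread is to hand workers storage that already exists. The sketch below assumes a fixed number of equally sized scratch buffers; ScratchPool, fillRamp and runPoolDemo are hypothetical names, it uses std::thread and the standard allocator rather than nedmalloc, and it only works around the crash without explaining it.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Fixed set of scratch buffers, allocated and freed by the thread that owns the pool.
class ScratchPool {
public:
    ScratchPool(std::size_t bufferCount, std::size_t floatsPerBuffer)
        : buffers_(bufferCount, std::vector<float>(floatsPerBuffer)),
          inUse_(bufferCount, 0) {}

    // Main thread only. Returns nullptr when every buffer is busy.
    std::vector<float>* acquire() {
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (!inUse_[i]) {
                inUse_[i] = 1;
                return &buffers_[i];
            }
        }
        return nullptr;
    }

    // Main thread only, once the worker that used the buffer has finished.
    void release(std::vector<float>* buffer) {
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (&buffers_[i] == buffer) {
                assert(inUse_[i]);
                inUse_[i] = 0;
                return;
            }
        }
        assert(false && "buffer does not belong to this pool");
    }

private:
    std::vector<std::vector<float>> buffers_;
    std::vector<char> inUse_;  // 1 while a buffer is handed out
};

// Worker-side code: writes into existing storage, never resizes or allocates.
void fillRamp(float* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) data[i] = static_cast<float>(i);
}

// Round trip: acquire on main, fill on a worker, read and release on main.
// Returns the last value the worker wrote.
float runPoolDemo() {
    ScratchPool pool(2, 16);
    std::vector<float>* buf = pool.acquire();
    assert(buf != nullptr);
    std::thread worker(fillRamp, buf->data(), buf->size());
    worker.join();
    float last = buf->back();
    pool.release(buf);
    return last;
}
```

The pool deliberately has no locking because acquire and release are only ever called from the main thread. The worker sees nothing but a raw pointer and a length.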
