I was working on JFramework (Link), updating the OSX edition to fully utilize shaders, because A Sound Plan is nearing Beta status, when I noticed a most interesting bug.
Using shaders ran the game at 100% CPU usage.
I was completely baffled, to say the least, as the game ran at about 4-8% CPU usage on both the Linux AND Windows builds of the engine.
Part of me yelled that I should have taken care of this sooner, as both the Linux and Windows builds of the engine had been using shaders for over a year. Another part of me yelled for not just taking care of the OSX build at the time when I worked on the Linux and Windows builds. Another part of me yelled for being all "hindsight is 20/20".
After the yelling session was complete, I had to buckle down and just get this thing out.
Rather than tell you every step I took to solve this issue, I'll just give you the highlight reel of what makes the OSX driver so strange.
- OSX differentiates between its legacy 2.1 profile and its 3.x core profile in the driver, so you must specify which one you are using when you create a new GL context. The default is 2.1, and the core profile Apple exposes starts at 3.2. There is very little compatibility between the two, e.g. you cannot use immediate mode in the core profile. Example provided below on how to activate the core context using SDL2. Note: I'm using GLEW to assist in loading and linking OpenGL functionality; the glewExperimental flag allows OSX to use core functions without the EXT or APPLE extensions.
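A minimal sketch of that setup, assuming SDL2 and GLEW are installed and on the include path. The window title, window size, and the choice of requesting 3.2 are assumptions for illustration, not the listing from JFramework.

```cpp
#include <GL/glew.h>
#include <SDL2/SDL.h>
#include <cstdio>

int main(int argc, char* argv[]) {
    (void)argc;
    (void)argv;

    if (SDL_Init(SDL_INIT_VIDEO) != 0) {
        std::fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
        return 1;
    }

    // These attributes must be set BEFORE the window and context are created.
    // Without them OSX hands back the legacy 2.1 context.
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 2);  // assumption: 3.2 core

    // Hypothetical title and size, purely for the example.
    SDL_Window* window = SDL_CreateWindow("core profile test",
                                          SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
                                          640, 480, SDL_WINDOW_OPENGL);
    if (!window) {
        std::fprintf(stderr, "SDL_CreateWindow failed: %s\n", SDL_GetError());
        SDL_Quit();
        return 1;
    }

    SDL_GLContext context = SDL_GL_CreateContext(window);
    if (!context) {
        std::fprintf(stderr, "SDL_GL_CreateContext failed: %s\n", SDL_GetError());
        SDL_DestroyWindow(window);
        SDL_Quit();
        return 1;
    }

    // glewExperimental has to be set before glewInit so GLEW loads the core entry points.
    glewExperimental = GL_TRUE;
    GLenum glewStatus = glewInit();
    if (glewStatus != GLEW_OK) {
        std::fprintf(stderr, "glewInit failed: %s\n",
                     reinterpret_cast<const char*>(glewGetErrorString(glewStatus)));
    } else {
        // Print what the driver actually gave us, to confirm the profile.
        std::printf("GL version: %s\n",
                    reinterpret_cast<const char*>(glGetString(GL_VERSION)));
    }

    SDL_GL_DeleteContext(context);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return glewStatus == GLEW_OK ? 0 : 1;
}
```

Checking the printed version string is a quick way to tell whether the core profile request was honored or the driver fell back to 2.1.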
How to set up the core profile on OSX.
- I implemented a batching algorithm to draw similar objects together. Since the algorithm is so long, I'll summarize it:
  - Start with a list of objects to draw.
  - For each object that shares a texture id and program id:
    - Add vertices, texture coordinates, vertex colors, etc. to separate containers.
  - Upload all data in containers using glBindBuffer and glBufferData (i.e. bind a buffer location and upload data to GPU).
  - Make the draw call; in my case I call glDrawElements using GL_TRIANGLES.
  - Iterate until all objects are drawn.
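The steps above can be sketched roughly as follows. This is not the JFramework implementation: `DrawObject`, `Batch`, `FlushFn` and `drawBatched` are hypothetical names, the vertex layout (2D positions, RGBA colors) is an assumption, and the `flush` callback stands in for the glBindBuffer / glBufferData / glDrawElements calls so the grouping logic can be read and tested without a GL context. It also assumes the list is already sorted so that objects sharing a texture id and program id sit next to each other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-object data; a real engine will have its own layout.
struct DrawObject {
    std::uint32_t textureId;
    std::uint32_t programId;
    std::vector<float> positions;        // x, y per vertex (assumed 2D)
    std::vector<float> texCoords;        // u, v per vertex
    std::vector<float> colors;           // r, g, b, a per vertex
    std::vector<std::uint32_t> indices;  // local indices, starting at 0
};

// Everything that gets uploaded and drawn with a single draw call.
struct Batch {
    std::uint32_t textureId = 0;
    std::uint32_t programId = 0;
    std::vector<float> positions, texCoords, colors;
    std::vector<std::uint32_t> indices;
};

// Stand-in for "bind buffers, upload with glBufferData, call glDrawElements".
using FlushFn = std::function<void(const Batch&)>;

// Groups consecutive objects that share texture and program ids into one batch.
// Returns the number of draw calls (flushes) that were issued.
std::size_t drawBatched(const std::vector<DrawObject>& objects, const FlushFn& flush) {
    std::size_t drawCalls = 0;
    Batch batch;
    bool open = false;

    for (const DrawObject& obj : objects) {
        assert(obj.positions.size() % 2 == 0 && "positions must hold x, y pairs");

        // A change of texture or program ends the current batch.
        if (open && (obj.textureId != batch.textureId || obj.programId != batch.programId)) {
            flush(batch);
            ++drawCalls;
            batch = Batch{};
            open = false;
        }
        if (!open) {
            batch.textureId = obj.textureId;
            batch.programId = obj.programId;
            open = true;
        }

        // Indices are local to each object, so shift them past the vertices
        // already in the batch.
        const std::uint32_t base = static_cast<std::uint32_t>(batch.positions.size() / 2);
        batch.positions.insert(batch.positions.end(), obj.positions.begin(), obj.positions.end());
        batch.texCoords.insert(batch.texCoords.end(), obj.texCoords.begin(), obj.texCoords.end());
        batch.colors.insert(batch.colors.end(), obj.colors.begin(), obj.colors.end());
        for (std::uint32_t index : obj.indices) {
            batch.indices.push_back(base + index);
        }
    }

    if (open) {  // draw whatever is left over
        flush(batch);
        ++drawCalls;
    }
    return drawCalls;
}
```

With two quads on texture 1 followed by one quad on texture 2, this issues two draw calls instead of three; the gain grows with the number of objects that share state.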
While this improved the speed of my drawing, it didn't fix the Mac build, but it's nice to have.
- Calling glActiveTexture with different ids is perfectly acceptable with the proprietary drivers on Linux and Windows, depending on hardware; those drivers' patching abilities are pretty slick. The OSX driver, however, is HORRIBLE at patching itself (in its current state), forcing a recompile of the shader as it can't effectively sample a texture from any slot. Basically, call glActiveTexture(GL_TEXTURE0) unless otherwise required.
- If your shaders are running slow the moment your shader calls texture (i.e. sampling), this is probably your issue.
- The OSX OpenGL driver is peculiar in how it handles shader attributes. If you leave any one of the attribute fields unused but declare the location, the shader may run in software (i.e. CPU) mode.
- Use Instruments on OSX whenever possible, use the Time Profiler module, use it. If you notice calls to SCCompileShader, that means that something in your shader is forcing a recompile, spinning up a lot of CPU.
- Write functions that pretty print OpenGL errors, just do it. It's massively helpful to know that you're adhering to the OpenGL spec. Remember to sprinkle the pretty print call wherever you can, so that you know when you've broken something. Guard the calls behind a debug flag so that your release build doesn't print; you don't need that.
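One possible shape for such a pretty printer, as a sketch rather than the version used in the engine. `glErrorName` and `collectGlErrors` are hypothetical names. The numeric values are the standard OpenGL error codes, repeated here only so the snippet compiles without GL headers; in a real build you would include the GL header and pass `glGetError` as the `getError` argument.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Standard OpenGL error codes, duplicated so this sketch needs no GL headers.
constexpr unsigned kGlNoError = 0;
constexpr unsigned kGlInvalidEnum = 0x0500;
constexpr unsigned kGlInvalidValue = 0x0501;
constexpr unsigned kGlInvalidOperation = 0x0502;
constexpr unsigned kGlOutOfMemory = 0x0505;
constexpr unsigned kGlInvalidFramebufferOperation = 0x0506;

// Maps an error code to the name you would look up in the spec.
const char* glErrorName(unsigned code) {
    switch (code) {
        case kGlNoError: return "GL_NO_ERROR";
        case kGlInvalidEnum: return "GL_INVALID_ENUM";
        case kGlInvalidValue: return "GL_INVALID_VALUE";
        case kGlInvalidOperation: return "GL_INVALID_OPERATION";
        case kGlOutOfMemory: return "GL_OUT_OF_MEMORY";
        case kGlInvalidFramebufferOperation: return "GL_INVALID_FRAMEBUFFER_OPERATION";
        default: return "unknown error";
    }
}

// getError stands in for glGetError. The driver can queue several error flags,
// so keep reading until it reports no error. The cap of 16 reads is an
// arbitrary safety net against looping forever when no context is current.
std::vector<std::string> collectGlErrors(const std::function<unsigned()>& getError,
                                         const char* file, int line) {
    assert(file != nullptr);
    std::vector<std::string> messages;
    for (int i = 0; i < 16; ++i) {
        const unsigned code = getError();
        if (code == kGlNoError) {
            break;
        }
        std::ostringstream msg;
        msg << file << ":" << line << " OpenGL error 0x" << std::hex << code
            << " (" << glErrorName(code) << ")";
        messages.push_back(msg.str());
    }
    return messages;
}
```

In practice this would sit behind a macro that passes `__FILE__` and `__LINE__`, prints each message, and expands to nothing when `NDEBUG` is defined, which keeps the release build quiet.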
Use Instruments, just do it.
OSX is known for having a pretty poor OpenGL driver. If you're having issues with performance, hopefully this short post will be of assistance.
If you want more hints, these docs will help you: Shader help, Texture help.
Thanks,
Jimmy