Sunday, August 16, 2009

Eigen2 performance

This is a short post on how the raytracer's average runtime changes when using the Eigen2 library (compiled with -O2, -msse, -msse2), as against rolling your own simple 3D float vector class. I have been using Vector3f from Eigen2, and have not paid any special attention to speed. I just use vectors wherever I can (mostly for positions and colors). I have also not bothered about the alignment of structures (obviously Vector4f would align properly, but storing four floats instead of three would increase the memory needed per vector by a third).
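To make the alignment and memory trade-off concrete, here is a minimal sketch in plain C++ (no Eigen required). The names `Vec3`, `PaddedVec3`, and `buffer_bytes` are made up for illustration and are not part of Eigen2 or of the raytracer; the sizes in the comments are what one would typically see on x86 with 4-byte floats.

```cpp
#include <cstddef>

// Hypothetical hand-rolled 3D vector: three packed floats.
// Typically 12 bytes with 4-byte alignment, so an element in an array
// may straddle a 16-byte boundary.
struct Vec3 {
    float x, y, z;
};

// Hypothetical padded variant: a fourth float plus forced 16-byte alignment,
// so each element maps onto exactly one SSE register.
struct alignas(16) PaddedVec3 {
    float x, y, z, w;
};

// Component-wise addition for the packed version.
inline Vec3 add(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Component-wise addition for the padded version; all four lanes are added,
// which is the shape of work a single SIMD add would do.
inline PaddedVec3 add(const PaddedVec3& a, const PaddedVec3& b) {
    return PaddedVec3{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Bytes needed to store n vectors of type V contiguously.
template <typename V>
constexpr std::size_t buffer_bytes(std::size_t n) {
    return n * sizeof(V);
}
```

With these definitions, `buffer_bytes<PaddedVec3>(n)` is one third larger than `buffer_bytes<Vec3>(n)`, which is the memory cost mentioned above; whether the aligned layout pays for itself in speed is exactly what would need to be measured.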

I had assumed that Eigen2 would not give me a significant speed boost, because raytracing is not a compute-heavy problem (it is mostly bound by memory access). It turns out that I was wrong.



There is almost a 2x performance gain when I use Eigen2. This is very surprising, because it took me no extra work: all I had to do was include the right headers. Maybe when I have more time, I will make a branch of the project with Vector4f and see how alignment affects things.

5 comments:

  1. There's an AlignedVector3 in the eigen/unsupported directory. It aligns automatically, and all ops, including cross products, are vectorized. But you need to ensure that the w component is always zero. It is 0 by default.

  2. "Good" ray tracers heavily depend on using cache efficiently on the CPU more than anything else.

  3. Hmm, does it take up 4 floats of space, or 3?
    Also, what do you mean by 'ensure w is zero'? Should I manually do some translation to make it zero, or did you mean it should be 1? Further, did you imply that Vector3f is not vectorized (adds, muls, etc.)?

  4. Vector3f is not vectorized. How can it be? But yes, you do get all the magic of expression templates.

    AlignedVector3 takes 4 floats. The w component is 0 by default. Cross3, among other operations, leaves it as 0. Just don't do anything stupid. If you use it to send vertex data to the GPU in a VBO, remember to set w = 1 in the shader before you transform it.

  5. And yes, you should enable sse3 and ssse3 as well. Some expressions can be evaluated faster that way.

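The point about the w component in comments 1 and 4 can be sketched in plain C++. This is only an illustration of the idea, not Eigen's AlignedVector3 implementation: `HVec`, `Mat4`, `cross3`, `translation`, and `transform` are hypothetical names defined here, and the matrix is a CPU-side stand-in for what a shader would do.

```cpp
#include <array>

// Hypothetical 4-float vector. With aggregate initialization such as
// HVec{1, 2, 3}, the omitted w member is zero-initialized.
struct alignas(16) HVec {
    float x, y, z, w;
};

// Cross product of the xyz parts; w is written as 0, so a zero w stays zero.
inline HVec cross3(const HVec& a, const HVec& b) {
    return HVec{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x,
                0.0f};
}

// Row-major 4x4 matrix: element (r, c) is stored at index r * 4 + c.
using Mat4 = std::array<float, 16>;

// Translation matrix that moves a point by (tx, ty, tz).
inline Mat4 translation(float tx, float ty, float tz) {
    return Mat4{{1.0f, 0.0f, 0.0f, tx,
                 0.0f, 1.0f, 0.0f, ty,
                 0.0f, 0.0f, 1.0f, tz,
                 0.0f, 0.0f, 0.0f, 1.0f}};
}

// Matrix-vector product. The translation column is multiplied by v.w,
// so it only takes effect when w is 1.
inline HVec transform(const Mat4& m, const HVec& v) {
    return HVec{m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3] * v.w,
                m[4] * v.x + m[5] * v.y + m[6] * v.z + m[7] * v.w,
                m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11] * v.w,
                m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w};
}
```

With w = 0 the vector behaves like a direction and a translation leaves it unchanged; with w = 1 it behaves like a position and the translation applies. That is why a zero w is harmless for vector arithmetic but has to be set to 1 before a position is transformed.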