Saturday, August 29, 2009

A dash of OpenMP goodness

I work on a dual-core dual-socket 1.8 Ghz Opteron at the lab, and was wondering how much performance gain I would get out of a simple OpenMP 'parallel for' invocation...

Turns out that by simply adding this one line, above one of the for loops that controls ray generation, things can get pretty fast:

#pragma omp parallel for num_threads(8)

I have 4 cores, and I did tests with no threading, 4 threads and 8 threads.

Here are the CPU occupancy charts from system monitor:


Single Thread



Four Threads



Eight Threads


Performance:
1 Thread : 78 seconds/frame
4 Threads: 35 seconds/frame
8 Threads: 21 seconds/frame

Of course, the "clock()" function completely messes up in multi-threaded applications, and shows the same time (78 seconds) no matter what the actual time taken is. I measured these off a wall clock.

There seems to be some contention, due to which the resultant images for single threaded render and multi-threaded renders are different (by a very very small margin). The diff image has to be enhanced massively to see this. 4 pixels are off by 25%. Many pixels have a difference of 1 color value (out of 255). The images are otherwise undistinguishable to the eye. Need to fix this issue, but I will be taking a break from this project for a while, so updates may come after around 20 days.


Single Threaded Render



8 Thread Render



25 x Diff image

1 comment:

  1. It seems odd that you need more threads than cores to saturate your cpu. But it is not unheard of. I once read that a gentoo dev had to do make -j256 to saturate his quad core. :)

    ReplyDelete