It turns out that simply adding this one line above one of the for loops that control ray generation can make things pretty fast:
`#pragma omp parallel for num_threads(8)`
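For context, here's a minimal sketch of what such a loop might look like. It assumes the image is rendered row by row into a flat pixel buffer. `Color`, `tracePixel` and `render` are hypothetical stand-ins, not the actual ray tracer code, and `tracePixel` just returns a gradient. What matters is that each pixel depends only on its own coordinates, so iterations can run in any order on any thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB pixel.
struct Color { std::uint8_t r, g, b; };

// Hypothetical stand-in for the per-pixel ray tracing work. It depends only
// on (x, y), so every pixel can be computed independently of the others.
Color tracePixel(int x, int y, int width, int height) {
    // Placeholder shading: a simple gradient instead of real ray tracing.
    return Color{static_cast<std::uint8_t>(255 * x / (width - 1)),
                 static_cast<std::uint8_t>(255 * y / (height - 1)),
                 128};
}

std::vector<Color> render(int width, int height) {
    assert(width > 1 && height > 1);
    std::vector<Color> image(static_cast<std::size_t>(width) * height);
    // OpenMP splits the rows of the outer loop across 8 threads. Without
    // OpenMP enabled at compile time, the pragma is ignored and the loop
    // simply runs serially.
    #pragma omp parallel for num_threads(8)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Each iteration writes its own elements, so no locking is needed.
            image[static_cast<std::size_t>(y) * width + x] =
                tracePixel(x, y, width, height);
        }
    }
    return image;
}
```

This has to be compiled with OpenMP enabled (for example `g++ -fopenmp` with GCC); otherwise everything runs on one thread. Since some rows of a ray-traced image cost much more than others, adding `schedule(dynamic)` to the pragma may balance the work better, though whether it helps depends on the scene.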
I have 4 cores, and I did tests with no threading, 4 threads and 8 threads.
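Rather than hard-coding 8, the thread count could come from what the OS reports. Below is a small sketch using `std::thread::hardware_concurrency()`; `pickThreadCount` and its fallback of 4 are illustrative choices. On a 4-core CPU with Hyper-Threading (an assumption, since the post only says 4 cores), this would report 8 hardware threads, which could explain why 8 threads beat 4 here.

```cpp
#include <cassert>
#include <thread>

// Returns the number of hardware threads the OS reports. hardware_concurrency()
// is allowed to return 0 when the value is unknown, so fall back in that case.
unsigned pickThreadCount(unsigned fallback = 4) {
    assert(fallback > 0);
    unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```

The result can go straight into the clause, since `num_threads` accepts any positive integer expression: `#pragma omp parallel for num_threads(pickThreadCount())`.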
Here are the CPU usage charts from the system monitor for 1, 4 and 8 threads:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6gTvhaPTRTXGhH4i_aizcP0DN3fl1HUjantHS7p6CuzG01Ez0enUkRI5wGQkpsEcwTzsTVL5wjG-2dL09rteTdTTyIhElElV_vNgBcGL81C27ZXexJ_irSZHBQtjXuQbuepQYUTo3200/s400/1thread_cpu.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5pHvi5Gie_8Teb8UH_1NfcYBU5XEnEs3ffPCKtCSjGyxHnINUyKMZN0evJu-oTcO_nADRjtQKHdmUXEleIhaqJlUH56lF8jaHHzn_12o9mkL95rIgQZWAsOkxhBpQiY34xErXOqlSv2o/s400/4thread_cpu.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqPWSamnulagLPP3xwKjDPnq4mIMfxWFPT6497-nbippKTzaInd2bqeh6GdGmx9bw5TkD28nz6Ki0giSAAaoy4f5bHMlPRKaBUbrrhEK4_OBlx2DZ_7dqkDdNvO1zBnyuf8FCGmC1ZfwM/s400/8thread_cpu.png)
Performance (wall-clock time):
1 thread:  78 seconds/frame
4 threads: 35 seconds/frame
8 threads: 21 seconds/frame
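To put these timings in perspective: speedup is T1 / Tn, and parallel efficiency is that speedup divided by the number of cores. A tiny helper (the function names are illustrative):

```cpp
#include <cassert>

// Speedup of an n-thread run relative to the single-threaded run: T1 / Tn.
double speedup(double t1, double tn) {
    assert(t1 > 0 && tn > 0);
    return t1 / tn;
}

// Parallel efficiency: speedup divided by the number of physical cores.
double efficiency(double t1, double tn, int cores) {
    assert(cores > 0);
    return speedup(t1, tn) / cores;
}
```

Plugging in the numbers above: 4 threads give 78/35 ≈ 2.23x (about 56% of the ideal 4x on 4 cores), while 8 threads give 78/21 ≈ 3.71x (about 93%).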
Of course, the `clock()` function is no use here: it measures the processor time used by the process (on Linux, summed across all of its threads), so it reports the same 78 seconds no matter how many threads run. I measured these times off a wall clock instead.
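A sketch that records both numbers around a piece of work: `std::chrono::steady_clock` for wall time and `std::clock()` for CPU time. `Timing` and `timeIt` are hypothetical helpers. On Linux the CPU figure is summed over all threads, so for a well-parallelized render it can be several times the wall figure.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

struct Timing {
    double wallSeconds;  // elapsed real time, what a wall clock would show
    double cpuSeconds;   // processor time of the whole process (std::clock)
};

// Runs `work` once and reports both wall-clock and CPU time.
Timing timeIt(const std::function<void()>& work) {
    assert(work && "work must be callable");
    auto wallStart = std::chrono::steady_clock::now();
    std::clock_t cpuStart = std::clock();
    work();
    std::clock_t cpuEnd = std::clock();
    auto wallEnd = std::chrono::steady_clock::now();
    std::chrono::duration<double> wall = wallEnd - wallStart;
    return Timing{wall.count(),
                  static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC};
}
```

For example, `timeIt([&] { render(width, height); })` would report both measurements for one frame, where `render` stands for whatever function draws the frame.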
There seems to be a race condition somewhere, because the single-threaded and multi-threaded renders come out different, though only by a very small margin. The diff image has to be enhanced massively to see it: 4 pixels are off by 25%, and many pixels differ by 1 color value (out of 255). Otherwise, the images are indistinguishable to the eye. I need to fix this, but I will be taking a break from this project for a while, so the next update may be around 20 days away.
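One common cause of this kind of difference in OpenMP code (only a guess, since the cause isn't pinned down yet) is a temporary declared above the `#pragma`, which all threads then share and overwrite. Declaring it inside the loop body, or listing it in a `private(...)` clause, gives every iteration its own copy. A minimal sketch; `Scratch` and `shadeRow` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-pixel scratch state, e.g. the ray currently being traced.
struct Scratch { double t; };

std::vector<double> shadeRow(int width) {
    assert(width > 0);
    std::vector<double> out(static_cast<std::size_t>(width));
    // If `s` were declared above this pragma, all threads would share one
    // variable and clobber each other's values (a data race). Declared inside
    // the loop body, it is private to each iteration.
    #pragma omp parallel for num_threads(8)
    for (int x = 0; x < width; ++x) {
        Scratch s;        // private to this iteration
        s.t = 0.5 * x;    // placeholder for real per-pixel work
        out[static_cast<std::size_t>(x)] = s.t * s.t;
    }
    return out;
}
```

A shared random number generator is another frequent culprit: if the renderer samples randomly, giving each pixel its own generator seeded from its coordinates makes the output independent of which thread handled which pixel.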
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIq5VSavhxO3CbKaA_EA0d-HpI_-INL2YJpHqEqP2XVSfB-Uns8_paJ48248gX_NOvndHz1-hetWUqDHUwCU2xlmfzGIwJ7g92uY0HyU5HMbjHA9rL0oXC3JqTEswGQrPRhDv8wysnPVo/s320/singleThread.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigPmNsByNuOwQTf-87RbbvR2HmJuVVrRSpcYRxSQuOYP6M9-KLDgl5cotXTBcv_3Ol1TL1OHS2PA0PUr21AzqWOXbYtE_evxMu0iP-VqSIU6B3eucDwVtQkiWPOEhYqmVlq3cwzjLMFbY/s320/8thread.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-YyHNnLHxNyuDxfU3fBnYBu_BU9hnrg073kTyRWv7OvCApsqyo7f-LCcgYvRwSgjCoCG9dA5ZbwNv45ZeLBdPKW72YBrgu1A3OGq8dyxhDKD6xe_woX-Vac3WCUMDuOaLZYuY9y8KUsI/s320/diff.png)
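A diff like the one above can be produced with a few lines of code. This sketch assumes both renders are available as interleaved 8-bit RGB buffers of the same size (how the diff image above was made isn't stated); `diffImages`, `DiffStats` and the default gain of 64 are illustrative choices. Multiplying each difference by a gain is what makes 1-value differences visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct DiffStats {
    std::size_t differingPixels = 0;  // pixels where any channel differs
    int maxChannelDiff = 0;           // largest per-channel difference (0-255)
};

// Compares two interleaved RGB buffers and fills `enhanced` with the
// per-channel absolute differences multiplied by `gain`, clamped to 255.
DiffStats diffImages(const std::vector<std::uint8_t>& a,
                     const std::vector<std::uint8_t>& b,
                     std::vector<std::uint8_t>& enhanced,
                     int gain = 64) {
    assert(a.size() == b.size() && a.size() % 3 == 0 && gain > 0);
    DiffStats stats;
    enhanced.assign(a.size(), 0);
    for (std::size_t p = 0; p < a.size(); p += 3) {
        bool differs = false;
        for (std::size_t c = 0; c < 3; ++c) {
            int d = std::abs(int(a[p + c]) - int(b[p + c]));
            if (d > 0) differs = true;
            if (d > stats.maxChannelDiff) stats.maxChannelDiff = d;
            int boosted = d * gain;
            enhanced[p + c] = static_cast<std::uint8_t>(boosted > 255 ? 255 : boosted);
        }
        if (differs) ++stats.differingPixels;
    }
    return stats;
}
```

With the default gain, a 1-value difference becomes 64, and any difference of 4 or more saturates at 255.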
It seems odd that you need more threads than cores to saturate your CPU, but it is not unheard of. I once read that a Gentoo dev had to run `make -j256` to saturate his quad core. :)