Thursday, June 25, 2009

cudaThreadSynchronize

This function is important for anyone who is launching a kernel many times (example: from a for loop). This is because a CUDA kernel launch is asynchronous, and returns immediately. This means that your CPU side for loop will finish in an instant and try to launch everything at once.

Calling cudaThreadSynchronize() will make the CPU wait till all previously launched kernels terminate.

No comments:

Post a Comment