Tuesday, July 14, 2009

Texture Memory: Any abstraction is bad abstraction

My long battle with CUDA textures has finally come to an end. Who won? Well, it was mostly a compromise. You see, things would go very smoothly if I was not trying to make C++ classes out of everything.

To use textures in CUDA, we first define a channel format descriptor (cudaChannelFormatDesc), and tell it how many bits we want in each texture channel (rgba), and what data format is stored in the textures (signed/unsigned ints, or floats)

cudaChannelFormatDesc channelDesc;
channelDesc = cudaCreateChannelDesc(rBits, gBits, bBits, aBits, datatype);


Then we create the cuda array that stores the data. This is where we decide how big our texture is.

cudaArray* cuArray;
cudaMallocArray(&cuArray, &channelDesc, width, height);


Then we copy the texture data from CPU to GPU:

cudaMemcpyToArray(cuArray, 0, 0, cpuPtr, width*height*sizeof(T), cudaMemcpyHostToDevice);


The final step is to create a texture reference, specify the filtering and addressing modes of this texture (clamped / linear filter / normalized etc...) and bind this reference to the memory that we allocated:

texture<float, 2, cudaReadModeElementType> texRef1;
texRef1.addressMode[0] = cudaAddressModeClamp;
texRef1.addressMode[1] = cudaAddressModeClamp;
texRef1.addressMode[2] = cudaAddressModeClamp;

//Modes: cudaFilterModePoint, cudaFilterModeLinear
texRef1.filterMode = cudaFilterModePoint;

//Normalized addressing means 0 to 1 address, else 0 to width and 0 to height address
texRef1.normalized = false;
cudaBindTextureToArray(texRef1, cuArray, channelDesc);


This texture is read inside the CUDA kernel as follows:

float something = tex2D(texRef1, u, v);


This info can be found anywhere. But now is when things get interesting... The CUDA documentation specifies: "A texture reference is declared at file scope as a variable of type texture".

What this really means is "Thou shalt not make local texture references. Thou shalt also not attempt to pass references around as function parameters. Any attempt to make a texture reference at anything other than translation unit scope shalt be dealt with harshly (by a weird ptx compiler error or by a nvcc crash)." More specifically, you will either see an Open64 compiler crash dump, or a slightly less discouraging PTX error: "State space mismatch between instruction and address in instruction 'tex'"

So, it effectively is Impossible to make a "texReference" class, and put everything into it. The main part, the texture reference itself, should be declared outside the class, at global scope in the same file as your kernel. CRAZY!!

The final abstraction I settled for is this:
template <class T>
class texArrayHandle{
 cudaArray* cuArray;
 cudaChannelFormatDesc channelDesc;
 int width, height;
public: ...
};


The texture reference is still declared outside the class, and the class only hides the creation and destruction of the cudaArray. That is around 50 percent abstraction, and 50 percent C. Atleast, my "create array, save to texture, readback texture" example works. Now, to test this in a 'real world' scenario.

8 comments:

  1. Since you are not doing FFT's anymore, it may be a good idea to use pycuda. It makes all this shitty management tasks go away.

    ReplyDelete
  2. Great, I was struggling with texture memory. Thanks for this post.

    ReplyDelete
  3. Actually, pycuda doesn't necessarily make this all go away - I found this page because I was trying to figure out what the weird ptx error was in my pycuda kernel.

    ReplyDelete
  4. @DLepage: Hmmm, never used pycuda. But were you able to fix your problems?

    ReplyDelete
  5. Hey thanks for this post! I struggle to find nvidia documentation but now I understand what normalized addressing means :)

    ReplyDelete
  6. @Matthew:

    Thanks. Actually, if you haven't seen it already, the nvidia documentation for cuda is pretty good:
    http://developer.nvidia.com/object/cuda_2_3_downloads.html

    ReplyDelete
  7. I want create a few textures for different pieces of data. Is it OK to do the following at the global scope?

    vector [texture [float, 2, cudaReadModeElementType] textures];

    I substituted square brackets for angle brackets, because angle brackets are being misinterpreted as html.


    ReplyDelete
  8. A few years ago, i would have said that putting tex references inside an stl container is a bad idea. But cuda 5 claims to have good support for c++.
    So i am guessing this should work as long as it compiles. Why, are you seeing any issues?

    ReplyDelete