Sunday, July 12, 2009

Compiling CUDA projects

I find that a lot of new CUDA developers tend to use the CUDA SDK makefiles (by including common.mk) and end up with loads of dependencies on the CUDA utilities that the SDK provides. In the long run this does not seem like a good idea, as it means depending on whatever the SDK makefile does, and hence on the (heavy) SDK itself. Here is a small tutorial on how to compile a standalone CUDA program with multiple .h, .cpp and .cu files, and a few external headers/libs:

Let's say we have 4 files: main.cpp, test.cpp, test.h, kernel.cu

The idea is that we have to compile the cpp files with g++, and the cu files with nvcc.
We also ask for compilation only (the -c flag, no linking), so that each file is turned into an object file on its own. References to functions defined in other files are then left for the final link step, instead of producing zillions of undefined-reference errors.

To do this, compile the cpp files this way:

g++ -c *.cpp

Then compile the cu files:

nvcc -c *.cu

Note that nvcc may not be in your default path. When you install the CUDA toolkit, nvcc is placed in the toolkit's bin directory. For example, if you installed the CUDA toolkit in /usr/local/cuda, then nvcc is at /usr/local/cuda/bin/nvcc. Adding /usr/local/cuda/bin to your $PATH variable will fix things.

The above steps should produce 3 object files: main.o, test.o, and kernel.o

Finally, link these together and make the final executable:

g++ -o runme *.o

This will link all the object files into a single executable called "runme", which you can run as usual.
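
To see why compile-only works, it may help to look at what each file contributes. The sketch below collapses a hypothetical test.h, main.cpp and test.cpp into one listing (the names add_one and run_demo are made up for illustration). The calling code compiles with only the prototype in scope; matching the call to the definition is the linker's job.

```cpp
// ---- test.h (hypothetical): prototypes only, no definitions ----
int add_one(int x);

// ---- main.cpp side (hypothetical): would #include "test.h" ----
// The compiler accepts this call knowing only the prototype above,
// which is why "g++ -c main.cpp" succeeds without test.cpp.
int run_demo() {
    return add_one(41);
}

// ---- test.cpp side (hypothetical): the actual definition ----
// When the object files are linked, the call in run_demo() is
// resolved to this function.
int add_one(int x) {
    return x + 1;
}
```

In the real layout each section is its own file. The same idea should carry over when the definition lives in a .cu file compiled by nvcc, as long as both sides see the same declaration.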

To include any specific headers / libraries, simply pass them as arguments during the compile / link phases. For example:

g++ -c -I/usr/local/cuda/include *.cpp
g++ -o runme *.o -L/usr/local/cuda/lib -lcuda -lcudart

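While running CUDA programs, you may get a library not found error. To fix this, add the directory containing the CUDA libraries to your $LD_LIBRARY_PATH environment variable. The libraries in question are libcuda.so and libcudart.so. libcudart.so lives in the lib directory (lib64 on 64-bit systems) of your CUDA toolkit install path, while libcuda.so is normally installed by the NVIDIA driver.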

All this can be condensed into a nice makefile for convenience.

Finally, a word about actually writing CUDA programs that are independent of the SDK. Note that the SDK uses macros like CUT_DEVICE_INIT() which are in practice not needed at all. Simply include the CUDA header files (cuda_runtime.h for the runtime API) and start making CUDA calls.

33 comments:

  1. Hi:
    I tried to follow your idea, but no success :(
    I have a large program with .cpp and .h files distributed across different folders, and inside a class defined in a .h file I placed a function; that function is declared in a .cu file.
    I obtained the .o files from the .cpp files with no problem.
    I obtained the .o files from the .cu files with no problems.
    However, when linking, all hell breaks loose. The number of errors is bigger than what a screen can contain. It seems that it does not recognize the cutil functions, and worse: it says that many of my class definitions are wrong! (which they are not, since my code works perfectly when run on a CPU).

    So, please, if you have faced this problem, I would be glad if you can share your experiences.

  2. Hi Daniel,

    You can make sure CUDA (tries to) use C++ compilation by passing the "--host-compilation=c++" flag to nvcc, if you are using an older version of the cuda toolkit (new versions do this by default).

    Also, I wonder why you are using cutil. Anyway, are you specifying the paths for cutil libraries while linking? I see a whole bunch of libs in my SDK install path: "NVIDIA_GPU_Computing_SDK/shared/lib/linux". You may want to include them in your linking process like this:

    g++ *.o -L (cuda-sdk-path)/shared/lib/linux/ -L/usr/local/cuda/lib -lcuda -lcudart

    (of course, you should replace (cuda-sdk-path) with the actual path to your cuda sdk)

    Further, to make sure any functions you define in the '.cu' file are visible to the cpp files under the same name, you can define them as extern "C" (the declaration that the cpp files see must be marked extern "C" as well). Here is an example:

    extern "C" void generate_photon_map(float4* buffer)
    {
    ....
    }
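
    A rough sketch of both sides of that arrangement, written as plain C++ so it can be compiled without nvcc. The names fill_buffer and sum_after_fill are hypothetical, and the loop only stands in for the place where a real .cu file would launch a kernel. The part that matters is that the declaration seen by the cpp code carries the same extern "C" as the definition.

    ```cpp
    #include <vector>

    // Declaration the .cpp side sees (normally via a shared header).
    // It must carry the same extern "C" as the definition below.
    extern "C" void fill_buffer(float* buffer, int n);

    // .cpp side: an ordinary C++ caller.
    float sum_after_fill(int n) {
        std::vector<float> buf(n, 0.0f);
        fill_buffer(buf.data(), n);
        float sum = 0.0f;
        for (float v : buf) sum += v;
        return sum;
    }

    // .cu side: host wrapper with C linkage. A real version would copy the
    // buffer to the device and launch a kernel; this stand-in fills it on the CPU.
    extern "C" void fill_buffer(float* buffer, int n) {
        for (int i = 0; i < n; ++i) buffer[i] = 1.0f;
    }
    ```

    If the two sides disagree (extern "C" on one, plain C++ on the other), the linker sees two different symbol names and reports an undefined reference.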

    Hope this helps,
    kashyap

  3. Thanks for your quick reply. But it doesn't work. I got the same errors (aside from -lcutil, which I solved thanks to you).
    What I did was take the cppIntegration project and try to apply your steps. It DOES NOT work!

    It shows problems with 'multiple definitions' of 'kernel' and 'kernel2', which are CUDA kernel functions defined with __host.

    So, please, if you can replicate the same experiment (it should take less than 5 minutes) and let me know what happens, I would be glad.

    Daniel.

  4. hmm, I tried this, and my makefile looks something like this:

    =================================

    # ~ is not expanded inside -L/-I options, so use $(HOME) instead.
    SDKPATH := $(HOME)/NVIDIA_GPU_Computing_SDK
    CUDAPATH := /usr/local/cuda

    LDFLAGS := -L$(CUDAPATH)/lib -L$(SDKPATH)/shared/lib/linux -L$(SDKPATH)/C/lib -lcutil -lcudpp -lcuda -lcudart
    CXFLAGS := -I$(CUDAPATH)/include -I$(SDKPATH)/shared/inc -I$(SDKPATH)/C/common/inc

    CXX := g++
    NVCC := $(CUDAPATH)/bin/nvcc

    # Recipe lines (the commands under each rule) must start with a tab.
    $(EXECUTABLE): cppIntegration.o main.o cppIntegration_gold.o
    	$(CXX) $(LDFLAGS) -o $(EXECUTABLE) cppIntegration.o main.o cppIntegration_gold.o

    cppIntegration.o: cppIntegration.cu
    	$(NVCC) $(CXFLAGS) -c cppIntegration.cu

    main.o: main.cpp
    	$(CXX) $(CXFLAGS) -c main.cpp

    cppIntegration_gold.o: cppIntegration_gold.cpp
    	$(CXX) $(CXFLAGS) -c cppIntegration_gold.cpp


    =====================================
    Does this work for you?

  5. oops, add this line to the top:

    EXECUTABLE := cppIntegration

  6. Thanks a lot!
    Actually, I discovered a mistake: I was creating the .o file for the file cppIntegration_kernel. According to some forums, that was wrong, since the functions inside are created twice and then the linker gets confused.

    I see that in your code you didn't create that .o file, that's why it works (I tested it!)

    Now, back to my large problem:
    Well, the problem with the linker still exists, though there are far fewer errors now.
    I get errors like:

    "multiple definition of "
    The thing is that they come from the .o file of my cpp files, created with g++ (not with nvcc)
    I believe the compiler (g++) is declaring the functions twice and then the linker gets confused. But I don't know how to correct that. If you ever faced this problem, just let me know.

    Thanks for your help,

    Daniel.

  7. Ok, I think the reason you get multiple definition errors is that you are compiling the same stuff twice. Here is an example:

    Let's say we have 2 files: A and B
    Inside B, we have #include "A"

    If you compile both A and B and link the two object files, you will get an error because the definitions in A end up in both of them.

    This is what was happening with the _kernel file. It was included inside cppIntegration.cu (I think)

    The way to fix this is:
    1. Enclose your header files inside #ifndef ... #define ... #endif include guards, so they are not included twice in the same source file:

    example header file (thisheader.h):

    #ifndef THISHEADER_H
    #define THISHEADER_H

    void blahblah();

    #endif

    2. Do not put variable definitions or function definitions in include files. Put only declarations (function prototypes, extern variables) in include files, and put the definitions in cpp files.

    Of course, you may already know this, but I was just stating my experiences with linker problems...
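
    As a sketch of both points together, assuming a header laid out like thisheader.h above (the call_count variable and the Pixel struct are hypothetical additions for illustration): the header text is pasted twice to mimic a double #include, and the guard makes the second copy vanish. Note that a guard only protects a single source file from seeing the header twice. It does nothing about definitions that end up in several object files, which is why point 2 matters.

    ```cpp
    // ---- first #include "thisheader.h" ----
    #ifndef THISHEADER_H
    #define THISHEADER_H
    struct Pixel { float r, g, b; };  // type definition: a second copy in the
                                      // same file would be a redefinition error
    extern int call_count;            // declaration only: no storage created here
    void blahblah();                  // prototype only
    #endif

    // ---- second #include "thisheader.h": skipped, THISHEADER_H is already defined ----
    #ifndef THISHEADER_H
    #define THISHEADER_H
    struct Pixel { float r, g, b; };
    extern int call_count;
    void blahblah();
    #endif

    // ---- thisheader.cpp (hypothetical): the one and only definitions ----
    int call_count = 0;
    void blahblah() { ++call_count; }
    ```

    Had the header contained "int call_count = 0;" instead of the extern declaration, every object file that included it would carry its own copy and the linker would complain, guard or no guard.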

  8. I did apply your suggestion (1), but got the same errors. Your suggestion (2) is the one that I finally applied, but with a difference:
    I didn't modify the header files, in which there are multiple variable and class definitions. What I did was modify my .cu files so that they don't deal with my custom types/classes. And now it works.

    However, what bugs me is that it seems that I CAN NOT use custom types/classes defined for the cpp files in my .cu files. Is that true? Or is there a way around?

    Thanks for your responses.
    Daniel.

  9. Hi, it's a bit hard doing this over the comments section :P

    Can you send me the code so I can try it?
    sriramkashyap AT_THE_RATE_GMAIL_DOT_COM
    Sorry, but I don't like spam bots very much :)

    In any case, classes can be used to a limited extent in cu files... you can try specifying the nvcc option "--host-compilation=c++".

    Using C++ classes in a cu file is not a problem as long as your device and kernel functions don't have to deal with them. Even then, I have been able to use templates and operator overloading in device functions.

  10. Hi Kashyap,

    Can you send me your running code sample. This is very important for me.

    abeynk[AT]gmail[DOT]com

  11. Hello, thanks for your post, it is very useful.
    Anyway, I'm pretty new to CUDA coding, and I'm just trying to understand which libraries are useful and how the code flow works.
    Do you mind if I ask you to send me a simple project with a few source files and the makefile? I want to try to compile and link it with your method, and see how to create the makefile.

    I'm on a university project and it's very important for me.

    gianpiero.gibiino[AT]gmail[DOT]com

  12. Hi,

    I am a beginner in CUDA and this was a great help.

    In my case I had to use the 64 bit version of the libraries, though.

    Thank you for your blog entry

  13. Thanks for the Makefile Kashyap. But I had to make a small correction before it worked for me.

    I had to move the "-lcutil -lcudpp -lcuda -lcudart" part to the end of the linking command:

    $(CXX) $(LDFLAGS) -o $(EXECUTABLE) cppIntegration.o main.o cppIntegration_gold.o -lcutil -lcudpp -lcuda -lcudart

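  14. Odd, $(LDFLAGS) contains all the libraries... but the linker resolves symbols from left to right, so libraries listed before the object files that need them can be ignored. Putting the -l options after the objects, as you did, is the safer order.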

  15. I tried this with three files: Main.cpp KernelWrapper.cu and MyKernel.cu

    When I create objects from the cu files with nvcc like this:

    nvcc -c KernelWrapper.cu MyKernel.cu

    I get the two object files KernelWrapper.obj and MyKernel.obj.
    If I use nvcc to do the final linking like this:

    nvcc -o runme Main.cpp KernelWrapper.obj MyKernel.obj

    it works fine!

    But if I use g++ for the last step it gives me pages of errors including many strange looking "undefined reference" errors.
    I'm hoping it will be possible to use g++ in the final step, because Main.cpp in my little test is a stand-in for a lot more C++ code (which uses STL).
    I read somewhere that nvcc is not so good with STL, and besides that, I have a Makefile for the existing code which I would like to reuse with as few changes as possible.

    My hope was that I could compile all CUDA code with nvcc and then pass the objects to g++. Shouldn't this be possible?
    I'm using Win7 and minGW.

    The Main.cpp file is very simple - it calls a function in the kernelwrapper and then sends text to cout.

  16. nvcc (at least in this toolkit version) cannot link device code across cu files, so a kernel has to live in the same compilation unit as the code that launches it. The simplest setup is one main cu file with all other cu files #included into it.
    So in the end, you should compile only the main cu file using nvcc. Also, try to do all compiles (-c) first, even with g++, and for the final linking step, pass only object files to g++. So here is what you do (assuming KernelWrapper.cu is your main cu file):

    nvcc -c KernelWrapper.cu
    g++ -c Main.cpp
    g++ -o runme Main.o KernelWrapper.obj

    hope this works.

  17. Hi and thanks for the answer.

    I still have problems though.

    I compile the cu file. I have now moved the kernel method into this file, so MyKernel.cu is no longer needed.

    nvcc -c KernelWrapper.cu

    I then compile the Main.cpp with g++.

    but when I get to the linking I get a lot of undefined references.

    I try to link like this:
    g++ -o runme Main.o KernelWrapper.obj -L"D:\CUDA\v3.2\lib\Win32" -lcuda -lcudart

    but I get a whole bunch of undefined reference errors!

    A very few of them are:

    KernelWrapper.obj:(.text+0x32): undefined reference to `?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A'

    KernelWrapper.obj:(.text[??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@H@Z]+0x19): undefined reference to `__security_cookie'

    KernelWrapper.obj:(.text[?_Decref@facet@locale@std@@QAEPAV123@XZ]+0xf): undefined reference to `??0_Lockit@std@@QAE@H@Z'

    KernelWrapper.obj:(.text[?clear@ios_base@std@@QAEXH_N@Z]+0x63): undefined reference to `_CxxThrowException@8'

    I think there are almost 100 of these errors.
    Also one for Main.cpp:

    Main.o:Main.cpp:(.text+0xf): undefined reference to `RunTest()'

    I include cuda_runtime.h in KernelWrapper.cu and as far as I can see it looks like it can find the cuda and cudart objectfiles just fine.

    Do you have a suggestion about what could be wrong here?

  18. Looks like some basic libraries are not being used while linking, so your standard C++ stuff (like cout) is not being resolved.

    Why this is happening, I don't know. But the solution should be to explicitly link with the stdc++ library:

    g++ -o runme Main.o KernelWrapper.obj -lstdc++

  19. (If the above hack does not fix things, read below)

    By the way, I just noticed the errors are in KernelWrapper.obj... I hope you are not using (much) c++ in the .cu file... nvcc is not too good with c++ stuff (well, it was not good 6 months ago, dunno how it is now).

    If all you want to do is print stuff in the .cu file, try using printf. Maybe somebody else knows a better way of doing this (some googling may help).

  20. Thank you very very much :-)

    I got it working by using printf's in KernelWrapper.cu instead of cout's.

    I get a few warnings, but now I'm less worried :-)

  21. Hi, I am having the same problems as Jens above: under Windows, I would like to create a CUDA library using nvcc, and link it together with g++-made objects from cpp files. I am getting errors similar to his, and I just can't get my code to work.

    Jens, if you read this, please send me the solution. I would really appreciate a little example program, because I have been looking for the solution for days.
    kottalovag[AT]gmail[DOT]com

  22. @KottaLovag: Did both the above solutions not help you? (Using printf instead of cout, and explicitly linking with libstdc++)?
    Maybe you can share a few lines of the errors you see.

  23. Thank you for the quick answer!

    I created a minimal representation of my problem and uploaded here: http://www.megafileupload.com/en/file/316921/cuda-problem-example-zip.html

    I have Win7 x64, CUDA and MinGW installed.
    I have Visual Studio as well, but I don't want to use that, because I am planning to do this with Qt later.

    As you can see in the uploaded example (or below), I would like to create a static library for the CUDA part of my application. This can be done with MakeStatlib.bat, whose first call sets up the environment for the Visual Studio compiler (I read this in the nvcc manual).
    Then, I would like to create a runnable app, which would call the call() function that lives in cudalib.lib

    The content of the uploaded example:
    files:
    kernel.cu
    caller.cu
    kernel.h
    caller.h
    main.c
    MakeStatlib
    MakeStatlib.bat
    MakeRunnable
    MakeRunnable.bat

    in kernel.cu:
    __global__ void calculateKernel()
    {

    }

    in caller.cu:
    #include /* dim3 */
    #include "kernel.h" /* calculateKernel() */

    void call()
    {
    dim3 dimBlock(1,1);
    dim3 dimGrid(1,1);
    calculateKernel<<< dimGrid, dimBlock >>>();
    }

    in kernel.h:
    #ifndef KERNEL_H
    #define KERNEL_H

    __global__ void calculateKernel();

    #endif /* KERNEL_H */

    in caller.h:
    #ifndef CALLER_H
    #define CALLER_H

    void call();

    #endif /* CALLER_H */

    in main.c:
    #include "caller.h" /* call() */

    int main()
    {
    call();
    return 0;
    }

    in MakeStatlib:
    cudalib.lib: kernel.obj caller.obj
    	nvcc -lib kernel.obj caller.obj -o cudalib.lib
    kernel.obj: kernel.cu kernel.h
    	nvcc -c kernel.cu -o kernel.obj
    caller.obj: caller.cu caller.h kernel.h
    	nvcc -c caller.cu -o caller.obj

    in MakeStatlib.bat:
    call "c:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\Tools\vsvars32.bat"
    mingw32-make -f MakeStatlib

    in MakeRunnable:
    app: main.o cudalib.lib
    	gcc main.o cudalib.lib -o app
    main.o: main.c caller.h
    	gcc -c main.c -o main.o
    cudalib.lib: kernel.cu caller.h caller.cu
    	MakeStatlib.bat

    in MakeRunnable.bat:
    mingw32-make -f MakeRunnable
    PAUSE

  24. I forgot to mention the problem:
    I can successfully create cudalib.lib, but unfortunately cannot link it together with main.o, the linker says:
    gcc main.o cudalib.lib -o app
    main.o:main.c:(.text+0xc): undefined reference to 'call'
    collect2: ld returned 1 exit status

    The other thing to mention is that the blog hides "<"cuda">" when including it, thinking of it as a tag :)

  25. Sorry I can't actually try out your code, since I don't have the setup necessary to run this. Also, my experience in this matter is limited to the Linux platform. But since this seems to be a regular linking issue, have you tried using something like this to link:

    gcc main.o -static -L. -lcudalib.lib -o app

    You can also add the '-v' argument to gcc to see a verbose dump that may reveal the problem.
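
    One more thing that may be worth checking, offered as a guess since it was not tried in this thread: main.c is compiled as C, while nvcc compiles caller.cu as C++, so the two sides can disagree on the symbol name for call(). The usual way around that is a header that declares the function with C linkage when a C++ compiler reads it. The sketch below is plain C++ with a stand-in body (the launches counter and launch_count are hypothetical; the real call() would launch the kernel).

    ```cpp
    // ---- caller.h (sketch): usable from both C and C++ ----
    #ifndef CALLER_H
    #define CALLER_H

    #ifdef __cplusplus
    extern "C" {          // only a C++ compiler sees this line
    #endif

    void call(void);

    #ifdef __cplusplus
    }
    #endif

    #endif /* CALLER_H */

    // ---- caller.cu side (sketch): includes caller.h, so the definition
    // picks up C linkage from the declaration above ----
    static int launches = 0;   // hypothetical counter so the effect is observable

    void call(void) {
        ++launches;            // stand-in for the real kernel launch
    }

    int launch_count() { return launches; }
    ```

    Even then, objects built by the Visual Studio toolchain and by MinGW are not guaranteed to link together, so this may only be part of the story.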

  26. @Kashyap:
    Thank you for the advice, I tried it.
    for
    -L. cudalib
    -L. -lcudalib.lib
    -L. -llibcudalib
    -L. -llibcudalib.lib
    -lcudalib
    -lcudalib.lib
    -llibcudalib
    -llibcudalib.lib

    it says it does not find

    for
    -L. cudalib.lib
    -L. -lcudalib

    it says undefined reference to 'call'

    verbose:

    D:\-=OWN=-\-=PROJECTS=-\CUDA\Tutoring\examples\cuda_example_01_(1krncu_1callercu
    _1cudah_1mainc)_libformat>mingw32-make -f MakeRunnable
    gcc -v main.o -L. -lcudalib -o app
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/4.5.2/lto-wrapper.exe
    Target: mingw32
    Configured with: ../gcc-4.5.2/configure --enable-languages=c,c++,ada,fortran,obj
    c,obj-c++ --disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgo
    mp --disable-win32-registry --enable-libstdcxx-debug --enable-version-specific-r
    untime-libs --disable-werror --build=mingw32 --prefix=/mingw
    Thread model: win32
    gcc version 4.5.2 (GCC)
    COMPILER_PATH=c:/mingw/bin/../libexec/gcc/mingw32/4.5.2/;c:/mingw/bin/../libexec
    /gcc/;c:/mingw/bin/../lib/gcc/mingw32/4.5.2/../../../../mingw32/bin/
    LIBRARY_PATH=c:/mingw/bin/../lib/gcc/mingw32/4.5.2/;c:/mingw/bin/../lib/gcc/;c:/
    mingw/bin/../lib/gcc/mingw32/4.5.2/../../../../mingw32/lib/;c:/mingw/bin/../lib/
    gcc/mingw32/4.5.2/../../../
    COLLECT_GCC_OPTIONS='-v' '-L.' '-o' 'app.exe' '-mtune=i386' '-march=i386'
    c:/mingw/bin/../libexec/gcc/mingw32/4.5.2/collect2.exe -Bdynamic -o app.exe c:/
    mingw/bin/../lib/gcc/mingw32/4.5.2/../../../crt2.o c:/mingw/bin/../lib/gcc/mingw
    32/4.5.2/crtbegin.o -L. -Lc:/mingw/bin/../lib/gcc/mingw32/4.5.2 -Lc:/mingw/bin/.
    ./lib/gcc -Lc:/mingw/bin/../lib/gcc/mingw32/4.5.2/../../../../mingw32/lib -Lc:/m
    ingw/bin/../lib/gcc/mingw32/4.5.2/../../.. main.o -lcudalib -lmingw32 -lgcc_eh -
    lgcc -lmoldname -lmingwex -lmsvcrt -luser32 -lkernel32 -ladvapi32 -lshell32 -lmi
    ngw32 -lgcc_eh -lgcc -lmoldname -lmingwex -lmsvcrt c:/mingw/bin/../lib/gcc/mingw
    32/4.5.2/crtend.o
    main.o:main.c:(.text+0xc): undefined reference to `call'
    collect2: ld returned 1 exit status
    mingw32-make: *** [app] Error 1

    D:\-=OWN=-\-=PROJECTS=-\CUDA\Tutoring\examples\cuda_example_01_(1krncu_1callercu
    _1cudah_1mainc)_libformat>PAUSE
    Press any key to continue . . .

  27. Hmm, that did not go very well... You should try -static. And there is no name conflict right? I.e. no other lib named cudalib. You can probably try a static lib tutorial that does not involve cuda first... Just to make sure nvcc is not doing anything funny here, and also to confirm the gcc params. Other than that, not sure how to fix this.

  28. I went through these tutorials and descriptions and created examples for me without cuda and everything worked well:
    http://www.adp-gmbh.ch/cpp/gcc/create_lib.html
    http://www.codeproject.com/KB/cpp/libraries.aspx
    http://www.mingw.org/wiki/Specify_the_libraries_for_the_linker_to_use

    I also checked, there is no conflict. :(
    Anyway, thank you for trying to help me.

    Replies
    1. This comment has been removed by the author.

    2. Hi friend, I created mainServer.cpp_o and PIRMatrixMul.cu_o, but when I execute the server's makefile I get an error from /usr/bin/ld saying cutil.h cannot be found. Please tell me the solution; I am going to paste my makefile. And when I try to run my makefile (not considering the two object files) I get an error that CUT_INIT_DEVICE requires two arguments.

      make file is

  29. Creating server Makefile
    Executing 'make' for the server...
    /usr/bin/ld: 1: /bin: Permission denied
    /usr/bin/ld: 2: Makefile: not found
    /usr/bin/ld: 3: Makefile: not found
    /usr/bin/ld: 4: Syntax error: "(" unexpected
    collect2: ld returned 2 exit status
    make: *** [PIRServer] Error 1
    How can I resolve this?

  30. Hi friends,
    I have files called mulshare.cu and mulshare.h.
    In mulshare.cu I have used the arc4random function. While compiling with the nvcc compiler I am getting this error:
    error: identifier "arc4random" is undefined.

    here is the part of code.

    int main(int argc, char* argv[])
    {
        Matrix A, B, C;
        int a1, a2, b1, b2;
        a1 = atoi(argv[1]); /* height of A */
        a2 = atoi(argv[2]); /* width of A */
        b1 = a2;            /* height of B */
        b2 = atoi(argv[3]); /* width of B */

        A.height = a1;
        A.width = a2;
        A.elements = (float*)malloc(A.width * A.height * sizeof(float));

        B.height = b1;
        B.width = b2;
        B.elements = (float*)malloc(B.width * B.height * sizeof(float));

        C.height = A.height;
        C.width = B.width;
        C.elements = (float*)malloc(C.width * C.height * sizeof(float));

        for (int i = 0; i < A.height; i++)
            for (int j = 0; j < A.width; j++)
                A.elements[i*A.width + j] = (arc4random() % 3);

        for (int i = 0; i < B.height; i++)
            for (int j = 0; j < B.width; j++)
                B.elements[i*B.width + j] = (arc4random() % 2);

        MatMul(A, B, C);

        for (int i = 0; i < min(10, A.height); i++)
        {
            for (int j = 0; j < min(10, A.width); j++)
                printf("%f", A.elements[i*A.width + j]);
            printf("\n");
        }
        printf("\n");

        for (int i = 0; i < min(10, B.height); i++)
        {
            for (int j = 0; j < min(10, B.width); j++)
                printf("%f", B.elements[i*B.width + j]);
            printf("\n");
        }
        printf("\n");

        for (int i = 0; i < min(10, C.height); i++)
        {
            for (int j = 0; j < min(10, C.width); j++)
                printf("%f", C.elements[i*C.width + j]);
            printf("\n");
        }
        printf("\n");
    }
    I have tried compiling with nvcc as well as gcc, but the error still comes up.
    The command I am using is: nvcc mulshare.cu

    Please help me get out of this.

  31. arc4random does not seem to be a standard C/C++ function.
    Are you including any external libraries?
    The error you are getting points to not including the correct headers.
    You will need to include whatever headers are needed for arc4random, and tell nvcc where these headers and libs are (-I and -L)

    If arc4random works separately in a regular C compiler (gcc), you could try to split your files into one cpp file and one cu file.
    Put all external function calls like arc4random in the cpp file, and compile the cpp file directly using gcc (don't link; use gcc -c).
    Then compile the cu file using nvcc and link the resulting object files as described in my original post.
