CUDA Programming (3): "Hello world"

Qiang的杂谈 2019-11-22

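The previous two articles covered the CUDA installation environment and the GPU architecture. This article takes a practical programming view: what a kernel is, and how to write one of your own. As described in the previous article, a function can carry one of the qualifiers __host__, __device__, or __global__, which mark whether it runs on the CPU or on the GPU; a kernel is a __global__ function, launched from the host and executed on the device. The GPU has its own registers, caches, and memory, so results computed by CUDA have to be copied from GPU memory back into an address space visible to the CPU. This is the communication between host and device. The rough flow of a CUDA program is as follows [1]: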

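  • Allocate host memory and initialize the data

  • Allocate device memory and copy the data from the host to the device

  • Call a CUDA kernel function to carry out the computation on the device

  • Copy the results from the device back to the host

  • Free the memory allocated on the device and on the host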
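The five steps above can be sketched with a vector addition. This is only an illustration under stated assumptions: the kernel `vecAdd`, the wrapper `runVecAdd`, and the block size of 256 are hypothetical names and values chosen here, not taken from the original article, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: c[i] = a[i] + b[i].
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // the last block may be only partly used
        c[i] = a[i] + b[i];
}

// Runs the whole host/device round trip and returns the number of wrong results.
int runVecAdd(int n)
{
    const size_t bytes = n * sizeof(float);

    // 1. Allocate host memory and initialize the data.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) {
        hA[i] = (float)i;
        hB[i] = 2.0f * i;
    }

    // 2. Allocate device memory and copy the inputs host -> device.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel with enough blocks to cover n elements.
    const int threads = 256;        // assumed block size
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // 4. Copy the result device -> host. On the default stream this copy
    //    waits for the kernel to finish before it starts.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hC[i] != 3.0f * i)
            ++mismatches;

    // 5. Free device and host memory.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    free(hA);
    free(hB);
    free(hC);
    return mismatches;
}

int main()
{
    int mismatches = runVecAdd(1 << 10);
    printf("mismatches: %d\n", mismatches);
    return mismatches != 0;
}
```

The inputs are small integers stored as floats, so the comparison against `3.0f * i` is exact; with arbitrary data a tolerance would be the safer check.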

The CUDA software programming model is shown in Figure 1:

Figure 1: CUDA software programming model [2]


Listing: the CUDA version of "Hello world":

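    #include <stdio.h>
    #include <cuda_runtime.h>


    // Kernel: runs on the device, once per thread.
    __global__ void hello()
    {
            // Linear index of this block within the grid.
            int blockId = blockIdx.z * gridDim.x * gridDim.y +
                    blockIdx.y * gridDim.x + blockIdx.x;
            // Global thread ID: block offset plus index within the block.
            int myID = blockId * blockDim.x * blockDim.y * blockDim.z +
                    threadIdx.z * blockDim.x * blockDim.y +
                    threadIdx.y * blockDim.x +
                    threadIdx.x;
            printf("Hello world from %i!!\n", myID);
    }


    int main()
    {
            dim3 grid(3, 2);            // 3 x 2 x 1 = 6 blocks
            dim3 block(4, 4);           // 4 x 4 x 1 = 16 threads per block
            hello<<<grid, block>>>();   // asynchronous launch
            cudaDeviceSynchronize();    // wait for the kernel to finish
            return 0;
    }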


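Each thread can be regarded as one element of a six-dimensional array (for the concepts of dim3, grid, block and so on, see CUDA Programming (2)):

    Thread t[gridDim.z][gridDim.y][gridDim.x][blockDim.z][blockDim.y][blockDim.x]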
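One way to convince yourself that the row-major index of this six-dimensional array assigns every thread a distinct ID is to let each thread mark its own slot in a counter array and check the counters on the host. The sketch below does that; `linearThreadId`, `recordIds`, and `countBadIds` are hypothetical helper names introduced for this illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same formula as in the listing, written in nested (Horner) form.
__device__ int linearThreadId()
{
    int blockId = (blockIdx.z * gridDim.y + blockIdx.y) * gridDim.x + blockIdx.x;
    int threadInBlock =
        (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x;
    return blockId * (blockDim.x * blockDim.y * blockDim.z) + threadInBlock;
}

// Every thread increments the counter that belongs to its ID.
__global__ void recordIds(int *hits)
{
    atomicAdd(&hits[linearThreadId()], 1);
}

// Returns how many IDs in [0, total) were NOT produced exactly once.
int countBadIds(dim3 grid, dim3 block)
{
    const int total = grid.x * grid.y * grid.z * block.x * block.y * block.z;

    int *dHits = nullptr;
    cudaMalloc(&dHits, total * sizeof(int));
    cudaMemset(dHits, 0, total * sizeof(int));

    recordIds<<<grid, block>>>(dHits);

    std::vector<int> hits(total);
    cudaMemcpy(hits.data(), dHits, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dHits);

    int bad = 0;
    for (int h : hits)
        if (h != 1)
            ++bad;
    return bad;
}

int main()
{
    // Same launch configuration as the "Hello world" listing: 6 x 16 = 96 threads.
    int bad = countBadIds(dim3(3, 2), dim3(4, 4));
    printf("IDs not produced exactly once: %d\n", bad);
    return bad != 0;
}
```

If two threads computed the same ID, one counter would read 2 and another 0, so a result of zero means the mapping is one-to-one over the whole range.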

Compile command: nvcc hello_world.cu -o hello_world

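The output shows thread IDs covering [0, 95] (6 blocks × 16 threads = 96 threads). Blocks print in no particular order, because the order in which blocks are scheduled is not guaranteed: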

      [qiangnix@A01-R21-I137-130-1005217 ~]$ ./hello_world 
      Hello world from 64!!
      Hello world from 65!!
      Hello world from 66!!
      Hello world from 67!!
      Hello world from 68!!
      Hello world from 69!!
      Hello world from 70!!
      Hello world from 71!!
      Hello world from 72!!
      Hello world from 73!!
      Hello world from 74!!
      Hello world from 75!!
      Hello world from 76!!
      Hello world from 77!!
      Hello world from 78!!
      Hello world from 79!!
      Hello world from 48!!
      Hello world from 49!!
      Hello world from 50!!
      Hello world from 51!!
      Hello world from 52!!
      Hello world from 53!!
      Hello world from 54!!
      Hello world from 55!!
      Hello world from 56!!
      Hello world from 57!!
      Hello world from 58!!
      Hello world from 59!!
      Hello world from 60!!
      Hello world from 61!!
      Hello world from 62!!
      Hello world from 63!!
      Hello world from 80!!
      Hello world from 81!!
      Hello world from 82!!
      Hello world from 83!!
      Hello world from 84!!
      Hello world from 85!!
      Hello world from 86!!
      Hello world from 87!!
      Hello world from 88!!
      Hello world from 89!!
      Hello world from 90!!
      Hello world from 91!!
      Hello world from 92!!
      Hello world from 93!!
      Hello world from 94!!
      Hello world from 95!!
      Hello world from 0!!
      Hello world from 1!!
      Hello world from 2!!
      Hello world from 3!!
      Hello world from 4!!
      Hello world from 5!!
      Hello world from 6!!
      Hello world from 7!!
      Hello world from 8!!
      Hello world from 9!!
      Hello world from 10!!
      Hello world from 11!!
      Hello world from 12!!
      Hello world from 13!!
      Hello world from 14!!
      Hello world from 15!!
      Hello world from 32!!
      Hello world from 33!!
      Hello world from 34!!
      Hello world from 35!!
      Hello world from 36!!
      Hello world from 37!!
      Hello world from 38!!
      Hello world from 39!!
      Hello world from 40!!
      Hello world from 41!!
      Hello world from 42!!
      Hello world from 43!!
      Hello world from 44!!
      Hello world from 45!!
      Hello world from 46!!
      Hello world from 47!!
      Hello world from 16!!
      Hello world from 17!!
      Hello world from 18!!
      Hello world from 19!!
      Hello world from 20!!
      Hello world from 21!!
      Hello world from 22!!
      Hello world from 23!!
      Hello world from 24!!
      Hello world from 25!!
      Hello world from 26!!
      Hello world from 27!!
      Hello world from 28!!
      Hello world from 29!!
      Hello world from 30!!
      Hello world from 31!!


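Note:

1. cudaDeviceSynchronize() is an explicit barrier: the host waits until the kernel functions on the device have finished (a kernel launch is asynchronous with respect to the host).

2. A .cu file is a source file that holds both host and device functions.

3. CUDA API basics: cudaMalloc/cudaFree allocate and free device memory; cudaMemcpy copies memory between the GPU and the CPU.

4. CUDA compilation is shown in Figure 2. .cubin is a CUDA binary for a specific GPU; .ptx is a portable device assembly format (a text file).

Figure 2: Simplified view of the CUDA compilation process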
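Because a kernel launch returns before the kernel has run, errors can surface at two different points: right after the launch (for example an invalid launch configuration) and at the next synchronization (for errors during execution). A minimal sketch of checking both follows. The `CHECK` macro, the `fill` kernel, and the `fillAndSum` wrapper are hypothetical helpers written for this example; they are not part of the CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message on any CUDA error.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Each thread writes `value` into its own slot.
__global__ void fill(int *out, int value)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = value;
}

// Fills blocks * threads integers on the device and returns their sum.
int fillAndSum(int blocks, int threads, int value)
{
    const int n = blocks * threads;
    int *dOut = nullptr;
    CHECK(cudaMalloc(&dOut, n * sizeof(int)));

    fill<<<blocks, threads>>>(dOut, value);  // returns immediately
    CHECK(cudaGetLastError());               // errors detected at launch time
    CHECK(cudaDeviceSynchronize());          // wait; errors raised while running

    int *hOut = (int *)malloc(n * sizeof(int));
    CHECK(cudaMemcpy(hOut, dOut, n * sizeof(int), cudaMemcpyDeviceToHost));

    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += hOut[i];

    free(hOut);
    CHECK(cudaFree(dOut));
    return sum;
}

int main()
{
    printf("sum = %d\n", fillAndSum(4, 64, 3));
    return 0;
}
```

The explicit cudaDeviceSynchronize() is not strictly required before a blocking cudaMemcpy on the default stream; it is kept here so that an execution error is reported at the synchronization point rather than at the copy.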

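CUDA memory hierarchy

  • Host memory: CPU/DRAM

  • Device memory

    • Local memory/registers (scope: thread): each SM has a set of registers that is divided among the threads active on that SM. The compute capability of the GPU determines the maximum number of registers available to each thread. Local memory is per-thread storage in device memory, used for example when a thread needs more than its registers can hold.

    • Shared memory (scope: block): fast on-chip RAM shared by all cores of an SM, declared with __shared__. On devices where shared memory and L1 cache use the same on-chip storage, the preferred split can be requested with cudaDeviceSetCacheConfig().

    • L1/L2 cache: L1 is private to each SM and serves the blocks resident on it; L2 is shared by all SMs.

    • Global memory (scope: grid): visible to every thread, and the main memory that the host allocates, reads, and writes through CUDA API calls such as cudaMalloc and cudaMemcpy.

    • Texture/surface memory (scope: grid; texture fetches are read-only): textures are typically used for filtering or data-transformation operations.

    • Constant memory (off-chip, read-only inside kernels; scope: grid): dedicated to constant data, declared with __constant__.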
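To make the __shared__ and __constant__ qualifiers from the list above concrete, here is a small sketch of a per-block sum: each block stages its scaled inputs in shared memory and reduces them, while the scale factor lives in constant memory. The names `kThreads`, `kScale`, `scaledBlockSum`, and `scaledSum` are assumptions made for this example, the block size is assumed to be a power of two, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;   // threads per block (assumed power of two)
__constant__ float kScale;      // constant memory: read-only inside kernels

// Each block sums its scaled slice of `in` and writes one partial sum.
__global__ void scaledBlockSum(const float *in, float *blockSums, int n)
{
    __shared__ float tile[kThreads];    // one copy per block, on-chip

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] * kScale : 0.0f;
    __syncthreads();                    // all loads done before reducing

    // Tree reduction inside the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = tile[0];
}

// Returns scale * (sum of data), computed on the device block by block.
float scaledSum(const std::vector<float> &data, float scale)
{
    const int n = (int)data.size();
    const int blocks = (n + kThreads - 1) / kThreads;

    float *dIn, *dSums;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dIn, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // The host writes constant memory through cudaMemcpyToSymbol.
    cudaMemcpyToSymbol(kScale, &scale, sizeof(float));

    scaledBlockSum<<<blocks, kThreads>>>(dIn, dSums, n);

    std::vector<float> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);

    float total = 0.0f;
    for (float s : sums)    // finish the reduction on the host
        total += s;
    return total;
}

int main()
{
    std::vector<float> data(1000, 1.0f);
    printf("scaled sum = %.1f\n", scaledSum(data, 2.0f));
    return 0;
}
```

Finishing the reduction on the host keeps the sketch short; a second kernel pass over the partial sums would be the usual choice once the number of blocks grows large.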

CUDA debugging

  • Nsight

  • CUDA-GDB

On Linux and Mac, Nsight is a front end for CUDA-GDB.


[1] https://blog.csdn.net/xiaohu2022/article/details/79599947

[2] https://en.wikipedia.org/wiki/Thread_block_(CUDA_programming)

[3] https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/
