1.
Question 1. Which statement is not true about the syntax below?
kernel<<<a,b>>>(args)
The value(s) in args can only be the name(s) of kernel(s).
This is false: args represents the arguments passed to the kernel, not kernel names.
2.
Question 2. If you would like a 2-dimensional data/execution mapping of threads in your blocks for your kernel, which of the following would calculate the index of the thread and data? Note: N is defined as 512.
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int index = x + y * N;
3.
Question 3. If you would like a 3-dimensional data/execution mapping of threads in your blocks for your kernel, which of the following would calculate the index of the thread and data? Note: N is defined as 512.
int index = blockIdx.x * blockDim.x * blockDim.y * blockDim.z
          + threadIdx.z * blockDim.y * blockDim.x
          + threadIdx.y * blockDim.x
          + threadIdx.x;
The index variable combines the x, y, and z offsets from the threadIdx variables with the width, height, and depth of the 3D thread-block space.
4.
Question 4. Given the following code, how many threads will be executed in a single block (the variable threadsPerBlockPerDimension)? Note: math.pow(x, 2) returns x squared.
#include <math.h>
#define N 16384
int blocksPerGridPerDimension = 2;
dim3 gridLayout(blocksPerGridPerDimension, blocksPerGridPerDimension);
int threadsPerBlockPerDimension = N / math.pow(blocksPerGridPerDimension, 2);
4096
16384 / (2 * 2) = 4096.
5.
Question 5. Given the following code, how would you change the initialization of the dim3 block variable to create blocks of 256 threads? You are able to change the dimensions of the dim3 for the block and the associated code to retrieve the thread index.
dim3 grid(32, 32);
dim3 block(1, 1);
kernel<<<grid, block>>>(args);
dim3 block(8,4,8)
This is correct, since 8 * 4 * 8 = 256.
6.
Question 6. Given the following kernel code (executable from host code), choose a bolded area to correct and the correct code snippet from the options below:
__device__ matrixAdd(int *matrix_a, int *matrix_b, int *matrix_c){
    int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
    int threadId = blockId * (blockDim.x * blockDim.y * blockDim.z)
                 + (threadIdx.z * (blockDim.x * blockDim.y))
                 + (threadIdx.y * blockDim.z)
                 + threadIdx.x;
    c[threadId] = a[blockId] + b[threadId];
}
__global__ matrixAdd(int *matrix_a, int *matrix_b, int *matrix_c){
This is a correct answer: the __device__ keyword marks functions callable only from device (GPU) code, so a kernel launched from the host must be declared __global__.
+ (threadIdx.y * blockDim.x) + threadIdx.x;
This is a correct answer because the original statement computed the y thread offset using the block's z dimension (blockDim.z), which gives a wrong offset whenever the y and z block dimensions are not equal in size.