Keeping this sequence of operations in mind, let’s look at a CUDA C example. In a recent post, I illustrated Six Ways to SAXPY, which includes a CUDA C version. SAXPY stands for “Single-precision A*X Plus Y”, and is a good “hello world” example for parallel computation. In this post I will dissect a more complete version of the CUDA C SAXPY, explaining in detail what is done and why.

Three lines from that example are worth noting up front: the kernel signature, the per-thread index computation, and the copy of the result back to the host.

void saxpy(int n, float a, float *x, float *y)
int i = blockIdx.x*blockDim.x + threadIdx.x;
cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
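For orientation, here is a minimal sketch of how a complete CUDA C SAXPY program might be put together around the kernel signature, index computation, and device-to-host copy quoted above. It is not necessarily the exact listing the post dissects. The problem size N, the 256-thread block size, the scalar 2.0f, the initial values of x and y, and the error check at the end are all assumptions chosen for illustration.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, float *x, float *y)
{
    // Global index of this thread across all blocks in the grid.
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    // Guard: the grid may contain more threads than there are elements.
    if (i < n) y[i] = a*x[i] + y[i];
}

int main(void)
{
    const int N = 1 << 20;  // assumed problem size (about one million elements)
    float *x, *y, *d_x, *d_y;

    // Allocate host and device arrays.
    x = (float *)malloc(N*sizeof(float));
    y = (float *)malloc(N*sizeof(float));
    cudaMalloc((void **)&d_x, N*sizeof(float));
    cudaMalloc((void **)&d_y, N*sizeof(float));

    // Initialize host data (values are arbitrary, chosen so the result is easy to check).
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Copy the inputs from host to device.
    cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all N elements (rounding up).
    saxpy<<<(N + 255)/256, 256>>>(N, 2.0f, d_x, d_y);

    // Copy the result back to the host; this call waits for the kernel to finish.
    cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

    // With a = 2, x = 1, y = 2, every element should equal 4.
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmaxf(maxError, fabsf(y[i] - 4.0f));
    printf("Max error: %f\n", maxError);

    cudaFree(d_x);
    cudaFree(d_y);
    free(x);
    free(y);
    return 0;
}
```

Assuming the file is saved as saxpy.cu (a hypothetical name), it can be built with nvcc, for example `nvcc -o saxpy saxpy.cu`. The bounds check `if (i < n)` matters whenever N is not a multiple of the block size, since the rounded-up grid then launches a few threads past the end of the arrays.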