Cambridge CUDA Course 25-27 May 2009
Around 40 people attended this six-lecture course on CUDA. Links to the presentations and the codes used can be found below.
GPUs are cheap, massively parallel, programmable compute devices that can be used for many general-purpose (non-graphics) tasks. They are a "good fit" for many scientific applications, and significant speedups over contemporary CPUs have been reported. The CUDA language makes NVIDIA GPUs accessible to developers through a series of extensions to C (with no mention of pixels or shading!). In this series of lectures, we aim to show how to build and configure a CUDA computer, how to write some simple (and some less simple) CUDA kernels, and how to optimise your CUDA code for better performance.
Day 1 - Monday 25th May
Getting started - Graham Pullan
This lecture introduces the hardware and software needed to run CUDA. Which PCs and GPUs are suitable? How do we run our first CUDA "Hello World" program?
Downloads:
- Presentation: Lecture1.pdf
- Codes: CUDA hello world, vector_add_onethread.cu
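For readers without the downloads to hand, a first CUDA program in the spirit of the one-thread vector addition named above might look like the sketch below. It is not the course's vector_add_onethread.cu: the kernel name vector_add_one_thread, the host helper run_vector_add and the array length are illustrative assumptions. Only standard CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaGetLastError, cudaFree) are used, and the kernel is launched as one block containing one thread.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Kernel executed by a single thread, which loops over every element itself. */
__global__ void vector_add_one_thread(const float *a, const float *b,
                                      float *c, int n)
{
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

/* Hypothetical host helper: copies the inputs to the GPU, runs the kernel
   and copies the result back. Returns 0 on success, 1 on any CUDA error. */
int run_vector_add(const float *h_a, const float *h_b, float *h_c, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    int status = 1;

    if (cudaMalloc((void **)&d_a, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_b, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_c, bytes) == cudaSuccess &&
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {

        /* One block of one thread: no parallelism yet. */
        vector_add_one_thread<<<1, 1>>>(d_a, d_b, d_c, n);

        /* The blocking copy waits for the kernel to finish before copying. */
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }

    /* cudaFree on a NULL pointer is a no-op, so this is safe after a failure. */
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return status;
}

int main(void)
{
    enum { N = 8 };
    float a[N], b[N], c[N];

    for (int i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
    if (run_vector_add(a, b, c, N) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) {
        printf("%g + %g = %g\n", a[i], b[i], c[i]);
    }
    return 0;
}
```

Assuming the file is saved as vector_add.cu (a hypothetical name), it can be built with nvcc vector_add.cu -o vector_add. Using a single thread keeps the host-side steps (allocate, copy in, launch, copy out, free) in view before any thread indexing is introduced.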
Threads - Graham Pullan
In this lecture, a simple single-equation PDE (the heat conduction equation) is used as an example problem. We move from a naive CUDA implementation to a better-optimised one and examine the reasons for the changes in performance.
Downloads:
- Presentation: Lecture2.pdf
- Codes: 2D heat conduction solver, heat.cu
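A naive implementation of the kind this lecture starts from might give each grid point its own thread and read every neighbour straight from global memory. The sketch below assumes an explicit five-point update on a uniform square grid with fixed boundary values, T_new = T + r (T_west + T_east + T_south + T_north - 4 T) with r = alpha dt / dx^2, which is stable for r <= 0.25. The scheme, the names heat_step and run_heat, and the 16 x 16 block size are assumptions for illustration and are not taken from heat.cu.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* One thread per grid point; all reads and writes go to global memory. */
__global__ void heat_step(const float *t_old, float *t_new,
                          int nx, int ny, float r)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* x index (fastest) */
    int j = blockIdx.y * blockDim.y + threadIdx.y;  /* y index */
    if (i >= nx || j >= ny) return;

    int idx = j * nx + i;
    if (i == 0 || i == nx - 1 || j == 0 || j == ny - 1) {
        t_new[idx] = t_old[idx];  /* boundary values are held fixed */
        return;
    }
    t_new[idx] = t_old[idx] + r * (t_old[idx - 1] + t_old[idx + 1] +
                                   t_old[idx - nx] + t_old[idx + nx] -
                                   4.0f * t_old[idx]);
}

/* Hypothetical host helper: advances h_t by n_steps in place.
   Returns 0 on success, 1 on any CUDA error. */
int run_heat(float *h_t, int nx, int ny, float r, int n_steps)
{
    size_t bytes = (size_t)nx * ny * sizeof(float);
    float *d_a = NULL, *d_b = NULL;
    int status = 1;

    if (cudaMalloc((void **)&d_a, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_b, bytes) == cudaSuccess &&
        cudaMemcpy(d_a, h_t, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {

        dim3 block(16, 16);
        dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);

        for (int s = 0; s < n_steps; ++s) {
            heat_step<<<grid, block>>>(d_a, d_b, nx, ny, r);
            float *tmp = d_a; d_a = d_b; d_b = tmp;  /* newest data is in d_a */
        }
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_t, d_a, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }
    cudaFree(d_a);
    cudaFree(d_b);
    return status;
}

int main(void)
{
    enum { NX = 32, NY = 32 };
    static float t[NX * NY];

    /* Cold plate with a hot left edge. */
    for (int j = 0; j < NY; ++j)
        for (int i = 0; i < NX; ++i)
            t[j * NX + i] = (i == 0) ? 1.0f : 0.0f;

    if (run_heat(t, NX, NY, 0.2f, 100) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("T next to the hot edge, mid-height: %g\n", t[(NY / 2) * NX + 1]);
    return 0;
}
```

Because threadIdx.x maps to the fastest-varying array index, neighbouring threads touch neighbouring addresses. A common next step, though not necessarily the one taken in the lecture, is to stage each block's tile in shared memory so that every value is fetched from global memory only once per step.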
Day 2 - Tuesday 26th May
Developing kernels - Part 1 - Steven Gratton
This lecture introduces the need for multi-kernel CUDA programs. The example application is Cholesky matrix factorisation.
Downloads:
- Presentation: Lecture3.pdf
- Codes: Basic Cholesky on the CPU, prog0.c; and the same algorithm on the GPU, prog1.cu
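One way to see why several kernels are needed: threads in different blocks cannot synchronise with each other inside a single kernel, but launches issued to the same stream run in order, so each launch boundary acts as a device-wide barrier. The sketch below applies that idea to an in-place, right-looking Cholesky factorisation of a symmetric positive definite matrix stored row-major. It is not the course's prog1.cu; the three-kernel split, the names sqrt_diagonal, scale_column, update_trailing and cholesky_gpu, and the block sizes are all assumptions.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Step 1 for column k: replace the diagonal entry by its square root. */
__global__ void sqrt_diagonal(float *a, int n, int k)
{
    a[k * n + k] = sqrtf(a[k * n + k]);
}

/* Step 2: divide the entries below the diagonal in column k by L[k][k]. */
__global__ void scale_column(float *a, int n, int k)
{
    int i = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i * n + k] /= a[k * n + k];
}

/* Step 3: update the lower triangle of the trailing submatrix. */
__global__ void update_trailing(float *a, int n, int k)
{
    int j = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;  /* column */
    int i = k + 1 + blockIdx.y * blockDim.y + threadIdx.y;  /* row */
    if (i < n && j <= i) a[i * n + j] -= a[i * n + k] * a[j * n + k];
}

/* Hypothetical host driver: overwrites the lower triangle of h_a with L,
   where A = L L^T. The strict upper triangle is left untouched.
   Returns 0 on success, 1 on any CUDA error. */
int cholesky_gpu(float *h_a, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *d_a = NULL;
    int status = 1;

    if (cudaMalloc((void **)&d_a, bytes) == cudaSuccess &&
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {

        for (int k = 0; k < n; ++k) {
            sqrt_diagonal<<<1, 1>>>(d_a, n, k);
            int rem = n - k - 1;  /* rows and columns still to process */
            if (rem > 0) {        /* a launch with zero blocks is invalid */
                scale_column<<<(rem + 127) / 128, 128>>>(d_a, n, k);
                dim3 block(16, 16);
                dim3 grid((rem + 15) / 16, (rem + 15) / 16);
                update_trailing<<<grid, block>>>(d_a, n, k);
            }
        }
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }
    cudaFree(d_a);
    return status;
}

int main(void)
{
    /* Small symmetric positive definite test matrix. */
    float a[9] = {  4.0f,  12.0f, -16.0f,
                   12.0f,  37.0f, -43.0f,
                  -16.0f, -43.0f,  98.0f };

    if (cholesky_gpu(a, 3) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < 3; ++i) {      /* print the lower triangle only */
        for (int j = 0; j <= i; ++j) printf("%6.2f ", a[i * 3 + j]);
        printf("\n");
    }
    return 0;
}
```

The sketch does no pivoting and does not check for positive definiteness, so an unsuitable input produces NaNs rather than an error. Launching three small kernels per column is simple but leaves most of the GPU idle for much of the run.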
Developing kernels - Part 2 - Steven Gratton
To make further improvements to CUDA programs, we must take still more care to tune our code to the underlying hardware. In this lecture, the Cholesky factorisation code is improved by making changes to the underlying algorithm. We also look at the .ptx and .cubin files that the compiler generates.
Downloads:
- Presentation: Lecture4.pdf
- Codes: cholesky.cu; transchol.cu; doublechol.cu
Day 3 - Wednesday 27th May
CUDA with multiple GPUs - Tobias Brandvik
In this lecture, we show how to run CUDA codes over multiple GPUs on multiple hosts. The basics of MPI and its application to grid calculation methods on the CPU and GPU are presented.
Downloads:
- Presentation: Lecture5.pdf
- Codes: 2D heat conduction code with MPI, mpi_heat.cu
Application example - medical imaging registration - Richard Ansorge
In the final lecture, a target application is discussed in more detail. In addition, the use of 2D and 3D texture memory is described and compared to the more usual global and shared memory types.
Downloads:
- Presentation: Lecture6.pdf

![[Lecture 1]](gp/L1.jpg)
![[Lecture 2]](gp/L2.jpg)
![[Lecture 3]](stg/L3.jpg)
![[Lecture 4]](stg/L4.jpg)
![[Lecture 5]](tb/L5.jpg)
![[Lecture 6]](rea/L6.jpg)