Programming with CUDA

Wintersemester 2009/10

Note: This lecture will be taught in English.

Advances in GPU hardware have made GPUs computationally far superior to CPUs in raw floating-point throughput. State-of-the-art GPUs achieve higher GFLOPS than CPUs at lower temperatures. Thus, while thermal factors may slow down progress in CPU hardware development, no such barrier is currently in sight for GPUs.



The computational power of GPUs has so far been restricted to niche graphics programmers because of hardware and API restrictions. However, many non-graphics applications can also benefit from higher computational capabilities.

CUDA, Compute Unified Device Architecture, is a new technology that lets ordinary programmers harness the computational power of modern GPUs. On the software side, it is a minimal set of extensions to the popular C programming language; on the hardware side, it comprises the hardware that supports this programming language.

Quick links: Organization | References | Lecture slides | Exercises and slides | Exams | Projects


Lectures will take place on
every Tuesday : 12pm-2pm Carl-Zeiss-Straße (room 129)
every second Thursday, starting 29 Oct : 4pm-6pm Carl-Zeiss-Straße (room 129)

Exercises will take place on
every Wednesday : 8.30am-10am Carl-Zeiss-Straße (room 125)
every second Thursday, starting 5 Nov : date and time to be decided

The course is planned in two parts.
Part 1 : (before Christmas) We will present the CUDA programming language, and the associated execution and memory models. During this time students will also form groups and each group will choose or be assigned a project.
Part 2 : (after Christmas) Groups formed in the first part of the course will work on and present their projects.

The people responsible for the course are
Waqar Saleem
Jens K. Müller (assistant)
Office hours are on
Wednesdays : 2pm-4pm Ernst-Abbe-Platz 2 (room 3335)

Lecture slides will be made available on this website on the morning of each lecture.


Reference Material:

The NVIDIA website provides numerous resources for CUDA. A few starting points are its links to university courses on CUDA, its documentation, and its programming guide.


Lecture slides:

Lecture slides borrow heavily from the NVIDIA CUDA Programming Guide v2.3.1 and the CUDA book.

Tuesday, 20 Oct, 2009 Organization and Intro to GPGPU ppt pdf
Tuesday, 27 Oct, 2009 CUDA hardware model, threads and memory ppt pdf
Thursday, 29 Oct, 2009 CUDA extensions to C pdf
Tuesday, 3 Nov, 2009 CGMA optimization for higher performance pdf
Tuesday, 10 Nov, 2009 More on CUDA memories pdf
Thursday, 12 Nov, 2009 CUDA arrays and Texture Memory pdf
Tuesday, 17 Nov, 2009 CUDA Texture Memory Commands and Addressing pdf
Thursday, 19 Nov, 2009 CUDA Driver API pdf
Tuesday, 24 Nov, 2009 Instruction and Memory Optimizations pdf
Thursday, 26 Nov, 2009 Memory optimizations (cont'd) pdf
Thursday, 3 Dec, 2009 General Optimizations pdf
    Mark Harris' slides (covered in the lecture)
    Updated version (not from the lecture)
Tuesday, 8 Dec, 2009 Intro to OpenCL, Neil Trevett's slides pdf
    NVIDIA OpenCL, AMD OpenCL, Khronos OpenCL
Thursday, 10 Dec, 2009 Writing an OpenCL program pdf
    OpenCL 1.0 reference pages, NVIDIA OpenCL JumpStart Guide v0.9
    Image Convolution example
Tuesday, 15 Dec, 2009 Analyzing parallel algorithms pdf



Tutorial slides (updated after each tutorial/exercise)

Wednesday, 21 Oct, 2009 Exercise No. 1 material: raytracer.tar.gz, scene.ppm
Wednesday, 28 Oct, 2009
Wednesday, 4 Nov, 2009 Exercise No. 2 material: squirrel.yaml, scene.yaml, scene.ppm, CMakeLists.txt, FindCUDASDK.cmake
Wednesday, 11 Nov, 2009
Wednesday, 18 Nov, 2009 Exercise No. 3 material: tetraeder.yaml, scene2.yaml, scene3.yaml
Wednesday, 25 Nov, 2009 Exercise No. 4 material: tetraeder_with_lights.yaml, double-torus1.yaml, orientation-z.yaml
Wednesday, 2 Dec, 2009 Exercise No. 5
Wednesday, 9 Dec, 2009 Exercise No. 6
Wednesday, 16 Dec, 2009 Exercise No. 7 material: FindOpenCL.cmake



The schedule for project meetings for the second half of the semester is as follows.
(UPDATE) Added your promotional material.

Compression LZ77
Weekly meeting: Tue, 12-13
Repository: git at ssh://ipc858/~aliv/git_proj
Presentation (UHG 166, 12 Feb): 12:40-13:00

Swarm Visualization
Weekly meeting: Tue, 13-14
Repository: hg at ssh://mpc711//~philippl/swarm_project
Presentation (UHG 166, 12 Feb): 12:00-12:20
Material: whiteboards; slides (pdf); more screens 1 2 3 4 5

Video Editing
Weekly meeting: Wed, 10-11
Presentation (UHG 166, 12 Feb): 12:20-12:40
Material: whiteboards; slides (html); a sample video; output video (lowered quality, mpg, ~7MB)
Image effects: blur, gray, sepia, sobel
Video effects: blur, hole, ghost, quake, teeth

Audio Signal Processing
Weekly meeting: Wed, 11-12
Repository: git at ssh://ipc858/~nitro/CUDAPA/audio_project
Material: whiteboards; slides (pdf)

Game of Life
Weekly meeting: Thu, 15-16
Material: whiteboards; slides (pdf)

Java Virtual Machine
Weekly meeting: Thu, 16-17
Repository: svn at
Material: whiteboards; slides (pdf)

Your project has to be submitted by Friday, 12th of February 2010. We will check out your sources using the above URLs. Your project has to be ready to compile.

Project presentations will take place on Friday, 12th of February, 2010 in SR 166, UHG from 11h to 13h. Each group is assigned 15-20 minutes. You should leave some time for audience questions and discussion. We suggest the following outline for the presentation as a rough guide:

- Problem description
- Challenges: CPU vs GPU implementation
- Results
- Demo

The presentations are scheduled in the following order.

- Java Virtual Machine
- Game of Life
- Audio Signal Processing
- Swarm Visualization
- Video Editing : 12:20-12:40
- Compression LZ77 : 12:40-13:00

To ensure a smooth and quick transition between presentations, we require you to send in your presentation in PDF format to the same email address as your assignments. If your presentation has animations, or PDF is not a suitable format for any other reason, please let us know as soon as possible. Please submit all files by 10h on the day of the presentation. For the presentations there will be a laptop running Linux, wired to the university network. To run your demo, you can access one of the CUDA machines.



Your individual exam will be mainly about your project. We are going to ask CUDA-related questions regarding your implementation, e.g.

  • your choice of implementation
  • precautions taken to avoid non-coalesced memory accesses
  • strategies to avoid bank conflicts
  • possible occurrences of divergent threads and at what cost divergence could be avoided
  • potential optimizations of certain parts of your code
  • trade-offs in choosing the particular memory you use
  • discussion on occupancy and grid configurations

Basically, you should have an understanding of programming and optimizing with CUDA and be able to demonstrate how you applied that in your project.

Exams have been scheduled from 10h to 12,30h and 14h to 15,30h on Tuesday, the 16th of February, as follows. They will take place in room 3334 Ernst-Abbe-Platz 2.

16th February

10:00 - 10:30 Prinz, Thomas
10:30 - 11:00 Lucas, Philipp
11:00 - 11:30 Beier, Tobias
11:30 - 12:00 Kaiser, Markus
12:00 - 12:30 Kühne, Lars

14:00 - 14:30 Tandetzky, Max
14:30 - 15:00 Rumpf, Thomas
15:00 - 15:30 Voigt, Alexander