Programming with CUDA

Wintersemester 2008/09

Note: This lecture will be taught in English.


Advances in GPU hardware have given GPUs far higher raw floating-point throughput than CPUs. State-of-the-art GPUs achieve higher GFLOPS than CPUs at lower temperatures. Thus, while thermal limits may slow progress in CPU hardware development, no such barrier is currently in sight for GPUs.


Because of hardware and API restrictions, the computational power of GPUs has so far been accessible mainly to specialist graphics programmers. However, many non-graphics applications can also benefit from this higher computational capability.

CUDA (Compute Unified Device Architecture) is a new technology that lets ordinary programmers harness the computational power of modern GPUs. On the software side, it is a minimal set of extensions to the popular C programming language; on the hardware side, it comprises the hardware that supports these extensions.
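To make the "minimal set of extensions to C" concrete, here is a small sketch of element-wise vector addition. It is not taken from the lecture material; the names vecAdd and runVecAdd are hypothetical and chosen only for illustration. The CUDA-specific parts are the __global__ qualifier, the built-in variables blockIdx, blockDim and threadIdx, and the <<<blocks, threads>>> launch syntax. Everything else is ordinary C-style code calling the CUDA runtime.

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

// __global__ marks a kernel: a function that runs on the GPU and is
// launched from the CPU. Each thread handles one array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last, partial block
        c[i] = a[i] + b[i];
}

// Hypothetical host helper: adds two n-element arrays on the GPU and
// returns true if every result matches the sum computed on the CPU.
bool runVecAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) {
        ha[i] = (float)i;
        hb[i] = 2.0f * (float)i;
    }

    // Allocate device memory and copy the inputs to the GPU.
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // This cudaMemcpy waits for the kernel to finish before copying back.
    cudaError_t err = cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (hc[i] == ha[i] + hb[i]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return ok;
}
```

Called from a main function in a file compiled with nvcc, runVecAdd(1 << 16) should return true on a machine with a CUDA-capable GPU. The block size of 256 is an arbitrary choice for this sketch, not a recommendation.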

Quick links : Organization | References | Lecture slides | Exercises and slides | Exams | Projects and presentations


The lecture will take place twice a week.
Tuesdays : 4pm-5pm Ernst-Abbe-Platz 2 (room 3517)
Thursdays : 4pm-6pm Carl-Zeiss-Strasse (room 125)

Weekly exercises will take place on
Tuesdays : 5pm-6pm Ernst-Abbe-Platz 2 (room 3517)

The lecture is planned in two parts.
Part 1 : (before Christmas) We will present the CUDA programming language and the associated execution and memory models. During this time, students will also form groups, and each group will choose or be assigned a project.
Part 2 : (after Christmas) Groups formed in the first part of the course will work on and present their projects.

The people responsible for the course are
Waqar Saleem
Jens K. Müller (assistant)
Wednesdays : 2pm-4pm Ernst-Abbe-Platz 2 (room 3311)

Lecture slides will be made available at this website after each lecture.


Reference Material:

The NVIDIA website provides numerous resources for CUDA. A few starting points are its links to university courses on CUDA and its documentation, including the programming guide.

Lecture slides will borrow heavily from the above material.


Lecture slides:

Please note that the university web server refreshes this site only twice per day. Changes made to the site therefore appear online only after the next refresh cycle.

Tuesday, 21 Oct, 2008 Organization and Intro to GPGPU ppt pdf
Thursday, 23 Oct, 2008 Matriculation party (Immatrikulationsparty). No lecture.
Tuesday, 28 Oct, 2008 The CUDA programming model ppt pdf
Thursday, 30 Oct, 2008 The CUDA programming model (cont'd) ppt pdf
Tuesday, 4 Nov, 2008 The CUDA (Tesla) hardware model ppt pdf
Thursday, 6 Nov, 2008 The CUDA API ppt pdf
Tuesday, 11 Nov, 2008 CUDA Runtime Component ppt pdf
Thursday, 13 Nov, 2008 CUDA Runtime Component (cont'd) ppt pdf
Tuesday, 18 Nov, 2008 CUDA Runtime Component (cont'd) ppt pdf
Thursday, 20 Nov, 2008 Memory & instruction optimizations ppt pdf
Tuesday, 25 Nov, 2008 Memory & instruction optimizations (cont'd) ppt pdf
Thursday, 27 Nov, 2008 Parallelization and Optimization Examples ppt pdf; Parallel Reduction (Mark Harris, NVIDIA) pdf

Tuesday, 2 Dec, 2008 Revisiting shared memory bank conflicts ppt pdf
Thursday, 4 Dec, 2008 Parallel architectures and analysis ppt pdf

No more lectures




Tutorial slides (updated after each tutorial/exercise)

Tuesday, 21 Oct, 2008 First lecture. No exercise.
Tuesday, 28 Oct, 2008 Exercise No. 1
Tuesday, 4 Nov, 2008 Exercise No. 2
Tuesday, 11 Nov, 2008 Exercise No. 3
Tuesday, 18 Nov, 2008 Exercise No. 4 (material: lena256.pgm)
Tuesday, 25 Nov, 2008 Exercise No. 5
Tuesday, 2 Dec, 2008 Exercise No. 6
Tuesday, 9 Dec, 2008
Tuesday, 16 Dec, 2008



The timeline for assigning project groups and topics is as follows:

Thu, 20 Nov
Sample project ideas distributed.
Tue, 25 Nov
Students suggest groups and topics.
If you are not part of a suggested group, you will be randomly assigned to one.
Thu, 27 Nov
Groups and topics assigned.
You have until the next lecture to request changes.
Tue, 2 Dec
Groups and topics finalized.

The groups and topics are:

Group A

Thomas B., Martin S., Christoph G.


Group B
Maximillian F., Torsten W.
DES cracker
Group C
Jochen E., Robert B., Ralf S.
Group D
Bernd W., Thomas N., Sebastian A.
GIMP plugin

Starting on Tuesday, 13 Jan, group meetings will rotate through the following schedule.

Tue, 16-17h
Group A
Tue, 17-18h
Group B
Thu, 16-17h
Group C
Thu, 17-18h
Group D

Your project must be submitted by Sunday, 15 February 2009. Send all relevant files as a zip, tar.gz or similar archive by email. Your project must compile as submitted.

Project presentations will take place on Tuesday, 17 February 2009, in room 3517 (Ernst-Abbe-Platz 2). Each group is assigned 30 minutes, of which at least the last 5 minutes should be left for audience questions and discussion. We suggest the following outline as a rough guide for the presentation:

- Problem description
- Challenges: CPU vs GPU implementation
- Results
- Benchmarking
  - Use standard open tools where available
  - Otherwise, use your own serial implementation
- Problems and Limitations
- Outlook

To ensure a smooth and quick transition between presentations, we require you to send in your presentation in PDF format to the same email address as your assignments. If PDF is not a suitable format for you, because your presentation has animations or for any other reason, please let us know as soon as possible. Please submit all files by 12h on the day of the presentation. The schedule so far is as follows:

Raytracer TSP
DES Gimp Plugin



Your individual exam will be mainly about your project. We are going to ask CUDA-related questions about your implementation, for example:

  • Why did you implement it like this?
  • Is your memory access coalesced? What does this mean?
  • What about bank conflicts?
  • Are your threads diverging? What does this mean?
  • How can this code here be optimized?
  • What kinds of memory are you using? Why? What are their benefits?
  • Why did you choose these block and grid dimensions? What about the occupancy?

Basically, this means you should have an understanding of programming and optimizing with CUDA.
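As an illustration of the kind of code these questions refer to, here is a sketch of a tiled matrix transpose. It is not one of the course projects or exercises; the names TILE, transposeTiled and runTranspose are hypothetical. Both the global-memory read and the write let consecutive threads touch consecutive addresses, which is the access pattern coalescing relies on (the exact alignment conditions depend on the hardware). The extra column in the shared-memory tile is the usual padding trick to keep the column-wise reads from all landing in the same bank. How many conflicts remain depends on the number of banks on the target GPU.

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

#define TILE 16  // tile edge; assumed block size is TILE x TILE threads

// Transposes a height x width row-major matrix into a width x height one.
__global__ void transposeTiled(const float *in, float *out, int width, int height)
{
    // +1 column of padding so a tile column does not map to a single bank.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];  // row-wise read

    // All threads reach the barrier; it is outside the divergent branches.
    __syncthreads();

    // Swap the block offsets so the write is also row-wise in the output.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: runs the kernel and checks it against the CPU.
bool runTranspose(int width, int height)
{
    size_t bytes = (size_t)width * height * sizeof(float);
    float *hin = (float *)malloc(bytes);
    float *hout = (float *)malloc(bytes);
    for (int i = 0; i < width * height; ++i)
        hin[i] = (float)i;

    float *din, *dout;
    cudaMalloc((void **)&din, bytes);
    cudaMalloc((void **)&dout, bytes);
    cudaMemcpy(din, hin, bytes, cudaMemcpyHostToDevice);

    // Round the grid up so partial tiles at the edges are covered.
    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(din, dout, width, height);

    cudaError_t err = cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int r = 0; ok && r < height; ++r)
        for (int c = 0; ok && c < width; ++c)
            ok = (hout[c * height + r] == hin[r * width + c]);

    cudaFree(din); cudaFree(dout);
    free(hin); free(hout);
    return ok;
}
```

In this sketch, threads diverge only in the boundary checks of edge blocks, and the 16 x 16 block shape is an assumption made for the example. Whether it gives good occupancy on a particular GPU would have to be checked against that device's register and shared-memory limits.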

Exams have been scheduled as follows.

Time            26th February     27th February
10:00 - 10:30   Maximillian F.    Jochen E.
10:40 - 11:10   Torsten W.        Robert B.
11:20 - 11:50   -                 Ralf S.
13:20 - 13:50   Christoph G.      Sebastian A.
14:00 - 14:30   Thomas B.         Thomas N.
14:40 - 15:10   Martin S.         Bernd W.