ECE 8823: GPU Architectures

Prerequisite: ECE 4100/6100 or CS 4290/6290

Course Description: The last 8 years have seen the emergence of general-purpose graphics processing units (GPUs) as vehicles for accelerating general-purpose scientific, enterprise, and embedded applications. This emergence has coincided with the explosive growth of data-parallel applications and the ascendance of energy efficiency as a driver of performance scalability. The research community has developed a body of compiler and microarchitecture knowledge to address important bottlenecks to harnessing the enormous throughput and memory bandwidth of modern GPUs. This course introduces the basic organizational principles of the major components of a general-purpose GPU architecture. The course begins with coverage of a commodity language (CUDA) that implements the Single Instruction Multiple Thread (SIMT) programming model and introduces basic programming abstractions and idioms. It then provides in-depth coverage of important microarchitecture concepts and performance optimizations for the efficient implementation of the SIMT model, elaborating state-of-the-art techniques for performance optimization through coverage of the latest papers in leading international journals and conferences, augmented with key patents and class notes. A series of programming assignments and a class project reinforce these concepts.

Course Texts:
D. Kirk and W. Hwu, “Programming Massively Parallel Processors: A Hands-on Approach,” Morgan Kaufmann, Second Ed., 2012, ISBN 978-0-12-415992-
Journal & Conference papers, patents, class notes

Publisher Website: Supplemental Material

Course Syllabus: Syllabus

Class Resources

Last Updated Module Lecture Reading Notes/Additional References
0 Overview (ppt, pdf)
1 Introduction (ppt, pdf) Chapter 1, Section 2.2, Section 2.3
2 Introduction to CUDA C (ppt, pdf) Chapter 3
3 Data Parallel Execution (ppt, pdf) Chapter 4
4 CUDA Memory Model (ppt, pdf) Chapter 5
5 Program Mapping and Execution:
Mapping – 1 (ppt, pdf)
Mapping – 2 (ppt, pdf)
Chapter 6  Occupancy calculator
6 Microarchitecture – I: Kernel Execution (ppt, pdf)
Microarchitecture – II: SM Microarchitecture (ppt, pdf)
Microarchitecture – III: Harmonica GPU (ppt, pdf)
Microarchitecture – IV: Register File (ppt, pdf)
See posted papers and patents
Harmonica GPU (ppt, pdf)
7 Control Divergence – I (ppt, pdf)
Control Divergence – II (pdf)
Control Divergence – III (ppt, pdf)
Control Divergence – IV (ppt, pdf)
See posted conference and journal papers. Assignment-4 Control Flow (pdf)
8  Memory Optimizations  See module on scheduling
9 Scheduling (ppt, pdf) See posted conference and journal papers.
10 Power (ppt, pdf) See posted conference and journal papers.
11 CUDA Dynamic Parallelism (ppt, pdf) See relevant papers and presentations posted in Module 6
12 Introduction to OpenCL and OpenACC (ppt, pdf)