Parallel Programming :: Winter 2019

Stanford CS149, Winter 2019

PARALLEL COMPUTING

NOTICE: THE MOST RECENT OFFERING OF THE COURSE CAN BE VIEWED AT:

http://cs149.stanford.edu

From smart phones, to multi-core CPUs and GPUs, to the world's largest supercomputers and web sites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems as well as to teach parallel programming techniques necessary to effectively utilize these machines. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course will cover both parallel hardware and software design.

Basic Info

Tues/Thurs 4:30-6:00pm

Room 420-040

Instructors: Kunle Olukotun and Kayvon Fatahalian

See the course info page for more info on course policies and logistics.

Winter 2019 Schedule

Jan 8	Course Introduction + Why Parallelism? Motivations for parallel chip decisions, challenges of parallelizing code
Jan 10	A Modern Multi-Core Processor Forms of parallelism: multicore, SIMD, threading + understanding latency and bandwidth
Jan 15	Parallel Programming Models and their Corresponding HW/SW Implementations ways of thinking about parallel programs, and their corresponding hardware implementations, ISPC programming
Jan 17	Parallel Programming Basics Thought process of parallelizing a program in data parallel and shared address space models
Jan 22	Program Optimization 1: Work Distribution and Scheduling Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing
Jan 24	Program Optimization 2: Locality and Communication Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention
Jan 29	GPU architecture and CUDA Programming CUDA programming abstractions, and how they are implemented on modern GPUs
Jan 31	Snooping-Based Cache Coherence Definition of memory coherence, invalidation-based coherence using MSI and MESI, false sharing
Feb 5	Memory Consistency Consistency vs. coherence, relaxed consistency models and their motivation, acquire/release semantics
Feb 7	Directory-Based Coherence + Implementing Synchronization Directory-based coherence, machine-level atomic operations, implementing locks, implementing barriers
Feb 12	Midterm Exam
Feb 14	Data-Parallel Thinking Data parallel thinking: map, reduce, scan, prefix sum, groupByKey
Feb 19	Distributed Computing using Spark producer-consumer locality, RDD abstraction, Spark implementation and scheduling
Feb 21	Fine-Grained Synchronization and Lock-Free Programming Fine-grained snychronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers
Feb 26	Transactional memory Motivation for transactions, design space of transactional memory implementations, lazy-optimistic HTM
Feb 28	Heterogeneous Parallelism and Hardware Specialization Energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, mobile SoCs
Mar 5	Domain-Specific Programming Systems Motivation for DSLs, OptiML, Delite, case study on Halide
Mar 7	Parallel Graph Processing Frameworks + How DRAM Works GraphLab, Ligra, and GraphChi, streaming graph processing, graph compression
Mar 12	Efficienly Evaluating DNNs Scheduling convlayers, exploiting precision and sparsity, DNN acelerators (e.g., GPU TensorCores, TPU)
Mar 14	Parallel DNN Training + Course Wrap Up Have a great spring break!

Programming Assignments

Jan 18	Assignment 1: Analyzing Parallel Program Performance on a Quad-Core CPU
Feb 1	Assignment 2: A Multi-Threaded Web Server in Java
Feb 17	Assignment 3: Big Graph Processing in OpenMP
Mar 1	Assignment 4: A Simple Renderer in CUDA
Mar 14	Assignment 5: Big Data Processing in Spark

Written Assignments

Jan 25	Written Assignment 1: Multi-threading Scheduling + The Most ALUs Wins + Angry Students
Feb 5	Written Assignment 2: Sending Messages + Processing Pipeline + Be a Processor Architect
Feb 20	Written Assignment 3: Spin Locks + PKPU + Data Parallel Thinking
Feb 27	Written Assignment 4: Spark Memory Footprint + Concurrent Hashtables and Graphs
Mar 8	Written Assignment 5: Comparing and Swapping + Transactions on Trees