To understand a program you must become both the machine and the program. Alan Perlis, “Epigrams” As software engineers, we study computer systems (also called "computer architecture") to be able to understand how our programs ultimately run and how the machine expects our data to be encoded. Our immediate reward is to be able to write faster, more memory-efficient and more secure code.
Longer term, the value of understanding computer systems may be even greater. Every abstraction between us and the hardware leaks, to one degree or another. This course aims to provide a set of first principles from which to build sturdier mental models and reason more effectively.
We’ll start by considering the machine's expectation of how our data is encoded, as well as some higher level binary representations like those for text. We'll continue through introductory C and assembly programming, to better understand the interface that a typical computer provides for executing programs. Finally we'll cover two important areas for improving program performance: utilizing the CPU microarchitecture, and CPU caches (the memory hierarchy).
At the core of this course are the sequences of problems for each topic. You should aim to solve each problem, using the worked solutions and supplementary explainers as needed. There are also some full seminars, which some find helpful to tie topics together. While no textbook is necessary for this course, we do recommend Computer Systems: A Programmer’s Perspective (“CS:APP”) as a supplement, and reference relevant CS:APP chapters below. For more suggestions on how to approach CS Primer, see the how-to guide.
Bits and bytes
This module develops the skill of working with binary encodings of data, which will be invaluable in later courses like networking and databases and an important capability in its own right..
Key concepts covered include fixed width integer encodings and byte ordering, signed integer encodings, the IEEE-754 floating point scheme, Unicode and encodings such as UTF-8.
If time is limited or you are using this material for revision, we suggest you solve at least "Protobuf varint" or "TCP SYN flood" for integer encodings, "Sneaky NaN" for floating point and "UTF-8 truncate" for text encodings. Full seminars are included to provide motivating context and overviews for those who are less familiar with these concepts, although we still suggest attempting the problems first!
For those following along with CS:APP, we suggest chapter 2: "Representing and Manipulating Information" as a supplement
|Protobuf varint||implement the variable length integer encoding in Protocol Buffers (57:39)|
|CSS color convert||convert CSS files from hex to rgb color declarations (36:17)|
|Beep beep boop||make the terminal beep… even ASCII isn't text! (28:14)|
|Image rotate||parse a BMP file and rotate it 90 degrees (57:50)|
|TCP SYN flood||parse pcap, IP and TCP headers to analyze a SYN flood attack (52:52)|
|UTF-8 truncate||cleanly truncate UTF-8 encoded strings by understanding the encoding (25:30)|
|Sneaky NaN FREE||encode a hidden message inside a NaN (26:33)|
Introduction to C
This module is a crash course in the C programming language and its associated tooling. C allows us to write "high-level" codeHigher level than assembly, that is! with functions, arrays and pointers, structs, looping and conditional constructs and a syntax that isn’t tied to any one computer architecture. We take many of these abstractions for granted, but it's important to at least know that we rely upon them!
Our goal is to cover enough C syntax that we can later disassemble short C snippets to better understand the hardware/software interface. In later courses, we will also read code from a number of important open source projects written in C.Consider that Linux, PostgreSQL, the Apache HTTP Server and CPython (the main Python implementation) are all written in C. Others such as V8 are written in C++.
While not necessary, it's useful to have a copy of The C Programming Language (also known as K&R C) available as a reference, and for additional practice problems.
We suggest completing at least the first few "Bits and bytes" problems before starting this one, particularly if you're unfamiliar with bitwise operators.
|Hello, World!||make sure you can compile and execute a C program (14:35)|
|Bitcount FREE||use basic C types, operators and control flow to count bits (23:40)|
|Fast pangram||rewrite a short Python program in C to compare execution time (34:10)|
|Varint C extension||write a Python module as a C extension (50:31)|
|Basic hashmap||implement your own hashmap for more struct and pointer practice (1:21:47)|
This module introduces the x86-64 architecture,This is the 64-bit version of Intel’s x86 architecture, an astonishingly popular architecture first introduced with the Intel 8086, released in 1978. Modern x86 machines are largely backwards compatible with programs written for these older machines, including the Intel 386 or 486 which were the first computers that many of us used. and provides an opportunity for us to write programs “close to the metal” in x86-64 assembly language.
Assembly language has become an abstraction layer, but it remains one of the best ways for us as software engineers to reason about the execution of our programs. In these exercises, we’ll primarily write short programs to help us understand how our higher level programming constructs are evaluated by the machine.
We will be using the nasm assembler; you may find this tutorial helpful to get started, and the nasm manual generally useful. You will almost certainly need a reference for the available instructions on x86-64, such as Felix Cloutier's x86 and amd64 instruction reference. While we link explanatory videos along the way, many students find this a new and challenging topic and respond positively to a variety of introductory guides: you may enjoy some assembly required, a fundamental introduction, or others you find yourself.
For those using CS:APP as a supplement, we suggest chapters 3.1-3.6.
|Assembly "Hello world!"||Run your first program assembled from x86-64 assembly (31:53)|
|Sum to N||write a very simple assembly program to sum the integers to N (46:26)|
|Matrix access||simply make use of the x86-64 mov instruction for the first time (15:46)|
|x86‑64 pangram||convert your "fast pangram" C program to x86-64 assembly (17:08)|
|Binary convert||convert a string of ASCII 0's and 1's to an integer (09:47)|
|Cone volume||use floating point registers and operations to calculate the volume of a cone (11:12)|
|Low level recursion||understand recursion by writing a recursive Fibonacci implementation in assembly (26:13)|
CPU microarchitecture and low‑level performance
One of the most rewarding aspects of understanding computer architecture is being able to use a machine’s characteristics to one's advantage to write faster programs. While we often expect our compilers to find optimizations for us, there is both a limit to what a compiler can understanding about your data, and a value to overseeing the compiler's decision making.
In this class, we explore some techniques for program optimization, mostly as an excuse to increase our understanding of how the CPU actually works, including microarchitectural considerations like micro-operations, pipelining, branch preduction and out-of-order execution.
For supplementary reading we suggest CS:APP chapter 5: Optimizing Program Performance. We also draw heavily upon Agner Fog's fantastic Optimization manuals.
|Faster sum||Use your understanding of CPU microarchitecture to speed up a basic program (23:20)|
|Color quantizing||write a branchless version of a color conversion algorithm (40:56)|
The memory hierarchy
In this module, we explore one of the most important practical aspects of modern computer architectures: the “memory hierarchy”.
Modern computers use a series of hardware caches to mitigate the cost of accessing main memory.Consider that on a 4GHz machine, the time taken for an unnecessary 100ms trip to RAM could have been used to execute 400 or more 0.25ns instructions! For many programs, the CPU cache miss rate can be a substantial cause of poor performance. We cover the reason for and basic operation of CPU caches, and most practically how to measure and best utilize them.
For supporting material, we suggest chapters 6.1-6.3 of CS:APP as well as the Algorithms for Modern Hardware chapter on RAM & CPU Caches.
|Grayscale speedup||with an understanding of CPU caches, make a big speedup with a small change (10:34)|
|Pointer chase||speed up a "typical" Python program by reducing pointer chasing (29:04)|
|Bogosum||think concretely about (and measure!) cache use in linear vs random access patterns (1:00:14)|