Hardware and its Concurrency Habits

Hardware and its Concurrency
Habits
© 2023 Meta Platforms
Paul E. McKenney, Meta Platforms Kernel Team
Kernel Recipes, September 27, 2023
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html Chapter 3

View Slide

2
Recette Pour le Codage Simultané
●
Une pincée de connaissance des lois de la
physique
●
Compréhension modeste du matériel informatique
●
Compréhension approfondie des exigences
●
Conception soignée, y compris la synchronisation
●
Validation brutale
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html Chapitre 3

View Slide

3
Recette Pour le Codage Simultané
●
Une pincée de connaissance des lois de la
physique
●
Compréhension modeste du matériel informatique
●
Compréhension approfondie des exigences
●
Conception soignée, y compris la synchronisation
●
Validation brutale
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html Chapitre 3

View Slide

5
“Let Them Run Free!!!”

View Slide

6
CPU Benchmark Trackmeet

View Slide

7
Sadly, it is now more of an obstacle course than a track...

View Slide

8
Don’t Make ‘em Like They Used To!

View Slide

9
Don’t Make ‘em Like They Used To
4.0 GHz clock, 20 MB L3
cache, 20 stage pipeline...
The only pipeline I need
is to cool off that hot-
headed brat.

View Slide

10
headed brat.
●
No cache
No cache
●
Shallow pipeline
Shallow pipeline
●
In-order execution
In-order execution
●
One instruction at a time
●
Predictable (slow)
Predictable (slow)
execution
execution
●
Large cache
Large cache
●
Deep pipeline
Deep pipeline
●
Out of order
Out of order
●
Super scalar
Super scalar
●
Unpredictable (fast)
execution
execution

View Slide

11
headed brat.
●
No cache
No cache
●
Shallow pipeline
Shallow pipeline
●
In-order execution
In-order execution
●
●
Predictable (slow)
Predictable (slow)
execution
execution
●
Large cache
Large cache
●
Deep pipeline
Deep pipeline
●
Out of order
Out of order
●
Super scalar
Super scalar
●
execution
execution
“Tiny Bulldozer” “Semi Tractor-Trailer”

View Slide

12
headed brat.
●
No cache
No cache
●
Shallow pipeline
Shallow pipeline
●
In-order execution
In-order execution
●
●
Predictable (slow)
Predictable (slow)
execution
execution
●
Large cache
Large cache
●
Deep pipeline
Deep pipeline
●
Out of order
Out of order
●
Super scalar
Super scalar
●
execution
execution
What would be the computing-systems equivalents of a freight train?

View Slide

13
“Good Olde Days” CPU Architecture
80386 Architecture (Wikipedia user “Appaloosa” GDFL, simplified and reformatted)
32 bit
Protection
Test Unit
ALU
●
Barrel Shifter
●
Multiply/Divide
●
Register File
Segmentation
Bus
Control
32 Bit
Paging
Prefetcher /
Limit Checker
Code Queue
(16 Bytes)
Instruction
Decoder
3-Decoded
Instruction
Queue
Decode and
Sequencing
Control
ROM
Dedicated ALU Bus
34 bit
32 bit Effective Address
Status Flags
ALU Control

View Slide

14
The 80386 Taught Me Concurrency
That and a logic analyzer...

View Slide

15
But Instructions Took Several Cycles!

View Slide

16
Pipelined Execution For The Win!!!
(Wikipedia user “Amit6” CC BY-SA 3.0, reformatted)

View Slide

17
Superscalar Execution For The Win!!!
Intel Core 2 Architecture (Wikipedia user “I, Appaloosa” CC BY-SA 3.0, reformatted)
128 entry
ITLB
32 KB Instruction Cache
(8 way)
32 Byte Pre-Decode,
Fetch Buffer
18 Entry
Instruction Queue
128 Bit
6 Instructions
Instruction
Fetch Unit
Micro-
code
Simple
Decoder
Simple
Decoder
Simple
Decoder
Complex
Decoder
1μop
1μop
1μop
4μops
7 Entry μop Buffer
4μops
Register Alias Table
and Allocator
96 Entry Reorder Buffer (ROB) Retirement Register File
4μops
4μops
Store
Data
Store
Address
SSE
ALU
ALU
Branch
SSE
Shuffle
MUL
ALU
SSE
Shuffle
ALU
ALU
Load
Address
4μops
32 Entry Reservation Station
128 Bit
FMUL
FDIV
128 Bit
FADD
Memory Ordering Buffer
(MOB)
128 Bit
32 KB Dual Ported Data Cache
(8 way)
16 entry
DTLB
Store
128 Bit
Load
256 entry
L2 DTLB
Shared
L2 Cache
(16 way)
Shared
Bus
Interface
Unit
256 Bit

View Slide

18
Why All This Hardware Complexity?

View Slide

19
Laws of Physics: Atoms Are Too Big!!!
Each spot is an atom. Qingxiao Wang/UT Dallas ca. 2016.

View Slide

20
Laws of Physics: Atoms Are Too Big!!!
Each spot is an atom. Qingxiao Wang/UT Dallas ca. 2016.
Speed controlled by base thickness:
At least one atom thick!!!

View Slide

21
Laws of Physics: Light Is Too Slow!!!
“One nanosecond per foot” courtesy of Grace Hopper (https://www.youtube.com/watch?v=9eyFDBPk4Yw)
https://en.wikipedia.org/wiki/List_of_refractive_indices A 50% sugar solution is “light syrup”.
●
Following the footsteps of Admiral Hopper:
– Light goes 11.803 inches/ns in a vacuum
●
Or, if you prefer, 1.0097 lengths of A4 paper per nanosecond
●
Light goes 1 width of A4 paper per nanosecond in 50% sugar solution
– But over and back: 5.9015 in/ns
– But not 1GHz! Instead, ~2GHz: ~3in/ns
– But Cu: ~1 in/ns, or Si transistors: ~0.1 in/ns
– Plus other slowdowns: prototols, electronics, ...

View Slide

22
Laws of Physics: Data Is Slower!!!
CPUs Caches Interconnects
Memories
DRAM & NVM

View Slide

23
Memories
DRAM & NVM
Light is way too slow in Cu and Si and atoms are way too big!!!

View Slide

24
Memories
DRAM & NVM
Protocol
overheads
(Mathematics!)
Multiplexing &
Demultiplexing
(Electronics)
Clock-domain
transitions
(Electronics)
Phase
changes
(Chemistry)

View Slide

25
Laws of Physics: Summary
●
The speed of light is finite (especially in Cu and
Si) and atoms are of non-zero size
●
Mathematics, electronics, and chemistry also
take their toll
●
Systems are fast, so this matters

View Slide

26
Laws of Physics: Summary
●
The speed of light is finite (especially in Cu and
Si) and atoms are of non-zero size
●
Mathematics, electronics, and chemistry also
take their toll
●
Systems are fast, so this matters
“Gentlemen, you have two fundamental problems:
(1) the finite speed of light and (2) the atomic nature of matter.” *
* Gordon Moore quoting Stephen Hawking

View Slide

27
Why All This Hardware Complexity?
Memories
DRAM & NVM
Protocol
overheads
(Mathematics!)
Multiplexing &
Demultiplexing
(Electronics)
Clock-domain
transitions
(Electronics)
Phase
changes
(Chemistry)
Slow light and big atoms create modern computing obstacle course!!!

View Slide

28
Account For All CPU Complexity???
●
Sometimes, yes! (Assembly language!)
●
But we also need portability: CPUs change
– From family to family
– With each revision of silicon
– To work around hardware bugs
– As a given physical CPU ages

View Slide

29
One of the ALUs Might Be Disabled
Intel Core 2 Architecture (Wikipedia user “I, Appaloosa” CC BY-SA 3.0, reformatted)
128 entry
ITLB
(8 way)
32 Byte Pre-Decode,
Fetch Buffer
18 Entry
Instruction Queue
128 Bit
6 Instructions
Instruction
Fetch Unit
Micro-
code
Simple
Decoder
Simple
Decoder
Simple
Decoder
Complex
Decoder
1μop
1μop
1μop
4μops
7 Entry μop Buffer
4μops
and Allocator
4μops
4μops
Store
Data
Store
Address
SSE
ALU
ALU
Branch
SSE
Shuffle
MUL
ALU
SSE
Shuffle
ALU
ALU
Load
Address
4μops
128 Bit
FMUL
FDIV
128 Bit
FADD
(MOB)
128 Bit
(8 way)
16 entry
DTLB
Store
128 Bit
Load
256 entry
L2 DTLB
Shared
L2 Cache
(16 way)
Shared
Bus
Interface
Unit
256 Bit
X
X

View Slide

30
Thus, Simple Portable CPU Model
128 entry
ITLB
(8 way)
32 Byte Pre-Decode,
Fetch Buffer
18 Entry
Instruction Queue
128 Bit
6 Instructions
Instruction
Fetch Unit
Micro-
code
Simple
Decoder
Simple
Decoder
Simple
Decoder
Complex
Decoder
1μop
1μop
1μop
4μops
7 Entry μop Buffer
4μops
and Allocator
4μops
4μops
Store
Data
Store
Address
SSE
ALU
ALU
Branch
SSE
Shuffle
MUL
ALU
SSE
Shuffle
ALU
ALU
Load
Address
4μops
128 Bit
FMUL
FDIV
128 Bit
FADD
(MOB)
128 Bit
(8 way)
16 entry
DTLB
Store
128 Bit
Load
256 entry
L2 DTLB
Shared
L2 Cache
(16 way)
Shared
Bus
Interface
Unit
256 Bit
CPU
CPU
Store
Store
Buffer
Buffer
Cache
Cache
Intel Core 2 Architecture (Wikipedia user “I, Appaloosa” CC BY-SA 3.0, reformatted and remixed)

View Slide

31
And Lots Of CPUs Per System!!!
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Interconnect
Interconnect
CPUs 0-27 &
CPUs 224-251
CPUs 28-55 &
CPUs 252-279
CPUs 56-83 &
CPUs 280-307
CPUs 84-111 &
CPUs 308-335
CPUs 112-139 &
CPUs 336-363
CPUs 140-167 &
CPUs 364-391
CPUs 168-195 &
CPUs 392-419
CPUs 196-223 &
CPUs 420-447

View Slide

32
Obstacles for Modern Computers

View Slide

33
Obstacle: Pipeline Flush
PIPELINE
ERROR
PIPELINE
ERROR
BRANCH
MISPREDICTION
Running at full speed requires perfect branch prediction

View Slide

34
Obstacle: Memory Reference
A single fetch all the way from memory can cost hundreds of clock cycles

View Slide

35
Obstacle: Atomic Operation
Atomic operations require locking cachelines and/or busses, incurring significant delays

View Slide

36
Obstacle: Memory Barrier
Memory barriers result in stalls and/or ordering constraints, again incurring delays
Memory
Barrier
Memory
Barrier

View Slide

37
Obstacle: Thermal Throttling
Efficient use of CPU hardware generates heat, throttling the CPU clock frequency

View Slide

38
Obstacle: Cache Miss
Cache misses result in waiting for data to arrive (from memory or other CPUs)
CACHE-
MISS
TOLL
BOOTH
CACHE-
MISS
TOLL
BOOTH

View Slide

39
Obstacle: Input/Output Operation
And here you thought that cache misses were slow...

View Slide

40
Which Obstables To Focus On?
1) I/O operations (but often higher-level issue)
2) Communications cache misses
3) Memory barriers and atomic operations
4) Capacity/geometry cache misses (memory)
5) Branch prediction

View Slide

41
Which Obstables To Focus On?
1) I/O operations (but often higher-level issue)
2) Communications cache misses
3) Memory barriers and atomic operations
4) Capacity/geometry cache misses (memory)
5) Branch prediction
These obstacles can (usually) be overcome in a portable manner.

View Slide

42
Xeon Platinum 8176 2.1GHz: CPU 0

View Slide

43
Location Really Matters!!!
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Last-Level Cache
Interconnect
Interconnect
CPUs 0-27 &
CPUs 224-251
CPUs 28-55 &
CPUs 252-279
CPUs 56-83 &
CPUs 280-307
CPUs 84-111 &
CPUs 308-335
CPUs 112-139 &
CPUs 336-363
CPUs 140-167 &
CPUs 364-391
CPUs 168-195 &
CPUs 392-419
CPUs 196-223 &
CPUs 420-447

View Slide

44
Latency Demonstration

View Slide

49
Can Hardware Help???

View Slide

50
Can Hardware Help???
Source
Drain
Sub-Atomic Base
Vacuum-gap transistor: At these scales, the atmosphere is a vacuum!!!

View Slide

51
We Really Can Hand-Place Atoms...
Actually a carbon monoxide molecule that I moved across a few planes of copper

View Slide

52
We Really Can Hand-Place Atoms...
But not trillions of them in a cost-effective manner!!!
Does
Not Scale

View Slide

53
Incremental Help From Hardware

View Slide

54
Hardware 3D Integration
3 cm
1.5 cm
Half the distance,
twice the speed!!!
Both stacked chiplets and lithographically stacked transistors.

View Slide

55
Stacked Chiplets/Dies
Diagram by Shmuel Csaba Otto Traian, CCSA4.0

View Slide

56
Lithographically Stacked Transistors
https://ieeexplore.ieee.org/document/9976473
https://spectrum.ieee.org/intels-stacked-nanosheet-transistors-could-be-the-next-step-in-moores-law

View Slide

57
Hardware 3D Integration
* Give or take issues with power, cooling, alignment, interconnect drivers, and so on.
3 cm
1.5 cm
Half the distance,
twice the speed *

View Slide

58
Hardware Integration is Helping
Q3 2017: 56 CPUs with ~100ns latencies

View Slide

59
Hardware Integration is Helping
November 2008: 16 CPUs with ~100ns latencies: More than 3x in nine years!!!

View Slide

60
Hardware Accelerators

View Slide

61
Hardware Accelerators, Theory
Data Accelerator
Unidirectional data flow, no out and back, twice the speed!!!

View Slide

62
Hardware Accelerators, Practice
Data Accelerator
Sadly, back to request-response, but better latency with local memory?
Accelerator-local
memory
System main memory

View Slide

63
So Why Hardware Accelerators???
●
Optimized data transfers (e.g., larger blocks)
●
Optimized hard-wired computation
●
Better performance per watt
●
Better performance per unit capital cost

View Slide

64
Hardware Has Been Helping All Along

View Slide

65
What Hardware Is Up Against
Memories
DRAM & NVM
Protocol
overheads
(Mathematics!)
Multiplexing &
Demultiplexing
(Electronics)
Clock-domain
transitions
(Electronics)
Phase
changes
(Chemistry)

View Slide

66
Therefore, Memory Hierarchies!!!

View Slide

67
Simple Portable CPU Model Redux
128 entry
ITLB
(8 way)
32 Byte Pre-Decode,
Fetch Buffer
18 Entry
Instruction Queue
128 Bit
6 Instructions
Instruction
Fetch Unit
Micro-
code
Simple
Decoder
Simple
Decoder
Simple
Decoder
Complex
Decoder
1μop
1μop
1μop
4μops
7 Entry μop Buffer
4μops
and Allocator
4μops
4μops
Store
Data
Store
Address
SSE
ALU
ALU
Branch
SSE
Shuffle
MUL
ALU
SSE
Shuffle
ALU
ALU
Load
Address
4μops
128 Bit
FMUL
FDIV
128 Bit
FADD
(MOB)
128 Bit
(8 way)
16 entry
DTLB
Store
128 Bit
Load
256 entry
L2 DTLB
Shared
L2 Cache
(16 way)
Shared
Bus
Interface
Unit
256 Bit
CPU
CPU
Store
Store
Buffer
Buffer
Cache
Cache
(Not yet)

View Slide

68
Read-Side Hardware Help (1/7)
CPU 0
Cache
CPU 3
Cache
x=42,y=63
CPU 1 CPU 2
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)
Request cacheline x

View Slide

69
CPU 0
Cache
CPU 3
Cache
x=42,y=63
CPU 1 CPU 2
Request cacheline x
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

70
CPU 0
Cache
CPU 3
Cache
x=42,y=63
CPU 1 CPU 2
Request cacheline x
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

71
CPU 0
Cache
CPU 3
Cache
CPU 1 CPU 2
Cacheline x = 42, y = 63
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

72
CPU 0
Cache
x=42,y=63
CPU 3
Cache
CPU 1 CPU 2
Cacheline x = 42, y = 63
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

73
CPU 0
Cache
x=42,y=63
CPU 3
Cache
CPU 1 CPU 2
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

74
CPU 0
Cache
x=42,y=63
CPU 3
Cache
CPU 1 CPU 2
Caches help beat laws of physics given spatial locality!!!
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

75
CPU 0
Cache
x=42,y=63
CPU 3
Cache
CPU 1 CPU 2
Caches help beat laws of physics given temporal locality!!!
r1 = READ_ONCE(x)
r2 = READ_ONCE(y)
r3 = READ_ONCE(x)

View Slide

76
Levels of Cache on My Laptop
Index Line Size Associativity Size
0 64 8 32K
1 64 8 32K
2 64 4 256K
3 64 16 16,384K
When taking on the laws of physics, don’t be afraid to use a few transistors

View Slide

77
Levels of Cache on Large Old Server
Index Line Size Associativity Size
0 64 8 32K
1 64 6 32K
2 64 16 1,024K
3 64 11 39,424K
When taking on the laws of physics, don’t be afraid to use lots of transistors

View Slide

79
But What About Writes?

View Slide

80
Write-Side Hardware Help (Summary)
●
Store buffers for the win!!! Sort of…
– Cache line for variable x is initially at CPU 3
– CPU 0 writes 1 to x, but doesn't have cacheline
●
So holds the write in CPU 0's store buffer
●
And requests exclusive access to the cacheline (which takes time)
– CPU 3 reads x, obtaining “0” immediately from cacheline
– CPU 0 receive's x's cacheline
●
And CPU 0's write finally gets to the cacheline
●
Overwriting the value that CPU 3 read, despite the write starting earlier
●
Writes only appear to be instantaneous!!!

View Slide

81
Simple Portable CPU Model Redux
128 entry
ITLB
(8 way)
32 Byte Pre-Decode,
Fetch Buffer
18 Entry
Instruction Queue
128 Bit
6 Instructions
Instruction
Fetch Unit
Micro-
code
Simple
Decoder
Simple
Decoder
Simple
Decoder
Complex
Decoder
1μop
1μop
1μop
4μops
7 Entry μop Buffer
4μops
and Allocator
4μops
4μops
Store
Data
Store
Address
SSE
ALU
ALU
Branch
SSE
Shuffle
MUL
ALU
SSE
Shuffle
ALU
ALU
Load
Address
4μops
128 Bit
FMUL
FDIV
128 Bit
FADD
(MOB)
128 Bit
(8 way)
16 entry
DTLB
Store
128 Bit
Load
256 entry
L2 DTLB
Shared
L2 Cache
(16 way)
Shared
Bus
Interface
Unit
256 Bit
CPU
CPU
Store
Store
Buffer
Buffer
Cache
Cache
Now!!!

View Slide

82
Write-Side Hardware Help (1/7)
CPU 0
Store Buffer
Cache
CPU 3
Store Buffer
Cache
x=0
CPU 1 CPU 2
WRITE_ONCE(x, 1) READ_ONCE(x)

View Slide

83
CPU 0
Store Buffer
x=1
Cache
CPU 3
Store Buffer
Cache
x=0
CPU 1 CPU 2
Request cacheline x
The store buffer allows writes to completes quickly!!! Take that, laws of physics!!!

View Slide

84
CPU 0
Store Buffer
x=1
Cache
CPU 3
Store Buffer
Cache
x=0
CPU 1 CPU 2
Request cacheline x
Except that later read gets older value...

View Slide

85
CPU 0
Store Buffer
x=1
Cache
CPU 3
Store Buffer
Cache
x=0
CPU 1 CPU 2
Request cacheline x

View Slide

86
CPU 0
Store Buffer
x=1
Cache
CPU 3
Store Buffer
Cache
CPU 1 CPU 2
Respond with cacheline x = 0

View Slide

87
CPU 0
Store Buffer
x=1
Cache
x=0
CPU 3
Store Buffer
Cache
CPU 1 CPU 2
Respond with cacheline x = 0

View Slide

88
CPU 0
Store Buffer
Cache
x=1
CPU 3
Store Buffer
Cache
CPU 1 CPU 2
Quick write completion, sort of. Laws of physics: Slow or misordered!!!

View Slide

89
Misordering? Or Propagation Delay?
CPU 0
CPU 1
CPU 2
CPU 3
WRITE_ONCE(x, 1);
r1 = READ_ONCE(x) == 0;
X
==
0 X
==
1
fr
Time

View Slide

90
And Careful What You Wish For!!!
CPU 0
CPU 1
CPU 2
CPU 3
WRITE_ONCE(x, 1);
r1 = READ_ONCE(x) == 0;
X
==
0 X
==
1
fr
Time
Hardware tricks help reduce the red triangle. But too bad about Meltdown and Spectre...

View Slide

91
Can Software Help?

View Slide

92
Can Software Help?
●
Increasingly, yes!!!
●
Use concurrent libraries, applications,
subsystems, and so on
– Let a few do the careful coding and tuning
– Let a great many benefit from the work of a few
●
Use proper APIs to deal with memory ordering
– Chapter 14: “Advanced Synchronization: Memory Ordering”
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

View Slide

93
Summary

View Slide

94
Summary
●
Modern hardware is highly optimized
– Most of the time!
– Incremental improvements due to integration
– But the speed of light is too slow and atoms too big
●
Use concurrent software where available
●
Structure your code to avoid the big obstacles
– Micro-optimizations only sometimes needed

View Slide

95
For More Information
●
“Computer Architecture: A Quantitative Approach”, Hennessey & Patterson (Recent HW)
●
“Parallel Computer Architecture: A Hardware/Software Approach”, Culler et al.
– Includes SGI Origin and Sequent NUMA-Q
●
“Programing Massively Parallel Processors: A Hands-on Approach”, Kirk & Hwu
– Primarily NVIDIA GPGPUs
●
https://developer.nvidia.com/educators/existing-courses
– List of NVIDIA university courseware
●
https://gpuopen.com/professional-compute
– List of AMD GPGPU-related content
●
“Is Parallel Programming Hard, And, If So, What Can You Do About It?”
– https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

View Slide

96
L'antidote de Codage Simultané de la Femme de Paul
●
Cordial de mûres sauvages de l'Himalaya
– Mettre deux litres de mûres dans un pot de quatre litres
– Ajouter cinq huitièmes litres de sucre
– Remplissez le pot de vodka
– Secouez tous les jours pendant cinq jours
– Secouez chaque semaine pendant cinq semaines
– Passer au tamis: Ajoutez des baies à la glace, consommez le
liquide filtré comme vous voulez

View Slide

97
Paul’s Wife’s Concurrency Antidote
●
Wild Himalayan Blackberry Cordial
– Put 8 cups wild himalayan blackberries in 1 gallon jar
– Add 2½ cups sugar
– Fill jar with vodka
– Shake every day for five days
– Shake every week for five weeks
– Pour through sieve: Add berries to ice cream, consume filtered
liquid as you wish

View Slide

98
Paul’s Wife’s Concurrency Antidote
●
Wild Himalayan Blackberry Cordial
– 8 cups wild himalayan blackberries in 1 gallon jar
– Add 2½ cups sugar
– Fill jar with vodka
– Shake every day for five days
– Shake every week for five weeks
– Pour through sieve: Add berries to ice cream, consume filtered
liquid as you wish
If there is no right tool, invent it!!!
Questions?

View Slide

Hardware and its Concurrency Habits

Hardware and its Concurrency Habits

Kernel Recipes
PRO

More Decks by Kernel Recipes

Other Decks in Programming

Featured

Transcript

Hardware and its Concurrency Habits

Hardware and its Concurrency Habits

Kernel Recipes PRO

More Decks by Kernel Recipes

Other Decks in Programming

Featured

Transcript

Kernel Recipes
PRO