Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Fast by Friday: Why Kernel Superpowers are Essential

Fast by Friday: Why Kernel Superpowers are Essential

It is not ok that we speed weeks, even months, trying to solve why software is slow. Companies waste money on compute costs, users are unhappy with latency, and product evaluations run out of investigation time. It should not take more than a week to identify the root cause or causes for a performance issue, such that any performance issue reported on a Monday should be solved by Friday, or sooner. The kernel superpowers we have been building are essential for this dream, and allow us to explore performance analysis methodologies to achieve this that were previously a fantasy.

This talk explores the dream of “fast by Friday,” and shows how kernel technologies like eBPF, and performance methodologies, can get us there. The end goal is not more tools and metrics or having everyone learn eBPF bytecode. It’s about efficient computing, and solving inefficiencies as quickly as possible. It’s about saving cycles and carbon.

To be fast by Friday requires observability tools to work on Monday, and right now for many Linux environments that means /proc based tools and Ftrace, sometimes perf, and rarely the eBPF tracing tools: bcc and bpftrace. This and other current and future technical challenges will be discussed, including eBPF stack walking, runtime behavior and uprobes, compiler optimization defaults, OS default packages, and non-CPU targets (GPUs, accelerators).

Brendan GREGG

Kernel Recipes
PRO

September 29, 2023
Tweet

More Decks by Kernel Recipes

Other Decks in Programming

Transcript

  1. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Fast by Friday
    Brendan Gregg
    Why Kernel Superpowers are Essential

    View Slide

  2. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 2
    What would it take
    to solve any computer performance issue
    in 5 days?

    View Slide

  3. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 3
    Imagine solving the performance of anything
    Operating systems, kernels, web browsers, phones, applications,
    websites, microservices, processors, AI, etc., …
    Examples: Linux, Windows, Firefox, Google docs, Minecraft,
    Amazon.com, Intel GPUs, pytorch, etc., …
    Websites should load in the blink of an eye.

    View Slide

  4. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 4
    Timely performance analysis allows faster and more efficient
    software/hardware/tuning options to be adopted
    Good for the environment: Less cycles, energy, carbon
    Good for innovation: Rewards investment in engineering
    Good for companies: Less compute expense
    Good for end-users: Lower latency, cheaper products
    Why

    View Slide

  5. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 5
    "Fast by Friday":
    Any computer performance issue
    reported on Monday
    should be solved by Friday
    (or sooner)
    A vision:

    View Slide

  6. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 6
    "Fast by Friday":
    Any computer performance issue
    reported on Monday
    should be solved by Friday
    (or sooner)
    Issues: any performance analysis task, especially SW/HW evaluations
    Solved by friday: doesn't mean fixed, it means root cause(s) known
    Definitions

    View Slide

  7. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 7
    A vision
    A way of thinking
    A call to action
    A methodology
    A practical deadline
    I want to completely understand the performance of everything…in 5 days
    "Fast by Friday" is…

    View Slide

  8. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 8
    1. Found Performance root cause(s) known
    2. Fixed Fix developed
    3. Deployed Fixed everywhere
    "Fast by Friday" focuses on (1) as it's often the biggest obstacle.
    Yes, even for the Linux kernel. Show me a 2x perf fix and I'll show you comparies running it by Friday.
    If the wasted cores paper was widely applicable, I'd have a pretty good example.
    The first of three activities

    View Slide

  9. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 9
    The Problem

    View Slide

  10. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 10
    Expected performance improvement for computing products
    Product Performance: Hypothetical
    Performance

    View Slide

  11. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 11
    Example reality
    Product Performance: Actual
    Performance

    View Slide

  12. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 12
    Example reality: 3 issues
    Bottleneck not
    found in time
    Not enough time
    to properly analyze
    all new software/
    hardware/compiler
    options (e.g., icx!)
    Regression not
    solved in time
    We, engineers, have to fix this!
    Product Performance: Actual
    Performance
    Amount
    of lost
    performance

    View Slide

  13. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 13
    Problem: Computers are getting increasingly complex
    Just one example (computer hardware) of increasing complexity.
    Software is worse!
    Performance issues can now go unsolved for weeks, months, years
    Product decisions miss improvements as analysis and tuning takes too long

    View Slide

  14. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023 14
    Analogy: Car performance
    You build the world's fastest car, but the customer says: "it isn't"
    You investigate and discover:
    They were sent the wrong car
    … with flat tires
    … unbalanced wheels
    … a minor engine issue
    … and older firmware
    This may take too long to debug and the customer may leave.
    Computers are like this too!
    They also weren't told how to drive it
    … and left economy enabled
    … and didn't use the turbo button

    View Slide

  15. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    A common scenario at product vendors
    Your product is probably the fastest
    But there's likely some config/tunable error
    It's the final week of the customer eval
    You have to make it fast by friday
    15

    View Slide

  16. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    How
    16

    View Slide

  17. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Prior weeks: Preparation
    Monday: Quantify, static tuning, load
    Tuesday: Checklists, elimination
    Wednesday: Profiling
    Thursday: Latency, logs, critical path
    Friday: Efficiency, algorithms
    Post weeks: Case study, retrospective
    "Fast by Friday": Proposed Agenda
    17

    View Slide

  18. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Prior weeks: Preparation
    Everything must work on Monday!
    ❏ Critical analysis tools ("crisis tools") must be
    preinstalled; E.g., Linux: procps, sysstat,
    linux-tools-common, bcc-tools, bpftrace, …
    ❏ Stack tracing and symbols should work for the
    kernel, libraries, and applications
    ❏ Tracing (host & distributed) must work
    ❏ The performance engineers must already have host
    SSH root access
    ❏ A functional diagram of the system must be known
    ❏ Source code should be available
    Example functional diagram
    Source: Lunar Module - LM10 Through LM14 Familiarization Manual" (1969):
    Current industry status: 1 out of 5
    18

    View Slide

  19. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Prior weeks: "Crisis Tools"
    Source: Systems Performance 2nd Edition, page 131-132
    No time to "apt-get update; apt-get
    install…" during a perf crisis.
    Ftrace is great as it's usually there;
    my Ftrace/perf tools:
    19
    https://github.com/brendangregg/perf-tools

    View Slide

  20. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Monday: Quantify, static tuning, load
    1. Quantify the problem
    ○ Problem statement method
    2. Static performance tuning
    ○ The system without load
    ○ Check all hardware, software
    versions, past errors, config
    ○ Covered in sysperf
    3. Load vs implementation
    ○ Just a problem of load?
    ○ Usually solved via basic monitoring
    and line charts
    Current industry status: 4 out of 5
    Problem Statement method
    Source: Systems Performance 2nd edition, page 44
    A familiar pattern of load
    Source: https://www.brendangregg.com/Slides/SREcon_2016_perf_checklists
    20

    View Slide

  21. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Monday (cont.): End-of-day Status
    If still unsolved, we now know:
    - It’s a real issue, of this magnitude, affecting these systems
    - It’s not just config
    - It’s not just load
    21

    View Slide

  22. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Tuesday: Checklists, elimination
    1. Recent issue checklist
    ○ Often need new tools for ad hoc checks
    ○ Can now be automated by AI auto-tuners
    (e.g., Intel Granulate)
    2. Elimination: Subsystems it isn't
    ○ It's impossible to deep-dive everything in
    one week, need to narrow down
    ○ New tools to exonerate components
    ○ Dashboards of health check traffic lights
    ○ Include experiments: microbenchmarks
    Current industry status: 2 out of 5
    Generic system diagram
    22

    View Slide

  23. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    We need new tools for broad and deep custom performance
    analysis, ideally that can be developed and run in-situ by
    Friday. No restarts.
    eBPF is a kernel superpower that makes this possible.
    (e.g., show me how much workload A queued behind workload B: This is not just queue latency
    histograms, but needs programmatic filters.)
    Ftrace/perf/perf+eBPF also have kernel superpowers in the
    hands of wizards.
    New observability tools often need kernel superpowers
    2
    eBPF
    Ftrace
    perf

    View Slide

  24. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Tuesday (cont.): eBPF Tools
    Current eBPF tools
    *snoop, *top, *stat, *count, *slower, *dist
    Supports later methodologies
    Workload characterization, latency analysis, off-CPU
    analysis, USE method, etc.
    Future elimination tools
    *health, *diagnosis
    Supports "fast by friday"
    Analyzes existing dynamic workload
    Open source & in the target code repo
    E.g., Linux subsystem tools should be in Linux, like unit
    tests, accepted by maintainers, and ideally written by
    the developers! E.g., dctcphealth should ideally be
    written by the dctcp author: Daniel Borkmann!
    This ensures they are accurate and maintained.
    They should not be in bcc/bpftrace or proprietary.
    Current eBPF performance tools
    Source: BPF Performance Tools, cover art [Gregg 2019]
    24

    View Slide

  25. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Tuesday (cont.): Health Tool Example 1/2
    I wrote the ZFS L2ARC (second level cache) so I should write the
    health check tool, or at least share thoughts for others to follow:
    - I designed it to either help or do nothing, so shouldn’t be an issue, but... It could burn CPU for
    scanning, memory for metadata, and disk I/O throughput for caching, and not providing a net
    win, especially if someone set the record size to very small. Plus there could be outright bugs
    by new: There was that ARC bug I talked about at the last KR.
    - Experimental is easiest: It’s a cache, so turn it off! Are things now faster or slower?
    - Accurate observability is hard: Measure CPU burn (profiling or eBPF tracing), disk I/O, and
    impact of L2ARC kernel metadata preventing app WSS from caching, but measuring WSS is
    hard, and my website is overdue an update www.brendangregg.com/wss.html
    - Rough observability: From kernel counters: Is the L2ARC in use? Is the recsize <32k? Is it
    constantly scanning (CPU)? Is there heavy disk I/O (contention)? Then “maybe”.
    - I have more thoughts and this should become a bcc tool request ticket. When it’s your own
    code, you know a lot of “however”s!
    25

    View Slide

  26. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Tuesday (cont.): Health Tool Example 2/2
    I wrote the ZFS L2ARC (second level cache) so I should write the
    health check tool, or at least share thoughts for others to follow:
    - I designed it to either help or do nothing, so shouldn’t be an issue, but... It could burn CPU for
    scanning, memory for metadata, and disk I/O throughput for caching, and not providing a net
    win, especially if someone set the record size to very small. Plus there could be outright bugs
    by new: There was that ARC bug I talked about at the last KR.
    - Experimental is easiest: It’s a cache, so turn it off! Are things now faster or slower?
    - Accurate observability is hard: Measure CPU burn (profiling or eBPF tracing), disk I/O, and
    impact of L2ARC kernel metadata preventing app WSS from caching, but measuring WSS is
    hard, and my website is overdue an update www.brendangregg.com/wss.html
    - Rough observability: From kernel counters: Is the L2ARC in use? Is the recsize <32k? Is it
    constantly scanning (CPU)? Is there heavy disk I/O (contention)? Then “maybe”.
    - I have more thoughts and this should become a bcc tool request ticket. When it’s your own
    code, you know a lot of “however”s!
    26
    In summary, a practical L2ARC health tool could:
    1. Use kernel counters to check for possible resource contention
    versus handpicked thresholds, and report “good” or “maybe issue”.
    2. If maybe, prompt for an invasive test that disables the L2ARC while
    monitoring systemic throughput. Report “good” or “bad” and quantify.
    If needed can measure contention via kprobe/kfunc tracing and eBPF.
    The tool should be in ZFS and its logic and thresholds maintained.

    View Slide

  27. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Tuesday (cont.): Health Tool Points
    A. An ugly half-good tool is better than nothing
    B. Sharing thoughts can let others write it (Documentation/*/health.txt)
    C. Reporting "maybe" is ok
    D. Not an C64 diagnostics cart: Has to analyze exsiting workloads
    E. Test hierarchy: safe -> violent, only progress if needed, can prompt
    F. Be pragmatic: eBPF, perf, Ftrace, /proc, use anything
    Current tools: "Here's data, you figure it out"
    Health tools: "I figured it out"
    27

    View Slide

  28. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Tuesday (cont.): End-of-day Status
    If still unsolved, we now know:
    - It’s a real issue, of this magnitude, affecting these systems
    - It’s not just config
    - It’s not just load
    - It’s not a recent issue
    - It’s caused by these components
    28

    View Slide

  29. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Wednesday: Profiling
    1. CPU Flame Graphs
    ○ More efficient with eBPF
    ○ eBPF runtime stack walkers
    2. CPI Flame Graphs
    ○ Needs PMCs PEBS on Intel for accuracy
    3. Off-CPU Flame Graphs
    ○ Impractical without eBPF
    Solves most performance issues
    Needs preparation!
    Current industry status: 3 out of 5
    CPU flame graph
    Off-CPU/waker time flame graph
    29

    View Slide

  30. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Wednesday (cont.): End-of-day Status
    If still unsolved, we now know:
    - It’s a real issue, of this magnitude, affecting these systems
    - It’s not just config
    - It’s not just load
    - It’s not a recent issue
    - It’s caused by these components
    - It’s caused by these codepaths
    30

    View Slide

  31. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Thursday: Latency, logs, critical path, HW
    1. Latency drilldowns
    ○ Latency histograms
    ○ Latency heat maps
    ○ Latency outliers
    2. Logs, event tracing
    ○ Custom event logs
    3. Critical path analysis
    ○ Multi-threaded tracing
    ○ Distributed tracing across a distributed
    environment
    4. Hardware counters
    Distributed tracing
    Source: https://www.brendangregg.com/Slides/Monitorama2015_NetflixInstanceAnalysis
    Latency heat maps
    Source: https://www.brendangregg.com/HeatMaps/latency.html
    Current industry status: 3 out of 5
    31

    View Slide

  32. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Thursday: Latency, logs, critical path, HW
    1. Latency drilldowns
    ○ Latency histograms
    ○ Latency heat maps
    ○ Latency outliers
    2. Logs, event tracing
    ○ Custom event logs
    3. Critical path analysis
    ○ Multi-threaded tracing
    ○ Distributed tracing across a distributed
    environment
    4. Hardware counters
    Distributed tracing
    Source: https://www.brendangregg.com/Slides/Monitorama2015_NetflixInstanceAnalysis
    Latency heat maps
    Source: https://www.brendangregg.com/HeatMaps/latency.html
    eBPF Tools
    *dist
    *slower
    *snoop, bpftrace
    "Zero instrumentation"
    (when faster uprobes is done;
    currently: https://dont-ship.it)
    Current industry status: 3 out of 5
    32
    perf & its
    subcommands

    View Slide

  33. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Thursday (cont.): End-of-day Status
    If still unsolved, we now know:
    - It’s a real issue, of this magnitude, affecting these systems
    - It’s not just config
    - It’s not just load
    - It’s not a recent issue
    - It’s caused by these components
    - It’s caused by these codepaths
    - Latency has this distribution, over time, and these outliers
    - Latency is coming from this specific component
    - It's not a low-level hardware issue
    33

    View Slide

  34. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Friday: Efficiency, algorithms
    1. Is the target efficient?
    ○ A largely unsolved problem
    ○ Cycles/carbon per request
    ○ Compare with similar products
    ○ New efficiency tools (eBPF?)
    ○ System efficiency equals the
    least efficient component
    ○ Modeling, theory
    2. Use faster algorithms?
    ○ Big O Notation
    Current industry status: 1 out of 5
    Source: Systems Performance 2nd Edition, page 175
    Protocol CIFS iSCSI FTP NFSv3 NFSv4
    Cycles(k)
    per 1k read
    2241 1843 970 395 485
    Example efficiency comparisons (made up)
    34

    View Slide

  35. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Friday (cont.): End-of-day Status
    If still unsolved, we now know:
    - It’s a real issue, of this magnitude, affecting these systems
    - It’s not just config
    - It’s not just load
    - It’s not a recent issue
    - It’s caused by this component
    - It’s caused by these codepaths
    - Latency has this distribution, over time, and these outliers
    - Latency is coming from this specific component
    - It's not a low-level hardware issue
    - The code is efficient already. There is no “problem”!
    35

    View Slide

  36. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Post weeks: Case study, retrospective
    1. Document as a case study
    ○ JIRA, wiki, gist
    ○ External blog/talk
    Including (redacted) flame graphs is great: You may
    find overlooked perf issues years later from them.
    ○ Repetition?
    Add to Tuesday's "Recent issue checklist"
    2. Retrospective
    ○ How to debug it faster by friday?
    Example blog post: https://www.brendangregg.com/blog
    Current industry status: 1 out of 5
    36

    View Slide

  37. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Prior weeks: Preparation 1
    Monday: Quantify, static tuning, load 4
    Tuesday: Checklists, elimination 2
    Wednesday: Profiling 3
    Thursday: Latency, logs, critical path 3
    Friday: Efficiency, algorithms 1
    Post weeks: Case study, retrospective 1
    "Fast by Friday": My current industry ratings (5 == best)
    We are not currently good at this
    37

    View Slide

  38. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Prior weeks: Preparation
    Monday: Quantify, static tuning, load
    Tuesday: Checklists, elimination
    Wednesday: Profiling
    Thursday: Latency, logs, critical path
    Friday: Efficiency, algorithms
    Post weeks: Case study, retrospective
    "Fast by Friday": Linux Kernel Superpowers
    eBPF
    perf
    Ftrace
    38

    View Slide

  39. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    What Needs to Change
    39

    View Slide

  40. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Consider perf wins that took weeks as room for improvement
    New tracing tools needed: *diagnose, *health
    Crisis tools should be installed by default in enterprise distros
    Stack walking should work by default for everything
    A way of thinking, a call for action
    40

    View Slide

  41. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Frame pointers already enabled at major companies.
    Fedora first distro to offer it?
    Can't we be smarter if needed?
    NOP/__fentry__ style rewrites (Rostedt)? Options with LD/ELF.
    eBPF custom runtime
    stack walkers (Java, etc.)
    Yes, multiple people are
    doing this. They should ship
    as open source with the
    runtime code.
    Stack walking, frame pointers, and eBPF walking
    41
    https://gcc.gnu.org/legacy-ml/gcc-patches/2004-08/msg01033.html
    Reasons FPs were
    disabled in 2004:
    - i386
    - gdb doesn't
    need them
    - gcc vs icc

    View Slide

  42. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Summary
    42

    View Slide

  43. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Prior weeks: Preparation
    Day 1: Quantify, static tuning, load
    Day 2: Checklists, elimination
    Day 3: Profiling
    Day 4: Latency, logs, critical path
    Day 5: Efficiency, algorithms
    Post weeks: Case study, retrospective
    "Fast by Friday" Summary
    Fast by Friday:
    Any computer performance
    issue reported on Monday
    should be solved by Friday
    (or sooner)
    43

    View Slide

  44. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Performance Mantras:
    1. Don't do it
    2. Do it, but don't do it again
    3. Do it less
    4. Do it later
    5. Do it when they're not looking
    6. Do it concurrently
    7. Do it cheaper
    AFAIK these mantras are from Craig Hanson and Pat Crain (I'm still looking for a reference)
    "Fixed by Friday" (a different talk) sample
    Fixed by Friday:
    Any known performance bug
    reported on Monday
    should have a fix by Friday
    (or sooner)
    44

    View Slide

  45. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    "Fast by Friday": Any computer performance issue reported on
    Monday should be solved by Friday (or sooner)
    Kernel superpowers, especially eBPF, are essential for such
    fast in-situ production analysis
    It will take all of us many years: OS changes, kernel support,
    new tools, methodologies. How can you help? One step at a time!
    Take Aways
    45

    View Slide

  46. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Q&A
    46

    View Slide

  47. Fast by Friday: Why Kernel Superpowers are Essential
    Kernel Recipes 2023
    Jesper Dangaard Brouer
    eBPF: Alexei Starovoitov (Meta), Daniel Borkmann (Isovalent), David S. Miller (Red Hat),
    Jakub Kicinski (Meta), Yonghong Song (Meta), Andrii Nakryiko (Meta), Thomas Graf
    (Isovalent), Martin KaFai Lau (Meta), John Fastabend (Isovalent), Quentin Monnet
    (Isovalent), Jesper Dangaard Brouer (Red Hat), Andrey Ignatov (Meta), Stanislav
    Fomichev (Google), Joe Stringer (Isolavent), KP Singh (Google), Dave Thaler
    (Microsoft), Liz Rice (Isovalent), Chris Wright (Red Hat), Linus Torvalds, and many
    more in the BPF community
    Ftrace: Steven Rostedt (Google) and the Ftrace community
    Perf: Arnaldo Carvalho de Melo (Red Hat) and the
    perf community
    Kernel Recipes 10th edition!
    Thanks
    47

    View Slide