Building Adaptive Systems

Chris Keathley

May 28, 2020

Transcript

  1. Chris Keathley / @ChrisKeathley / [email protected]
    Building Adaptive Systems

  2. Server Server

  3. Server Server
    I have a request

  4. Server Server

  6. Server Server
    No Problem!

  7. Server Server

  8. Server Server
    Thanks!

  9. Server Server

  10. Server Server
    I have a request

  11. Server Server

  13. Server Server
    I’m a little busy

  14. Server Server
    I’m a little busy
    I have more requests!

  22. Server Server
    I don’t feel so good

  23. Server

  24. Server
    Welp

  26. All services have
    objectives

  27. A resilient service should
    be able to withstand a 10x
    traffic spike and continue
    to meet those objectives

  28. Let's Talk About…
    Queues
    Overload Mitigation
    Adaptive Concurrency

  30. What causes
    overload?

  31. What causes overload?
    Server
    Queue

  32. What causes overload?
    Server
    Queue
    Arrival Rate > Processing Time

  33. Little’s Law
    Elements in the queue = Arrival Rate * Processing Time
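
Written symbolically, as a hedged restatement of the slide: the symbols L (average number of requests in the system), λ (arrival rate), and W (average time a request spends in the system) are the conventional names and do not appear in the deck. The worked instance uses a 10 rps arrival rate and a 100 ms processing time, with the time converted to seconds so the units cancel.

```latex
L = \lambda W
\qquad\text{e.g.}\qquad
L = 10\ \tfrac{\text{requests}}{\text{s}} \times 0.1\ \text{s} = 1\ \text{request}
```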

  34. Little’s Law
    Server
    1 request = 10 rps * 100 ms
    100ms

  37. Little’s Law
    Server
    2 requests = 10 rps * 200 ms
    200ms

  42. Little’s Law
    Server
    2 requests = 10 rps * 200 ms
    200ms
    BEAM Processes

  43. Little’s Law
    Server
    2 requests = 10 rps * 200 ms
    200ms
    BEAM Processes
    CPU Pressure

  44. Little’s Law
    Server
    3 requests = 10 rps * 300 ms
    300ms
    BEAM Processes
    CPU Pressure

  45. Little’s Law
    Server
    30 requests = 10 rps * 3000 ms
    3000ms
    BEAM Processes
    CPU Pressure

  46. Little’s Law
    Server
    30 requests = 10 rps * ∞ ms

    BEAM Processes
    CPU Pressure

  47. Little’s Law
    30 requests = 10 rps * ∞ ms

  48. Little’s Law
    ∞ requests = 10 rps * ∞ ms

  49. Little’s Law
    ∞ requests = 10 rps * ∞ ms
    This is bad
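
The progression on the preceding slides can be reproduced with a few lines of Elixir. This is only a sketch of the arithmetic: the module name `LittlesLaw` and the function `in_flight/2` are made up for illustration, and the latencies are the ones shown on the slides at a constant 10 rps.

```elixir
defmodule LittlesLaw do
  # Average number of requests in the system, given an arrival rate in
  # requests per second and a processing time in milliseconds.
  def in_flight(arrival_rps, processing_ms) do
    arrival_rps * processing_ms / 1000
  end
end

# Latencies taken from the slides, at a constant 10 requests per second.
for ms <- [100, 200, 300, 3000] do
  IO.puts("#{ms} ms -> #{LittlesLaw.in_flight(10, ms)} requests in flight")
end
```

The arrival rate never changes in this example. Only the processing time grows, and the number of requests held in the system grows with it, which is why unbounded latency implies an unbounded queue.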

  50. Let's Talk About…
    Queues
    Overload Mitigation
    Adaptive Concurrency

  52. Overload
    Arrival Rate > Processing Time

  53. Overload
    Arrival Rate > Processing Time
    We need to get these under control

  54. Load Shedding
    Server
    Queue
    Server

  55. Load Shedding
    Server
    Queue
    Server
    Drop requests

  56. Load Shedding
    Server
    Queue
    Server
    Drop requests
    Stop sending
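
One way to picture "drop requests" is a queue with a hard bound that rejects new work once it is full. The sketch below is an assumption-laden illustration, not code from the talk: `BoundedQueue` and its functions are hypothetical names, and it is a pure data structure built on Erlang's `:queue` module rather than a running server.

```elixir
defmodule BoundedQueue do
  # A FIFO queue that sheds load: once `max` items are waiting,
  # new items are rejected instead of being queued.
  defstruct [:items, :max, size: 0]

  def new(max) when is_integer(max) and max > 0 do
    %__MODULE__{items: :queue.new(), max: max}
  end

  # Full queue: drop the item and tell the caller.
  def push(%__MODULE__{size: size, max: max} = q, _item) when size >= max do
    {:dropped, q}
  end

  # Room available: enqueue at the back.
  def push(%__MODULE__{items: items, size: size} = q, item) do
    {:ok, %{q | items: :queue.in(item, items), size: size + 1}}
  end

  def pop(%__MODULE__{size: 0} = q), do: {:empty, q}

  # Dequeue from the front.
  def pop(%__MODULE__{items: items, size: size} = q) do
    {{:value, item}, rest} = :queue.out(items)
    {{:ok, item}, %{q | items: rest, size: size - 1}}
  end
end
```

The `{:dropped, queue}` return value is what makes "stop sending" possible: the caller learns immediately that the server is full and can back off instead of piling on more work.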

  57. Autoscaling

  59. Autoscaling
    Server DB
    Server

  60. Autoscaling
    Server DB
    Server
    Requests start queueing

  61. Autoscaling
    Server DB
    Server
    Server

  62. Autoscaling
    Server DB
    Server
    Server
    Now it's worse

  63. Autoscaling needs to
    be in response to
    load shedding

  64. Circuit Breakers

  66. Circuit Breakers
    Server Server

  68. Circuit Breakers
    Server Server
    Shut off traffic

  69. Circuit Breakers
    Server Server

  70. Circuit Breakers
    Server Server
    I’m not quite dead yet

  71. Circuit Breakers are
    your last line of
    defense
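
As a rough illustration of the circuit breaker idea, here is a minimal state machine. Everything in it is an assumption made for the example: the `Breaker` module, the failure threshold, and the manual `reset/1` are hypothetical, and production breakers (such as the `fuse` library listed later in the deck) typically add time windows and automatic recovery.

```elixir
defmodule Breaker do
  # A minimal circuit breaker as a pure data structure (no processes).
  # The names and the default threshold are hypothetical.
  defstruct state: :closed, failures: 0, threshold: 3

  def new(threshold \\ 3), do: %__MODULE__{threshold: threshold}

  # Callers check this before sending traffic downstream.
  def allow?(%__MODULE__{state: :closed}), do: true
  def allow?(%__MODULE__{state: :open}), do: false

  # A success while closed clears the failure count.
  def record(%__MODULE__{state: :closed} = b, :ok), do: %{b | failures: 0}

  # Failures accumulate; reaching the threshold opens the breaker.
  def record(%__MODULE__{state: :closed, failures: f, threshold: t} = b, :error) do
    failures = f + 1
    state = if failures >= t, do: :open, else: :closed
    %{b | state: state, failures: failures}
  end

  # While open, results are ignored until the breaker is reset.
  def record(%__MODULE__{state: :open} = b, _result), do: b

  # Close the breaker again once the downstream service has recovered.
  def reset(%__MODULE__{} = b), do: %{b | state: :closed, failures: 0}
end
```

Note that the breaker only trips after failures have already happened, which fits the slide's framing of it as a last line of defense rather than a first response to overload.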

  72. Let's Talk About…
    Queues
    Overload Mitigation
    Adaptive Concurrency

  74. We want to allow as
    many requests as we
    can actually handle

  76. Adaptive Limits
    Time
    Concurrency

  77. Adaptive Limits
    Actual limit
    Time
    Concurrency

  78. Adaptive Limits
    Actual limit
    Dynamic Discovery
    Time
    Concurrency

  79. Load Shedding
    Server
    Server

  80. Load Shedding
    Server
    Server
    Are we at the limit?

  81. Load Shedding
    Server
    Server
    Am I still healthy?

  82. Load Shedding
    Server
    Server

  83. Load Shedding
    Server
    Server
    Update Limits
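
The loop on these slides (check the limit, do the work, then update the limit) might be sketched as follows. This is a simplified illustration under stated assumptions: `Limiter` and its functions are hypothetical names, state is passed around explicitly instead of living in a process, and it is not a description of how Regulator is implemented.

```elixir
defmodule Limiter do
  # Tracks in-flight requests against a concurrency limit.
  defstruct limit: 1, in_flight: 0

  def new(limit) when is_integer(limit) and limit > 0 do
    %__MODULE__{limit: limit}
  end

  # "Are we at the limit?" If so, reject instead of queueing.
  def acquire(%__MODULE__{in_flight: n, limit: l} = s) when n >= l do
    {:dropped, s}
  end

  def acquire(%__MODULE__{in_flight: n} = s) do
    {:ok, %{s | in_flight: n + 1}}
  end

  # A request finished: free its slot, then let `update` (any function
  # from the old limit and the outcome to a new limit) adjust the limit.
  def release(%__MODULE__{in_flight: n, limit: l} = s, outcome, update) when n > 0 do
    %{s | in_flight: n - 1, limit: update.(l, outcome)}
  end
end
```

Keeping the update rule as a plain function argument leaves the choice of algorithm open; the limiter itself only needs to know the current limit and how many requests are in flight.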

  84. Adaptive Limits
    Time
    Concurrency
    Increased latency

  85. Latency
    Successful vs. Failed requests
    Signals for Adjusting Limits

  86. Additive Increase Multiplicative Decrease
    Success state: limit + 1
    Backoff state: limit * 0.95
    Time
    Concurrency
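
The two rules on this slide translate almost directly into code. The following is a sketch rather than Regulator's implementation: the module name `AIMD` is reused here only as a label, the `@min_limit` and `@max_limit` bounds are assumptions added so the limit cannot reach zero or grow forever, and the multiplication by 0.95 is done in integer arithmetic (rounding down) to keep the limit a whole number.

```elixir
defmodule AIMD do
  # Assumed bounds; the slide does not specify any.
  @min_limit 1
  @max_limit 200

  # Success state: limit + 1 (additive increase).
  def update(limit, :ok), do: min(limit + 1, @max_limit)

  # Backoff state: limit * 0.95 (multiplicative decrease), rounded down.
  def update(limit, :backoff), do: max(div(limit * 95, 100), @min_limit)
end

# Replay a short series of outcomes, starting from a limit of 20.
outcomes = [:ok, :ok, :ok, :backoff, :ok]
final = Enum.reduce(outcomes, 20, fn outcome, limit -> AIMD.update(limit, outcome) end)
IO.inspect(final, label: "limit after replay")
```

Because increases are small and decreases are proportional, the limit climbs slowly toward the real capacity and backs off faster the higher it has climbed, producing the sawtooth shape that the Time/Concurrency plot on the slide suggests.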

  87. Prior Art/Alternatives
    https://github.com/ferd/pobox/
    https://github.com/fishcakez/sbroker/
    https://github.com/heroku/canal_lock
    https://github.com/jlouis/safetyvalve
    https://github.com/jlouis/fuse

  88. Regulator
    https://github.com/keathley/regulator

  89. Regulator
    Regulator.install(:service, [
      limit: {Regulator.Limit.AIMD, [timeout: 500]}
    ])
    Regulator.ask(:service, fn ->
      {:ok, Finch.request(:get, "https://keathley.io")}
    end)

  90. Conclusion

  91. Queues are
    everywhere

  92. Those queues need
    to be bounded to
    avoid overload

  93. If your system is
    dynamic, your
    solution will also
    need to be dynamic

  94. Go and build
    awesome stuff

  95. Thanks
    Chris Keathley / @ChrisKeathley / [email protected]
