Kafka, the hard parts

This talk summarizes many of the lessons I've learned building systems on Kafka.

Chris Keathley

January 10, 2019

Transcript

  1. Kafka
    The Hard Parts
    Chris Keathley / @ChrisKeathley / keathley.io

    View Slide

  2. Kafka is great

    View Slide

  3. Kafka is just a log

    View Slide

  4. https://flic.kr/p/9aXr88

    View Slide

  5. https://flic.kr/p/9aXr88
    Kafka

    View Slide

  6. Kafka
    https://flic.kr/p/9aXr88
    (metaphor)

    View Slide

  7. Log aggregation
    Analytics and activity tracking
    Queuing
    ETL
    Messaging
    Stream Processing
    Kafka Uses

    View Slide

  8. Event Sourcing

    View Slide

  9. Log aggregation
    Analytics and activity tracking
    Queuing
    ETL
    Messaging
    Stream Processing
    Kafka Uses

    View Slide

  10. https://flic.kr/p/hrrbVx

    View Slide

  11. https://flic.kr/p/hrrbVx
    (still a metaphor)
    Kafka

    View Slide

  12. Large
    consequences
    for failure

    View Slide

  13. Joke about Mr.
    Glass

    View Slide

  14. Joke about Mr.
    Glass

    View Slide

  15. Iteration Is Hard

    View Slide

  16. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  17. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  18. Topic

    View Slide

  19. Topic

    View Slide

  20. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5

    View Slide

  21. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5

    View Slide

  22. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5

    View Slide

  23. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Written to the File system

    View Slide

  24. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5

    View Slide

  25. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Messages are ordered

    View Slide

  26. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5

    View Slide

  27. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5

    View Slide

  28. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer

    View Slide

  29. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer

    View Slide

  30. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer

    View Slide

  31. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer

    View Slide

  32. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer

    View Slide

  33. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer
    Consumer

    View Slide

  34. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer
    Consumer

    View Slide

  35. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer
    Consumer

    View Slide

  36. Partition 1
    Partition 2
    Partition 3
    Partition 4
    Partition 5
    Consumer
    Consumer

    View Slide

  37. Topic

    View Slide

  38. Topic
    Topic
    Topic
    Topic
    Broker

    View Slide

  39. Broker Broker
    Broker

    View Slide

  40. View Slide

  41. Replication
    Leader

    View Slide

  42. Clients
    Java Client
    librdkafka

    View Slide

  43. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  44. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  45. Order is important
    User Events

    View Slide

  46. Order is important
    User Events

    View Slide

  47. Order is important
    Follow

    View Slide

  48. Order is important
    Follow

    View Slide

  49. Order is important
    Follow
    Message

    View Slide

  50. Order is important
    Follow
    Message
    Unfollow

    View Slide

  51. Order is important
    Follow
    Message
    Unfollow
    Causal

    View Slide

  52. Order is important
    Follow
    Message
    Unfollow
    Consumer

    View Slide

  53. Order is important
    Follow
    Message
    Unfollow
    Consumer

    View Slide

  54. Order is important
    Follow
    Message
    Unfollow
    Consumer

    View Slide

  55. Order is important
    Follow
    Message
    Unfollow

    View Slide

  56. Order is important
    Follow
    Message
    Unfollow
    Consumer

    View Slide

  57. Order is important
    Follow
    Message
    Unfollow
    Consumer

    View Slide

  58. Order is important
    Follow
    Message
    Unfollow
    Consumer

    View Slide

  59. Group records
    based on order

    View Slide

  60. Partitioner
    to_int(hash(key)) % partitions

    View Slide

  61. Partitioner
    to_int(hash(user_id)) % partitions

    View Slide
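
The partitioner formula on the two slides above can be sketched in a few lines of Python. This is only an illustration: `partition_for` is a hypothetical helper, and MD5 is used as a stable stand-in hash (Kafka's Java client hashes keys with murmur2, and Python's built-in `hash()` is randomized per process for strings, so neither is used here).

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """to_int(hash(key)) % partitions, using MD5 as a stand-in hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions


# Every event keyed by the same user_id lands in the same partition,
# so the order follow -> message -> unfollow is preserved for that user.
events = [
    ("user-1", "follow"),
    ("user-2", "follow"),
    ("user-1", "message"),
    ("user-1", "unfollow"),
]
by_partition = {}
for user_id, event in events:
    by_partition.setdefault(partition_for(user_id, 5), []).append((user_id, event))
```

Ordering is only guaranteed within a partition, which is why the key has to be whatever the ordering depends on (here, the user).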

  62. Follow
    Message
    Unfollow
    Grouping Consumers

    View Slide

  63. Follow
    Message
    Unfollow
    Causal
    Grouping Consumers

    View Slide

  64. Follow
    Message
    Unfollow
    Grouping Consumers
    Follow Processor
    Message Processor

    View Slide

  65. Follow
    Message
    Unfollow
    Grouping Consumers
    Follow Processor
    Message Processor

    View Slide

  66. Follow
    Message
    Unfollow
    Grouping Consumers
    Follow Processor
    Message Processor

    View Slide

  67. Follow
    Message
    Unfollow
    Grouping Consumers
    User event processor

    View Slide

  68. Follow
    Message
    Unfollow
    Grouping Consumers
    User event processor

    View Slide

  69. Follow
    Message
    Unfollow
    Grouping Consumers
    User event processor

    View Slide

  70. User Events
    Create pipelines
    User event processor
    Messages

    View Slide

  71. User Events
    Create pipelines
    User event processor
    Messages
    Consumes

    View Slide

  72. User Events
    Create pipelines
    User event processor
    Messages
    Consumes
    Produces

    View Slide

  73. "Commander: Better Distributed Applications through CQRS and
    Event Sourcing" by Bobby Calderwood
    https://youtu.be/B1-gS0oEtYc

    View Slide

  74. The less dependence
    you can have
    between consumers
    the better

    View Slide

  75. Random
    partitioning is
    best if you can
    avoid ordering

    View Slide

  76. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  77. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  78. Errors have the
    potential to
    wreck your day

    View Slide

  79. Consumer
    Errors

    View Slide

  80. Consumer
    Errors

    View Slide

  81. Consumer
    Errors

    View Slide

  82. Consumer
    Errors

    View Slide

  83. Consumer
    Errors
    Blocking the head of the line

    View Slide

  84. Consumer
    What should we do?
    Errors

    View Slide

  85. Non-Blocking
    vs.
    Blocking

    View Slide

  86. Non-Blocking
    vs.
    Blocking

    View Slide

  87. Non-Blocking Errors
    Consumer
    42
    1337
    "Robert'); drop
    table students;--"

    View Slide

  88. Non-Blocking Errors
    Consumer
    42
    1337
    "Robert'); drop
    table students;--"
    What do we do?

    View Slide

  89. Non-Blocking Errors
    Consumer

    View Slide

  90. Non-Blocking Errors
    Consumer

    View Slide

  91. Non-Blocking Errors
    Consumer
    Error Topic

    View Slide

  92. Non-Blocking Errors
    Consumer

    View Slide

  93. Non-Blocking Errors
    Consumer

    View Slide

  94. Non-Blocking Errors
    Consumer

    View Slide

  95. Non-Blocking
    vs.
    Blocking

    View Slide

  96. Non-Blocking
    vs.
    Blocking

    View Slide

  97. Blocking Errors
    Database
    Consumer

    View Slide

  98. Blocking Errors
    Database
    Consumer
    Process
    messages
    Store Information

    View Slide

  99. Blocking Errors
    Database
    Consumer

    View Slide

  100. Blocking Errors
    Database
    Consumer

    View Slide

  101. Blocking Errors
    Database
    Consumer
    What do we do?

    View Slide

  102. Blocking Errors
    Database
    Consumer
    Retry

    View Slide

  103. Blocking Errors
    Database
    Consumer
    Send alerts

    View Slide

  104. Skip non-blocking errors
    &
    Retry blocking errors

    View Slide
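
A minimal sketch of "skip non-blocking errors and retry blocking errors", assuming the handler signals the two cases with different exception types. The names `NonBlockingError`, `BlockingError`, `consume`, `error_topic`, and `alert` are all hypothetical, and the error topic is modeled as a plain list rather than a real Kafka producer.

```python
import time


class NonBlockingError(Exception):
    """The record itself is bad, e.g. it fails validation. Retrying will not help."""


class BlockingError(Exception):
    """A dependency (e.g. the database) is unavailable. Retrying may help."""


def consume(records, handle, error_topic, alert, max_attempts=5, base_delay=0.0):
    """Process records in order: skip bad records, retry blocked ones."""
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                handle(record)
                break
            except NonBlockingError as exc:
                # Park the record on the error topic and move on.
                error_topic.append({"record": record, "error": str(exc)})
                break
            except BlockingError as exc:
                alert(f"attempt {attempt} failed for {record!r}: {exc}")
                if attempt == max_attempts:
                    raise  # give up without skipping, so ordering is preserved
                # Exponential backoff between retries.
                time.sleep(base_delay * 2 ** (attempt - 1))
```

Re-raising after the last attempt, instead of skipping, keeps the consumer from moving past a record it could not store. Whether to stop or keep retrying forever is a design choice the slides leave open.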

  105. Design errors
    out of
    existence

    View Slide

  106. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  107. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  108. Delivery
    Guarantees

    View Slide

  109. Computer A
    Communication is hard
    Computer B
    What time
    is it?

    View Slide

  110. Computer A
    Communication is hard
    Computer B

    View Slide

  111. Computer A
    Communication is hard
    Computer B
    Did you get
    it?

    View Slide

  112. Computer A
    Communication is hard
    Computer B
    How about
    now?

    View Slide

  113. Computer A
    Communication is hard
    Computer B
    Now?

    View Slide

  114. Delivery
    0 <= 1 <= n
    At most once
    Impossible-ish
    At least once

    View Slide

  115. Consumers should
    *ALWAYS* assume
    “At Least Once”

    View Slide

  116. The Joys of Functional
    Programming

    View Slide

  117. View Slide

  118. You

    View Slide

  119. You
    Functional
    Programming

    View Slide

  120. Immutability
    and
    Idempotence

    View Slide

  121. Immutability:
    An immutable object is an object whose state cannot be
    modified after it is created.

    View Slide

  122. Idempotence:
    …the property of certain operations in mathematics
    and computer science whereby they can be applied
    multiple times without changing the result beyond the
    initial application.

    View Slide

  123. Idempotence:
    Execute the same operation more than once but only
    see the effect once.

    View Slide

  124. Idempotent
    Operations

    View Slide

  125. Counting comments
    comment
    comment
    comment
    increment
    1

    View Slide

  126. Counting comments
    comment
    comment
    comment
    increment
    1

    View Slide

  127. Counting comments
    comment
    comment
    comment
    increment
    2

    View Slide

  128. Counting comments
    comment
    comment
    comment
    increment
    2

    View Slide

  129. Counting comments
    comment
    comment
    comment
    increment
    3

    View Slide

  130. Counting comments
    comment
    comment
    comment
    increment
    3 Some Error

    View Slide

  131. Counting comments
    comment
    comment
    comment
    increment
    3

    View Slide

  132. Counting comments
    comment
    comment
    comment
    increment
    3

    View Slide

  133. Counting comments
    comment
    comment
    comment
    increment
    4

    View Slide

  134. Counting comments
    comment
    comment
    comment
    increment
    4

    View Slide

  135. Counting comments
    comment
    comment
    comment
    increment
    5

    View Slide

  136. Counting comments
    comment
    comment
    comment
    increment
    5

    View Slide

  137. Counting comments
    comment
    comment
    comment
    increment
    6

    View Slide

  138. Kafka Record
    {
    data: {},
    type: "comment.created",
    }

    View Slide

  139. Kafka Record
    {
    data: {},
    type: "comment.created",
    msg_id: UUIDv4
    }

    View Slide

  140. Kafka Record
    {
    data: {},
    type: "comment.created",
    msg_id: UUIDv4
    }
    Used for managing idempotence

    View Slide

  141. Counting comments
    comment
    comment
    comment
    increment
    1

    View Slide

  142. Counting comments
    comment
    comment
    comment
    Set.add(id)
    id: 1
    id: 2
    id: 3
    (1)

    View Slide

  143. Counting comments
    comment
    comment
    comment
    Set.add(id)
    id: 1
    id: 2
    id: 3
    (1)

    View Slide

  144. Counting comments
    comment
    comment
    comment
    Set.add(id)
    id: 1
    id: 2
    id: 3
    (1, 2)

    View Slide

  145. Counting comments
    comment
    comment
    comment
    Set.add(id)
    id: 1
    id: 2
    id: 3
    (1, 2)

    View Slide

  146. Counting comments
    comment
    comment
    comment
    Set.add(id)
    id: 1
    id: 2
    id: 3
    (1, 2, 3)

    View Slide

  147. Counting comments
    comment
    comment
    comment
    Set.add(id)
    id: 1
    id: 2
    id: 3
    (1, 2, 3) Some Error

    View Slide

  148. Counting comments
    comment
    comment
    comment
    id: 1
    id: 2
    id: 3
    Set.add(id)
    (1, 2, 3)

    View Slide

  149. Counting comments
    comment
    comment
    comment
    id: 1
    id: 2
    id: 3
    Set.add(id)
    (1, 2, 3)

    View Slide

  150. Counting comments
    comment
    comment
    comment
    id: 1
    id: 2
    id: 3
    Set.add(id)
    (1, 2, 3)

    View Slide

  151. Counting comments
    (1, 2, 3)

    View Slide

  152. Counting comments
    cardinality(1, 2, 3)

    View Slide

  153. Counting comments
    cardinality(1, 2, 3)
    => 3

    View Slide
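
The two counting strategies from the preceding slides, side by side, as a small Python sketch. The function names are hypothetical; the records follow the `msg_id` / `type` / `data` shape shown earlier, with small integers standing in for UUIDs.

```python
def count_with_increment(deliveries):
    """Not idempotent: every redelivery bumps the counter again."""
    count = 0
    for _ in deliveries:
        count += 1
    return count


def count_with_set(deliveries):
    """Idempotent: adding the same msg_id twice leaves the set unchanged."""
    seen = set()
    for record in deliveries:
        seen.add(record["msg_id"])
    return len(seen)  # cardinality of the set


comments = [{"msg_id": i, "type": "comment.created", "data": {}} for i in (1, 2, 3)]
# An error after processing causes the same three records to be delivered again.
redelivered = comments + comments
```

With the redelivered batch, the incrementing counter reports 6 comments while the set still has a cardinality of 3.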

  154. Idempotent
    Side-Effects

    View Slide

  155. smtp
    send_email
    Sending Emails
    email
    id: 1
    email
    id: 2
    email
    id: 3

    View Slide

  156. smtp
    send_email
    Sending Emails
    email
    id: 1
    email
    id: 2
    email
    id: 3
    What do we do if this fails?

    View Slide

  157. smtp
    send_email
    Sending Emails
    email
    id: 1
    email
    id: 2
    email
    id: 3
    Send at most once

    View Slide

  158. smtp
    send_email
    Sending Emails
    email
    id: 1

    View Slide

  159. Cache send_email
    Sending Emails
    email
    id: 1
    smtp

    View Slide

  160. Cache send_email
    Sending Emails
    email
    id: 1
    smtp
    id?(1)

    View Slide

  161. Cache send_email
    Sending Emails
    email
    id: 1
    smtp
    id?(1)
    If id exists then skip it

    View Slide

  162. Cache send_email
    Sending Emails
    email
    id: 1
    smtp

    View Slide

  163. Cache send_email
    Sending Emails
    email
    id: 1
    smtp
    add(1)

    View Slide

  164. Cache send_email
    Sending Emails
    email
    id: 1
    smtp

    View Slide

  165. Cache send_email
    Sending Emails
    email
    id: 1
    smtp

    View Slide

  166. Cache send_email
    Sending Emails
    email
    id: 1
    smtp

    View Slide

  167. send_email
    Sending Emails
    email
    id: 1

    View Slide

  168. send_email
    Sending Emails
    email
    id: 1
    If we see this
    message again
    move it to an
    audit topic

    View Slide

  169. send_email
    Sending Emails
    If we see this
    message again
    move it to an
    audit topic
    email
    id: 1

    View Slide

  170. send_email
    Sending Emails

    View Slide
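
A rough sketch of the email flow above, assuming the cache is a set of message ids and the audit topic is a list. `send_once`, `sent_ids`, `smtp_send`, and `audit_topic` are hypothetical names. The slides do not say whether the id is recorded before or after the SMTP call; recording it first is an assumption made here because it matches "send at most once".

```python
def send_once(record, sent_ids, smtp_send, audit_topic):
    """Send an email at most once per msg_id.

    Returns True if a send was attempted, False if the record was a repeat.
    """
    msg_id = record["msg_id"]
    if msg_id in sent_ids:
        # Seen before: do not send again, keep the record for later inspection.
        audit_topic.append(record)
        return False
    # Record the id before sending. If the send fails, the email is lost
    # rather than duplicated, which is the at-most-once trade-off.
    sent_ids.add(msg_id)
    smtp_send(record)
    return True
```

Email cannot be made idempotent by the receiver, so the side effect is guarded on the sending side instead.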

  171. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  172. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  173. User Events
    Teams
    User event processor
    Messages
    Notifications
    Notifications
    Notification Sender

    View Slide

  174. User Events
    Teams
    User event processor
    Messages
    Notifications
    Notifications
    Notification Sender
    Teams

    View Slide

  175. Data is the
    language of the
    system

    View Slide

  176. {
    msg_id: "8700635f-1802-417e-89e7-595ad3600104",
    type: "comment.created",
    data: {
    user_id: 1234,
    msg: "This is a super fun conference!"
    }
    }
    Data payloads

    View Slide

  177. {
    msg_id: String,
    type: String,
    data: {
    user_id: Integer,
    msg: String
    }
    }
    Data payloads

    View Slide

  178. {
    msg_id: String,
    type: String,
    data: {
    user_id: Integer,
    msg: String
    }
    }
    Data payloads
    None of this tells
    you anything
    useful about your
    data

    View Slide

  179. {
    msg_id: String,
    type: String,
    data: {
    user_id: Integer,
    msg: String
    }
    }
    Data payloads
    What do we do
    when these things
    change?

    View Slide

  180. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    }
    }
    Data payloads
    What do we do
    when these things
    change?

    View Slide

  181. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    }
    }
    Data payloads
    Let's just use
    versions!

    View Slide

  182. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    }
    }
    Data payloads
    Let's just use
    versions!
    (spoiler: this isn’t great)

    View Slide

  183. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    }
    }
    Data payloads

    View Slide

  184. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    },
    meta: {
    version: 2
    }
    }
    Data payloads

    View Slide

  185. Data Versions
    Consumer
    v1
    v1
    v1
    v1
    v2

    View Slide

  186. Data Versions
    Consumer
    v1
    v1
    v1
    v1
    v2
    This consumer needs to
    understand both
    versions

    View Slide

  187. Data Versions
    Consumer
    v1
    v1
    v1
    v1
    v2
    This team needs to
    know to make these
    changes

    View Slide

  188. Versioning
    is broken

    View Slide

  189. (sem)Versioning
    is broken

    View Slide

  190. Change
    Growth Breakage

    View Slide

  191. Change
    Growth Breakage
    Never do this

    View Slide

  192. Growing
    schemas should
    be the default

    View Slide

  193. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    }
    }
    Data payloads

    View Slide

  194. {
    msg_id: String,
    type: String,
    data: {
    user_id: Integer,
    msg: String
    }
    }
    Data payloads
    What are these?

    View Slide

  195. Dependent
    Types

    View Slide

  196. {
    msg_id: String,
    type: String,
    data: {
    user_id: Integer,
    msg: String
    }
    }
    Data payloads
    What are these?

    View Slide

  197. Norm

    View Slide

  198. {
    msg_id: String,
    type: String,
    data: {
    user_id: String,
    msg: String
    }
    }
    Data payloads

    View Slide

  199. UUID = string? & re_matches?(/^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i)
    CommentCreated = schema{
    req :msg_id, UUID
    req :type, lit("comment.created")
    req :data, schema {
    req :user_id, integer? | UUID
    req :msg, string?
    }
    }
    Data payloads

    View Slide

  200. json = {type: "comment.created", msg: "Hello world"}
    Norm.decode(CommentEvent, json)
    => {:ok, data}
    Norm.decode(CommentEvent, {})
    => {:error, errors}

    Norm.explain(CommentEvent, {})
    => "In :msg_id, val: {} fails spec: required
    In :type, val: {} fails spec: required
    In :data, val: {} fails spec: required"
    Data payloads

    View Slide

  201. Norm is built
    for
    extensibility

    View Slide

  202. CommentEvent = schema{
    req :type, lit("comment.created")
    req :msg, string?
    }
    json = {
    type: "comment.created",
    msg: "Hello world",
    data: {
    msg: "Hello world"
    }
    }
    Norm.decode(CommentEvent, json)
    => {:ok, data}
    Norm is extensible

    View Slide

  203. CommentEvent = schema{
    req :type, lit("comment.created")
    req :msg, string?
    }
    json = {
    type: "comment.created",
    msg: "Hello world",
    data: {
    msg: "Hello world"
    }
    }
    Norm.decode(CommentEvent, json)
    => {:ok, data}
    Norm is extensible
    This will still get
    passed through

    View Slide
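
The idea of a schema that checks required keys with predicates and lets unknown keys pass through can be sketched in plain Python. This is not Norm's API: `validate`, `is_uuid`, `is_int`, and `COMMENT_CREATED` are hypothetical names, and the schema is just a nested dict of predicates.

```python
import re

UUID_V4 = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$",
    re.IGNORECASE,
)


def is_uuid(value):
    return isinstance(value, str) and UUID_V4.match(value) is not None


def is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


def validate(schema, value, path=""):
    """Return a list of error strings. Keys not in the schema are ignored."""
    errors = []
    for key, spec in schema.items():
        where = f"{path}.{key}" if path else key
        if not isinstance(value, dict) or key not in value:
            errors.append(f"{where}: required")
        elif isinstance(spec, dict):
            errors.extend(validate(spec, value[key], where))
        elif not spec(value[key]):
            errors.append(f"{where}: invalid value {value[key]!r}")
    return errors


COMMENT_CREATED = {
    "msg_id": is_uuid,
    "type": lambda v: v == "comment.created",
    "data": {
        # Accepting both forms lets user_id grow from integer to UUID
        # without breaking consumers of older records.
        "user_id": lambda v: is_int(v) or is_uuid(v),
        "msg": lambda v: isinstance(v, str),
    },
}
```

Because extra keys are ignored, a producer can add fields (growth) without breaking this consumer. Removing or retyping a required field (breakage) is reported as an error.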

  204. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  205. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  206. Property Based
    Testing

    View Slide

  207. Property based testing
    Database
    Consumer

    View Slide

  208. Property based testing
    Database
    Consumer
    id: 1
    id: 2
    id: 3
    id: 1

    View Slide

  209. Property based testing
    Database
    Consumer
    id: 1
    id: 2
    id: 3
    id: 1
    Information should end up here

    View Slide

  210. Property based testing
    Database
    Consumer
    id: 1
    id: 2
    id: 3
    id: 1
    Some combination
    of these messages
    causes a failure

    View Slide

  211. Property based testing
    Database
    id: 1
    id: 1
    Consumer

    View Slide

  212. Property based testing
    Database
    id: 1
    id: 1
    Looks like we aren’t
    handling duplicates
    correctly
    Consumer

    View Slide

  213. Property based testing
    Database
    id: 1
    id: 1
    Consumer

    View Slide

  214. Property based testing
    Database
    Consumer
    id: 1
    id: 1
    Deterministically
    fail this connection

    View Slide
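
To make the property-based testing slides concrete, here is a hand-rolled sketch using only Python's `random` module. It generates random message sequences, checks one property, and shrinks a failing sequence to a minimal one. All names are hypothetical and the "database" is a list; a real project would more likely use a property-testing library such as Hypothesis.

```python
import random


def naive_consumer(messages):
    """Hypothetical consumer that stores every delivery, duplicates included."""
    database = []
    for message in messages:
        database.append(message["id"])
    return database


def idempotent_consumer(messages):
    """Hypothetical consumer that stores each id only once."""
    database = []
    for message in messages:
        if message["id"] not in database:
            database.append(message["id"])
    return database


def property_holds(consumer, messages):
    """Property: the database holds each delivered id exactly once."""
    return sorted(consumer(messages)) == sorted({m["id"] for m in messages})


def shrink(consumer, messages):
    """Drop messages one at a time for as long as the property still fails."""
    changed = True
    while changed:
        changed = False
        for i in range(len(messages)):
            candidate = messages[:i] + messages[i + 1:]
            if not property_holds(consumer, candidate):
                messages, changed = candidate, True
                break
    return messages


def find_counterexample(consumer, trials=200, seed=0):
    """Return a minimal failing message sequence, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        messages = [{"id": rng.randint(1, 3)} for _ in range(rng.randint(0, 6))]
        if not property_holds(consumer, messages):
            return shrink(consumer, messages)
    return None
```

For the naive consumer, shrinking reduces any failing sequence to two messages with the same id, which is the "we aren't handling duplicates correctly" case from the slides.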

  215. Chaos
    Engineering

    View Slide

  216. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  217. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Finding Errors
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  218. Monitoring
    vs.
    Observability

    View Slide

  219. Monitoring:
    Figuring out that there’s a
    problem

    View Slide

  220. Observability:
    Determining what the
    problem is.

    View Slide

  221. Goal:
    Detect lagging or
    blocked consumers

    View Slide

  222. Wisen

    View Slide

  223. Wisen
    User Events
    User Consumer

    View Slide

  224. metadata topic
    Wisen
    User Events
    Checkpoints its
    position in the log
    to an offset topic
    User Consumer

    View Slide

  225. Wisen
    metadata topic
    Wisen User Consumer
    User Events

    View Slide

  226. Wisen
    metadata topic
    Wisen User Consumer
    User Events
    Compares farthest
    offset from
    checkpoints over a
    time-window

    View Slide
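
One way to read "compares farthest offset from checkpoints over a time-window" is sketched below. This is a guess at the idea, not how Wisen is actually implemented: `consumer_status` is a hypothetical function, checkpoints are `(timestamp, offset)` pairs, and the three status strings are invented for the example.

```python
def consumer_status(checkpoints, log_end_offset, now, window):
    """Classify a consumer as "ok", "lagging", or "blocked".

    checkpoints: list of (timestamp, committed_offset) pairs.
    log_end_offset: the latest offset written to the partition.
    """
    latest = max((offset for _, offset in checkpoints), default=0)
    if latest >= log_end_offset:
        return "ok"  # caught up with the end of the log
    # Farthest offset the consumer had reached before the window started.
    before_window = max(
        (offset for ts, offset in checkpoints if ts < now - window), default=0
    )
    if latest > before_window:
        return "lagging"  # behind, but it made progress inside the window
    return "blocked"  # behind and no progress inside the window
```

Separating "lagging" from "blocked" matters for alerting: a lagging consumer may only need more capacity, while a blocked one is probably stuck on an error.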

  227. Wisen
    user_consumer_errors
    Wisen User Consumer
    User Events

    View Slide

  228. Wisen
    user_consumer_errors
    Wisen User Consumer
    User Events

    View Slide

  229. Wisen
    user_consumer_errors
    Wisen User Consumer
    User Events
    Alert if we see a
    rise in errors

    View Slide

  230. Other useful metrics:
    Median and Tail latencies
    Internal buffers
    DB/Cache/RPC latencies

    View Slide

  231. OpenTracing

    View Slide

  232. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  233. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  234. This has to be
    done up-front

    View Slide

  235. Calculating partitions
    messages in the system = arrival rate * mean time in system

    View Slide

  236. Calculating partitions
    Desired throughput / measured throughput on one partition
    => partitions needed

    View Slide

  237. Calculating partitions
    partitions < 100 x brokers x replication factor
    source: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster

    View Slide
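
The three sizing rules above, worked through with made-up numbers. The function names and every input value are assumptions for illustration only; the first formula is Little's law.

```python
import math


def messages_in_system(arrival_rate_per_s, mean_time_in_system_s):
    """Little's law: messages in the system = arrival rate * mean time in system."""
    return arrival_rate_per_s * mean_time_in_system_s


def partitions_needed(desired_throughput, measured_throughput_per_partition):
    """Desired throughput / measured throughput on one partition, rounded up."""
    return math.ceil(desired_throughput / measured_throughput_per_partition)


def partition_ceiling(brokers, replication_factor):
    """Rule-of-thumb upper bound: 100 x brokers x replication factor."""
    return 100 * brokers * replication_factor


in_flight = messages_in_system(200, 0.5)    # 200 msg/s, 0.5 s each -> 100.0
needed = partitions_needed(50_000, 4_000)   # 12.5 rounds up to 13
ceiling = partition_ceiling(3, 3)           # 3 brokers, replication factor 3 -> 900
```

Rounding up matters: 12 partitions at 4,000 msg/s each would top out at 48,000 msg/s, below the 50,000 msg/s target.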

  238. Increasing
    partitions
    is tricky if you rely
    on ordering

    View Slide

  239. to_int(hash(user_id)) % partitions

    View Slide

  240. to_int(hash(user_id)) % partitions
    Existing data is not reshuffled
    if partitions are increased

    View Slide
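
A small sketch of why adding partitions breaks key-based ordering. As before, `partition_for` is a hypothetical helper with MD5 standing in for the real hash; the partition counts 5 and 6 are arbitrary.

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """to_int(hash(key)) % partitions, using MD5 as a stand-in hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions


keys = [f"user-{i}" for i in range(1000)]
# Keys whose partition changes when the topic grows from 5 to 6 partitions.
moved = [k for k in keys if partition_for(k, 5) != partition_for(k, 6)]
```

For a moved key, older records stay in the old partition while new records go to the new one, so a consumer can no longer rely on seeing that key's events in order. With a uniform hash, most keys move when going from 5 to 6 partitions.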

  241. Data is not
    forever.

    View Slide

  242. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  243. Let's talk about…
    Kafka Terminology
    Maintaining Order
    Errors
    Distributed Systems and the joys of functional programming
    Data Validation
    Monitoring
    Capacity Planning
    #hottakes

    View Slide

  244. CQRS
    &
    Event Sourcing

    View Slide

  245. Don’t rush to
    democratize
    your data

    View Slide

  246. Embrace data
    and design

    View Slide

  247. Go forth and
    build awesome
    stuff!

    View Slide

  248. Thanks
    Chris Keathley / @ChrisKeathley / keathley.io

    View Slide