Visualization

Eitan Lees
November 15, 2019

A talk about the theory of visualization

Transcript

  1. Visualization
    By Eitan Lees

  3. “The ubiquity of visual metaphors in
    describing cognitive processes hints
    at a nexus of relationships between
    what we see and what we think”
    - Mackinlay & Card (1999)

  4. External
    Cognition

  7. Part 1:
    Data Wrangling
    Part 2:
    Visual Encodings
    Part 3:
    Graphical Critique
    Part 4:
    Practical Advice

  8. Part 1:
    Data Wrangling

  9. Clean data sets
    are all alike;
    every unclean
    data set is
    unclean in its
    own way

  10. Things to Consider:
    - Make Numbers ⇒ Numbers
    - Make Dates ⇒ Dates
    - Make NaNs ⇒ NaNs
    - Make sure strings aren’t corrupted
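
A minimal sketch of these checks, assuming pandas (the talk does not name a tool). The `raw` table and its `date` column are hypothetical; the case counts are borrowed from the tables later in the deck.

```python
import pandas as pd

# Hypothetical raw table in which every column arrived as text.
raw = pd.DataFrame({
    "country": ["Afghanistan", " Brazil ", "China"],
    "date": ["1999-01-01", "2000-01-01", "not recorded"],
    "cases": ["745", "80488", "N/A"],
})

clean = raw.copy()
# Numbers => numbers: entries that cannot be parsed become NaN.
clean["cases"] = pd.to_numeric(clean["cases"], errors="coerce")
# Dates => dates: entries that cannot be parsed become NaT.
clean["date"] = pd.to_datetime(clean["date"], format="%Y-%m-%d", errors="coerce")
# Strings: remove stray whitespace left over from the source file.
clean["country"] = clean["country"].str.strip()

print(clean.dtypes)
print(clean)
```

Unparseable entries end up as explicit missing values instead of staying as text, so they can be counted or dropped before plotting.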

  11. Before we visualize,
    let’s tidy up

  12. country year cases population
    Afghanistan 1999 745 19987071
    Afghanistan 2000 2666 20595360
    Brazil 1999 37737 172006362
    Brazil 2000 80488 174504898
    China 1999 212258 1272915272
    China 2000 213766 1280428583

  13. country year cases population
    Afghanistan 1999 745 19987071
    Afghanistan 2000 2666 20595360
    Brazil 1999 37737 172006362
    Brazil 2000 80488 174504898
    China 1999 212258 1272915272
    China 2000 213766 1280428583
    Variables

  14. country year cases population
    Afghanistan 1999 745 19987071
    Afghanistan 2000 2666 20595360
    Brazil 1999 37737 172006362
    Brazil 2000 80488 174504898
    China 1999 212258 1272915272
    China 2000 213766 1280428583
    Variables
    Observations

  15. country year key value
    Afghanistan 1999 cases 745
    Afghanistan 1999 population 19987071
    Afghanistan 2000 cases 2666
    Afghanistan 2000 population 20595360
    Brazil 1999 cases 37737
    Brazil 1999 population 172006362
    Brazil 2000 cases 80488
    Brazil 2000 population 174504898
    China 1999 cases 212258
    China 1999 population 1272915272
    China 2000 cases 213766
    China 2000 population 1280428583

  16. country year key value
    Afghanistan 1999 cases 745
    Afghanistan 1999 population 19987071
    Afghanistan 2000 cases 2666
    Afghanistan 2000 population 20595360
    Brazil 1999 cases 37737
    Brazil 1999 population 172006362
    Brazil 2000 cases 80488
    Brazil 2000 population 174504898
    China 1999 cases 212258
    China 1999 population 1272915272
    China 2000 cases 213766
    China 2000 population 1280428583
    Variables

  17. country year key value
    Afghanistan 1999 cases 745
    Afghanistan 1999 population 19987071
    Afghanistan 2000 cases 2666
    Afghanistan 2000 population 20595360
    Brazil 1999 cases 37737
    Brazil 1999 population 172006362
    Brazil 2000 cases 80488
    Brazil 2000 population 174504898
    China 1999 cases 212258
    China 1999 population 1272915272
    China 2000 cases 213766
    China 2000 population 1280428583
    Variables
    Observations

  18. country year key value
    Afghanistan 1999 cases 745
    Afghanistan 1999 population 19987071
    Afghanistan 2000 cases 2666
    Afghanistan 2000 population 20595360
    Brazil 1999 cases 37737
    Brazil 1999 population 172006362
    Brazil 2000 cases 80488
    Brazil 2000 population 174504898
    China 1999 cases 212258
    China 1999 population 1272915272
    China 2000 cases 213766
    China 2000 population 1280428583

  19. country year cases population
    Afghanistan 1999 745 19987071
    Afghanistan 2000 2666 20595360
    Brazil 1999 37737 172006362
    Brazil 2000 80488 174504898
    China 1999 212258 1272915272
    China 2000 213766 1280428583
    country year key value
    Afghanistan 1999 cases 745
    Afghanistan 1999 population 19987071
    Afghanistan 2000 cases 2666
    Afghanistan 2000 population 20595360
    Brazil 1999 cases 37737
    Brazil 1999 population 172006362
    Brazil 2000 cases 80488
    Brazil 2000 population 174504898
    China 1999 cases 212258
    China 1999 population 1272915272
    China 2000 cases 213766
    China 2000 population 1280428583
    We want to
    gather the values
    corresponding to
    each key.

  20. country year cases population
    Afghanistan 1999 745 19987071
    Afghanistan 2000 2666 20595360
    Brazil 1999 37737 172006362
    Brazil 2000 80488 174504898
    China 1999 212258 1272915272
    China 2000 213766 1280428583
    country year key value
    Afghanistan 1999 cases 745
    Afghanistan 1999 population 19987071
    Afghanistan 2000 cases 2666
    Afghanistan 2000 population 20595360
    Brazil 1999 cases 37737
    Brazil 1999 population 172006362
    Brazil 2000 cases 80488
    Brazil 2000 population 174504898
    China 1999 cases 212258
    China 1999 population 1272915272
    China 2000 cases 213766
    China 2000 population 1280428583
    Tidy
    We want to
    gather the values
    corresponding to
    each key.
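
One way to carry out the reshaping on this slide, sketched with pandas (an assumption; the talk does not name a tool). `DataFrame.pivot` turns each distinct entry of the `key` column into its own column, which yields the tidy table with `cases` and `population`.

```python
import pandas as pd

# The key/value table from the slide.
long_table = pd.DataFrame({
    "country": ["Afghanistan"] * 4 + ["Brazil"] * 4 + ["China"] * 4,
    "year": [1999, 1999, 2000, 2000] * 3,
    "key": ["cases", "population"] * 6,
    "value": [745, 19987071, 2666, 20595360,
              37737, 172006362, 80488, 174504898,
              212258, 1272915272, 213766, 1280428583],
})

# One row per (country, year); one column per distinct key.
tidy = long_table.pivot(index=["country", "year"], columns="key", values="value")
tidy = tidy.reset_index()
tidy.columns.name = None  # drop the leftover "key" label on the column axis

print(tidy)
```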

  21. country 1999 2000
    Afghanistan 745 2666
    Brazil 37737 80488
    China 212258 213766

  22. country 1999 2000
    Afghanistan 745 2666
    Brazil 37737 80488
    China 212258 213766
    We want to
    spread the
    values to the
    corresponding
    keys.

  23. country year cases
    Afghanistan 1999 745
    Afghanistan 2000 2666
    Brazil 1999 37737
    Brazil 2000 80488
    China 1999 212258
    China 2000 213766
    country 1999 2000
    Afghanistan 745 2666
    Brazil 37737 80488
    China 212258 213766
    We want to
    spread the
    values to the
    corresponding
    keys.

  24. country year cases
    Afghanistan 1999 745
    Afghanistan 2000 2666
    Brazil 1999 37737
    Brazil 2000 80488
    China 1999 212258
    China 2000 213766
    country 1999 2000
    Afghanistan 745 2666
    Brazil 37737 80488
    China 212258 213766
    Tidy
    We want to
    spread the
    values to the
    corresponding
    keys.
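
The reshaping on this slide can be sketched in the same way, again assuming pandas. `DataFrame.melt` moves the year column headers into a `year` column and their cell values into a `cases` column.

```python
import pandas as pd

# The wide table from the slide: one column per year.
wide = pd.DataFrame({
    "country": ["Afghanistan", "Brazil", "China"],
    "1999": [745, 37737, 212258],
    "2000": [2666, 80488, 213766],
})

# Column headers become entries of "year"; cell values become "cases".
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")
tidy["year"] = tidy["year"].astype(int)  # years were column names, hence text
tidy = tidy.sort_values(["country", "year"]).reset_index(drop=True)

print(tidy)
```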

  25. Tidy Data
    country year cases population
    Afghanistan 1999 745 19987071
    Afghanistan 2000 2666 20595360
    Brazil 1999 37737 172006362
    Brazil 2000 80488 174504898
    China 1999 212258 1272915272
    China 2000 213766 1280428583
    Variables
    Observations
    Values

  26. Part 2:
    Visual Encoding

  27. Compare area of circles

  28. Compare length of bars

  29. Compare length of bars

  30. Compare area of circles
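
A small sketch of how the two encodings compared on these slides map a value to a mark. The function names and scale parameters are hypothetical, and the two values are case counts from the earlier tables. For an area encoding, the radius has to grow with the square root of the value.

```python
import math

def bar_length(value, units_per_value=1.0):
    """Length encoding: the bar grows linearly with the value."""
    return units_per_value * value

def circle_radius(value, area_per_value=1.0):
    """Area encoding: pick the radius so that pi * r**2 is proportional to the value."""
    return math.sqrt(area_per_value * value / math.pi)

cases = {"Afghanistan 1999": 745, "Afghanistan 2000": 2666}
for label, value in cases.items():
    print(label, bar_length(value), round(circle_radius(value), 2))
```

A value four times larger gets a bar four times longer, but a circle only twice as wide.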

  31. Visual Encoding Channels
    Length
    Area
    Slope
    Position
    Angle
    Volume
    Color Value
    Color Hue
    Shape
    And many more ...

  32. Accuracy ranking of quantitative perceptual tasks.
    Scale: Better to Worse
    Channels shown: Length, Area, Slope, Position, Angle, Volume, Color Value, Color Hue, Shape

  33. Data Models

  34. Nominal:
    - Labels and Categories
    - Example: Pill Shape
    - Operations: =, ≠

  35. Nominal:
    - Labels and Categories
    - Example: Pill Shape
    - Operations: =, ≠
    Ordinal:
    - Ordered Sets
    - Example: Drug Schedule
    - Operations: =, ≠, <, >

  36. Nominal:
    - Labels and Categories
    - Example: Pill Shape
    - Operations: =, ≠
    Ordinal:
    - Ordered Sets
    - Example: Drug Schedule
    - Operations: =, ≠, <, >
    Quantitative:
    - Numerical Measurement
    - Example: Dosage
    - Operations: =, ≠, <, >, -, %
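
A rough illustration of the three data models, assuming pandas. The concrete values below are hypothetical stand-ins for the slide's examples (pill shape, drug schedule, dosage). The point is which comparisons each type allows.

```python
import pandas as pd

# Nominal: labels without an order; only = and ≠ are meaningful.
shape = pd.Categorical(["round", "oval", "round"],
                       categories=["round", "oval", "capsule"])

# Ordinal: labels with an order; < and > become meaningful as well.
schedule = pd.Categorical(["II", "IV", "I"],
                          categories=["I", "II", "III", "IV", "V"],
                          ordered=True)

# Quantitative: numerical measurements; differences and ratios are meaningful.
dosage_mg = pd.Series([250.0, 500.0, 100.0])

is_round = shape == "round"        # equality works for nominal data
before_iii = schedule < "III"      # ordering works for ordinal data
difference = dosage_mg[1] - dosage_mg[0]
ratio = dosage_mg[1] / dosage_mg[0]

# pandas refuses to order an unordered (nominal) categorical.
try:
    shape < "oval"
    nominal_can_be_ordered = True
except TypeError:
    nominal_can_be_ordered = False

print(list(is_round), list(before_iii), difference, ratio, nominal_can_be_ordered)
```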

  39. 2D Plane
    Size
    Color Value
    Texture
    Color Hue
    Angle
    Shape

  40. 2D Plane
    Size
    Color Value
    Texture
    Color Hue
    Angle
    Shape
    Suitable for
    Ordered Data
    Suitable for
    Unordered Data

  41. 2D Plane
    Size
    Color Value
    Texture
    Color Hue
    Angle
    Shape
    Suitable for
    Ordered Data
    Position
    Area
    Color Value

  42. 2D Plane
    Size
    Color Value
    Texture
    Color Hue
    Angle
    Shape
    Suitable for
    Unordered Data
    Angle
    Color Hue
    Shape

  43. Bertin’s Levels of Organization
    Position     N  O  Q
    Size         N  O  Q
    Color Value  N  O  Q
    Texture      N  O
    Color Hue    N
    Angle        N
    Shape        N
    N = Nominal, O = Ordinal, Q = Quantitative
    Note: Q⊂O⊂N
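
The table on this slide can be read as a lookup from channel to supported data types. A minimal sketch, with the entries transcribed from the slide; the names `BERTIN_LEVELS` and `channels_for` are hypothetical.

```python
# Levels of organization per channel, transcribed from the slide.
# N = nominal, O = ordinal, Q = quantitative.
BERTIN_LEVELS = {
    "Position": {"N", "O", "Q"},
    "Size": {"N", "O", "Q"},
    "Color Value": {"N", "O", "Q"},
    "Texture": {"N", "O"},
    "Color Hue": {"N"},
    "Angle": {"N"},
    "Shape": {"N"},
}

def channels_for(data_type):
    """Return the channels whose row in the table lists the given data type."""
    return [name for name, levels in BERTIN_LEVELS.items() if data_type in levels]

print(channels_for("Q"))
print(channels_for("O"))
```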

  45. Grammar of Graphics
    1. Data
    2. Transformations
    3. Marks
    4. Encoding - mapping from fields to mark
    properties
    5. Scale - functions that map data to visual
    scales
    6. Guides - visualizations of scales (axes,
    legends, etc.)
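
To make the six components concrete, here is a sketch of a chart specification written as a plain Python dictionary in the style of a Vega-Lite spec. Using Vega-Lite is an assumption for illustration; the slide does not name a library. The data rows come from the earlier tables, and the derived field name `rate` is hypothetical.

```python
import json

spec = {
    # 1. Data
    "data": {"values": [
        {"country": "Afghanistan", "year": 1999, "cases": 745, "population": 19987071},
        {"country": "Brazil", "year": 1999, "cases": 37737, "population": 172006362},
        {"country": "China", "year": 1999, "cases": 212258, "population": 1272915272},
    ]},
    # 2. Transformations: derive cases per 100,000 people.
    "transform": [
        {"calculate": "datum.cases / datum.population * 100000", "as": "rate"},
    ],
    # 3. Marks
    "mark": "bar",
    # 4. Encoding: map fields to mark properties.
    "encoding": {
        "x": {
            "field": "country",
            "type": "nominal",
            "axis": {"title": "Country"},  # 6. Guide
        },
        "y": {
            "field": "rate",
            "type": "quantitative",
            "scale": {"type": "linear"},  # 5. Scale
            "axis": {"title": "Cases per 100,000 people"},  # 6. Guide
        },
    },
}

print(json.dumps(spec, indent=2))
```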

  46. Building Blocks of Visualization

  47. Part 3:
    Graphical Critique

  48. Most of modern statistical graphics can be
    traced back to William Playfair, a Scottish
    engineer and political economist.
    William Playfair

  52. Measles cases per 100,000 people

  53. Connectivity diagram of
    characters in Les Misérables

  54. Data Visualization is
    Everywhere!

  55. Edward Tufte

  56. Edward Tufte
    “Graphical excellence is that which gives to the
    viewer the greatest number of ideas in the shortest
    time with the least ink in the smallest space.”
    ― Edward R. Tufte, The Visual Display of
    Quantitative Information

  57. Edward Tufte
    “Graphical excellence is that which gives to the
    viewer the greatest number of ideas in the shortest
    time with the least ink in the smallest space.”
    ― Edward R. Tufte, The Visual Display of
    Quantitative Information

  58. Edward Tufte
    “Graphical excellence is that which gives to the
    viewer the greatest number of ideas in the shortest
    time with the least ink in the smallest space.”
    ― Edward R. Tufte, The Visual Display of
    Quantitative Information
    (within reason!)

  59. Data to Ink Ratio
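
For reference, the ratio named on this slide is defined in Tufte's The Visual Display of Quantitative Information as the share of a graphic's ink that presents data. Written out:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```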

  60. Data to Ink Ratio

  61. Data to Ink Ratio
    Reasonable?

  62. 75%
    50%
    25%
    min
    max
    Tukey
    Data to Ink Ratio
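
The five labels on this slide are the numbers a box plot draws. A minimal sketch using only the Python standard library; the function name is hypothetical, and the "inclusive" quartile method is one of several conventions, not necessarily the one used for the slide. The sample values are the six case counts from the earlier tables.

```python
import statistics

def five_number_summary(values):
    """Return (min, 25%, 50%, 75%, max) for a sequence of numbers."""
    ordered = sorted(values)
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

cases = [745, 2666, 37737, 80488, 212258, 213766]
print(five_number_summary(cases))
```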

  63. Tukey
    Data to Ink Ratio

  64. Tukey Tufte #1
    Data to Ink Ratio

  65. Tukey Tufte #1 Tufte #2
    Data to Ink Ratio

  66. Tukey Tufte #1 Tufte #2
    Data to Ink Ratio
    Unreasonable?

  67. Sparklines
    “A sparkline is a small intense,
    simple, word-sized graphic with
    typographic resolution … ”
    - Edward Tufte,
    Beautiful Evidence, pp. 46-63.
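
As a toy illustration of a word-sized graphic, the sketch below renders a series as a row of Unicode block characters. This is a text approximation and not Tufte's own method; the names `BARS` and `sparkline` are hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to one of eight block heights, scaled to the series range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale by; draw it at the lowest height.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) * top / span)] for v in values)

print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))
```

The result is short enough to sit inline in a sentence or a table cell.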

  68. Small Multiples
    “At the heart of quantitative reasoning
    is a single question: Compared to
    what?
    Small multiple designs answer directly
    by visually enforcing comparisons of
    changes, of the differences among
    objects, of the scope of alternatives.”
    - Edward Tufte,
    Envisioning Information, p. 67

  71. Yeah, well, that's
    just, like, your
    opinion, man.

  72. Part 4:
    Practical Advice

  73. Ten Simple Rules for Better Figures
    By Nicolas P. Rougier
    1. Know your audience
    2. Identify your message
    3. Adapt the figure to the support medium
    4. Captions are not optional
    5. Do not trust the defaults
    6. Use color effectively
    7. Do not mislead the reader
    8. Avoid “Chartjunk”
    9. Message trumps beauty
    10. Get the right tool

  74. 1. Know your audience
    2. Identify your message
    3. Adapt the figure to the support medium
    4. Captions are not optional
    5. Do not trust the defaults
    6. Use color effectively
    7. Do not mislead the reader
    8. Avoid “Chartjunk”
    9. Message trumps beauty
    10. Get the right tool
    Ten Simple Rules for Better Figures
    By Nicolas P. Rougier

  75. 1. Know your audience
    2. Identify your message
    3. Adapt the figure to the support medium
    4. Captions are not optional
    5. Do not trust the defaults
    6. Use color effectively
    7. Do not mislead the reader
    8. Avoid “Chartjunk”
    9. Message trumps beauty
    10. Get the right tool
    Ten Simple Rules for Better Figures
    By Nicolas P. Rougier

  76. 1. Know your audience
    2. Identify your message
    3. Adapt the figure to the support medium
    4. Captions are not optional
    5. Do not trust the defaults
    6. Use color effectively
    7. Do not mislead the reader
    8. Avoid “Chartjunk”
    9. Message trumps beauty
    10. Get the right tool
    Ten Simple Rules for Better Figures
    By Nicolas P. Rougier

  77. 1. Know your audience
    2. Identify your message
    3. Adapt the figure to the support medium
    4. Captions are not optional
    5. Do not trust the defaults
    6. Use color effectively
    7. Do not mislead the reader
    8. Avoid “Chartjunk”
    9. Message trumps beauty
    10. Get the right tool
    Ten Simple Rules for Better Figures
    By Nicolas P. Rougier
