Scaling GitHub

A month after launching, GitHub hosted one thousand repositories. Three years later, we host over three million. In the same time we've gone from one thousand users to over a million.

This type of scaling presents some interesting technical challenges. I'll dig into our development workflow and how we address concepts like scaling, deployment, code review, and testing.

It presents some interesting business challenges, too. How you grow your company from three employees, how you work in teams, and how you split your app up into services all help ensure that you'll be able to react to your product's growth.

http://zachholman.com/talk/scaling-github

Zach Holman

January 26, 2012

Transcript

  1. Scaling GitHub

  2. “Scale”

  3. Two problems.

  4. TECHNICAL: SyntaxError: compile error.
    ORGANIZATIONAL: I’m too hungover to work.

  5. Scaling is people + technology

  6. @holman

  7. (slide with no text)

  8. Organizational
    jeez humans are so finicky

  9. Happiness.

  10. Happiness vs Productivity
    [Chart: axis ticks from 0 to 1,000,000, scale from $ to $$$]

  11. happy employees are
    productive employees

  12. productive employees
    are happy employees

  13. This isn’t a “management problem”.
    Everyone needs to worry about this.

  14. Hiring an employee is the
    most TOXIC thing
    you can do to your startup.

  15. Hiring an employee is the
    most TOXIC thing
    you can do to your startup.
    work slower
    more bugs
    fewer features
    worse culture

  16. Hiring an employee is the
    most EXCITING thing
    you can do to your startup.
    work faster
    fewer bugs
    more features
    better culture

  17. so how can you
    score excitement
    and avoid the toxic?

  18. TOXIC EXCITEMENT
    would be a great name for a rock band
    yeah, i know...

  19. Keep your employees happy.
    Really happy.

  20. Your servers, offices, and ideas are bullshit.
    Worry about your coworkers.

  21. EMPLOYEES
    Know your codebase
    Know your process
    Know your mistakes
    Know your mission
    Know your jokes
    Know your priorities
    NEW HIRES
    Don’t know jack

  22. Imprison your employees with happiness and
    nice things and cuddly work practices.

  23. GitHub Jail
    work whenever you want
    work however you want
    work on what you want
    health, dental, vision
    paid conference trips
    retirement plans
    solid salaries
    a product people love
    four beers on tap
    stock

  24. get out of the way
    NO MEETINGS
    NO PLANNING SESSIONS
    NO NEED TO BE IN THE OFFICE
    chat, pull requests, email
    MORE DIRECT
    FASTER
    ALWAYS RECORDED

  25. This is designed to retain people.
    We’re at 56 employees. We haven’t lost one.
    This is a huge, massive competitive advantage.
    It justifies the extra expense.

  26. Communication.

  27. Don’t have the server guy who knows everything.
    the billing girl
    the testing dude
    the customer support maven
    the performance czar
    the software licensing file hoarder

  28. Don’t have the person who knows everything.

  29. Specialization is great,
    but only having one person
    is a synchronous bottleneck.

  30. Reduce institutional knowledge.

  31. Reduce institutional knowledge.
    wikis
    issues
    chat logs
    pull requests

  32. Every internal GitHub talk
    is automatically recorded,
    uploaded, and viewable to
    every future employee.

  33. ...on a Kinect-powered
    Arduino-based motion-detecting
    portable video
    recording platform.

  34. Your new hire is stoked to dive in,
    start reading, and start contributing
    ...so don’t get in their way.

  35. Hire well.

  36. Hiring poorly is just as bad
    as losing people.

  37. Aim for really great people.

  38. WE ♥ SELF-STARTERS
    less babysitting, more code

  39. Keep your employees happy.
    Really happy.
    (future!)

  40. Don’t just market your product;
    market your team and company too.

  41. Always think
    about attracting
    good people,
    even if you’re
    not hiring.
    OPEN SOURCE
    CONFERENCES
    TECHNICAL POSTS
    SPONSORSHIPS
    MEETUPS
    TALKS

  42. Technical
    robots can be pretty finicky too

  43. Automate.

  44. hubot deploy github to production
    COMPILATION
    CoffeeScript
    SCSS and SASS
    bundles assets
    caches Python dependencies
    compiles Erlang changes
    compiles C changes
    builds static pages
    APP SETUP
    installs gems
    symlink directories
    14 rolling app server restarts
    NOTIFY
    Campfire
    New Relic
    graphite
    [Diagram: grid of servers labeled fs and fe]
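
The rolling restart step on this slide could look roughly like the following. This is a minimal Python sketch, not Hubot's or GitHub's actual deploy code: the `rolling_restart` helper, the `restart` callback, and the `app1`..`app14` host names are all made up for illustration. The slide only says there are 14 rolling app server restarts.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], bool],
) -> List[str]:
    """Restart servers one at a time, stopping at the first failure.

    Returns the servers that restarted successfully. Everything after a
    failure is left untouched, so it keeps serving the old code.
    """
    restarted: List[str] = []
    for server in servers:
        if not restart(server):
            break  # stop the rollout; remaining servers are not restarted
        restarted.append(server)
    return restarted


# Hypothetical host names for the 14 app servers.
app_servers = [f"app{i}" for i in range(1, 15)]

# Pretend the restart of app3 fails: only app1 and app2 get the new code.
done = rolling_restart(app_servers, restart=lambda host: host != "app3")
print(done)
```

Restarting one server at a time means a bad build takes down one process instead of the whole site, which is what makes frequent deploys tolerable.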

  45. deploys
    current process overview
    multi-server shell commands
    new employee setup
    app bootstrap

  46. Automating now will save you way
    more time down the road.

  47. Ship.

  48. Ship early, ship often.
    5x-30x
    deploys per day

  49. master = always deployable
    always green tests
    always a safe rollback

  50. Limit your deployments
    to staff-only
    to beta users only
    to one server only
    to one app process on one server only
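
Limiting a change to staff or beta users usually comes down to a check like the one below. This is a hedged Python sketch, not how GitHub gates features: the `User` class, the `feature_enabled` function, and the audience labels `"staff"`, `"beta"`, and `"everyone"` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    login: str
    staff: bool = False
    beta: bool = False


def feature_enabled(user: User, audience: str) -> bool:
    """Return True if `user` should see a feature limited to `audience`.

    `audience` is one of "staff", "beta", or "everyone" (hypothetical labels).
    Staff are treated as part of the beta audience too.
    """
    if audience == "staff":
        return user.staff
    if audience == "beta":
        return user.staff or user.beta
    if audience == "everyone":
        return True
    raise ValueError(f"unknown audience: {audience!r}")


employee = User("holman", staff=True)
tester = User("early-adopter", beta=True)
visitor = User("someone")

print(feature_enabled(employee, "staff"))  # True
print(feature_enabled(tester, "staff"))    # False
print(feature_enabled(tester, "beta"))     # True
print(feature_enabled(visitor, "beta"))    # False
```

Widening the audience one step at a time lets a change ship to master early while only a small group can trip over it.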

  51. @github tweets
    exceptions
    deploys
    deploys

  52. Graph.

  53. everyone loves fancy graphs
    quickly see trends
    quickly see problems
    historical data as basis for alerts
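
One simple way to use historical data as a basis for alerts is to compare the current reading against the mean and spread of past readings. The sketch below is an assumption about how that might be done, not a description of GitHub's alerting: the `is_anomalous` function, the three-sigma threshold, and the sample numbers are all illustrative.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(
    history: Sequence[float],
    current: float,
    max_sigmas: float = 3.0,
) -> bool:
    """Flag `current` if it is far from what `history` suggests is normal.

    Needs at least two historical points, since `stdev` is undefined otherwise.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline  # perfectly flat history: any change is news
    return abs(current - baseline) > max_sigmas * spread


# Hypothetical response times in milliseconds from earlier periods.
history = [100.0, 102.0, 98.0, 101.0, 99.0]

print(is_anomalous(history, 101.0))  # False: within the usual range
print(is_anomalous(history, 150.0))  # True: far outside the usual range
```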

  54. METRICS ARE GREAT
    But use them wisely.

  55. 162ms
    average overall response time

  56. Valueless metric.

  57. 59ms
    average API response time
    with 4x throughput of web

  58. 23ms
    average raw response time
    with 2x throughput of web

  59. The responsiveness is a lie.

  60. 199ms
    average browser response time

  61. 16,000
    requests in the last week over 4.5s
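
Slides 55 to 61 make the point that an average can look healthy while plenty of requests are painfully slow. A small Python sketch of that effect follows. The `summarize` function and the sample data are invented for illustration; only the 4.5s threshold comes from the slide.

```python
from typing import Sequence, Tuple


def summarize(
    response_times_ms: Sequence[float],
    slow_threshold_ms: float = 4500.0,
) -> Tuple[float, int]:
    """Return (average response time, number of requests over the threshold)."""
    average = sum(response_times_ms) / len(response_times_ms)
    slow = sum(1 for t in response_times_ms if t > slow_threshold_ms)
    return average, slow


# 99 quick requests and one that takes six seconds (made-up numbers).
samples = [100.0] * 99 + [6000.0]
average, slow = summarize(samples)

print(average)  # 159.0: the average alone looks fine
print(slow)     # 1: but someone waited more than 4.5 seconds
```

Counting requests over a threshold, or tracking high percentiles, surfaces the slow tail that a single average smooths away.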

  62. Needed to look at the
    right stuff.

  63. (slide with no text)

  64. googlebot had
    2-3x throughput
    3-4x CPU usage
    compared to web requests
    throttled google

  65. Collect a lot of metrics,
    but make sure they’re
    important metrics.

  66. GitHub scale.

  67. Everyone has different
    growth patterns.

  68. GitHub has had three.

  69. major github infrastructure milestones
    2008: Launch
    2009: Bare metal servers
    2010: net-shard

  70. Launch
    2008
    Hosted on Engine Yard
    10 VMs
    54GB RAM
    shared GFS mount
    one metric shit-ton of caching

  71. Bare metal servers
    2009
    Hosted on Rackspace
    16 bare metal servers
    288GB of RAM
    redundant disk storage

  72. net-shard
    2010
    networks share a common repository
    classic: rails/rails, holman/rails (+1 commit), github/rails (+30 commits)
    ...multiplied 2,600 times
    net-shard: rails network repo shared by holman/rails, rails/rails, github/rails
    fat network, skeleton forks

  73. net-shard
    2010
    networks share a common repository
    they also share the same fs and partition
    halves storage requirements
    improves hit rate of kernel disk cache
    speeds up backups
    allows fast forks, merge button, network GC
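
The storage saving from sharing one repository per network can be illustrated with a toy model. This is a Python sketch under loud assumptions, not GitHub's net-shard implementation: the `Network` class and the string commit ids are invented, and real Git objects are far more than a set of ids. The fork names and the +1 and +30 commit counts come from the previous slide; the base of 100 commits is made up.

```python
from typing import Dict, Set


class Network:
    """All forks of one project, backed by a single shared object store."""

    def __init__(self) -> None:
        self.objects: Set[str] = set()        # each object stored once per network
        self.forks: Dict[str, Set[str]] = {}  # fork name -> object ids it references

    def push(self, fork: str, object_ids: Set[str]) -> None:
        """Record what a fork references and add new objects to the shared store."""
        self.forks.setdefault(fork, set()).update(object_ids)
        self.objects |= object_ids  # objects already present are not stored again

    def shared_size(self) -> int:
        """Objects stored when the whole network shares one store."""
        return len(self.objects)

    def unshared_size(self) -> int:
        """Objects stored if every fork kept its own full copy."""
        return sum(len(ids) for ids in self.forks.values())


base = {f"c{i}" for i in range(100)}  # hypothetical upstream history

rails = Network()
rails.push("rails/rails", base)
rails.push("holman/rails", base | {"h0"})                            # +1 commit
rails.push("github/rails", base | {f"g{i}" for i in range(30)})      # +30 commits

print(rails.unshared_size())  # 331
print(rails.shared_size())    # 131
```

The forks stay "skeletons" that only point at objects, while the network holds the actual data, which is also why forking can be fast.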

  74. For GitHub, scaling involved a lot of
    predictions of future trends, then
    acting appropriately.

  75. Side Projects.

  76. A THOUGHT EXPERIMENT:
    Imagine I told you to build...

  77. (slide with no text)

  78. This grew organically, over dozens of
    projects, written by dozens of employees,
    when they felt like it.

  79. Figure out how to let this happen. It’s hard.

  80. Small hack days can result in
    real, imma-make-us-money impact.

  81. Small hack days can also keep your
    developers insanely happy.

  82. Small hack days can also lead to
    learning new techniques.

  83. Projects and Posts.

  84. JENKINS + CAMPFIRE
    github.com/github/janky
    CHAT ROOM ROBOT
    github.com/github/hubot
    OFFICE MUSIC DJ
    github.com/holman/play

  85. BLOG: GITHUB IS MOVING TO RACKSPACE
    git.io/jByrlQ
    BLOG: HOW WE MADE GITHUB FAST
    git.io/p5v2Ag
    BLOG: UNICORN
    git.io/77Onfg

  86. +
    Technical
    Organizational

  87. Continually refine your
    process + workflow.

  88. Worry about your
    computers, and worry
    about your humans.

  89. Thanks.

  90. ZACH HOLMAN
    zachholman.com/talks
    twitter+github: @holman
