
Intelligence is not enough: The humanity of engineering

Bryan Cantrill
October 06, 2023

Presentation that I gave at Monktoberfest 2023. Video coming!


Transcript

  1. Intelligence is not enough
    The humanity of engineering
    Bryan Cantrill
    Oxide Computer Company


  2. OXIDE
    It always starts with a tweet…


  3. OXIDE
    It always starts with a tweet being trolled…


  4. OXIDE
    It always starts with a tweet being trolled…


  5. OXIDE
    It always starts with a tweet being trolled…


  6. OXIDE
    It always starts with a tweet being trolled…


  7. OXIDE
    “Serious”?
    • This tweet used the word “serious” three times, mainly to deride others
    • Not clear what “serious” means in the context of an argument that
    equates a computer program with nuclear weapons?
    • Or accuses anyone who disagrees with this assessment of “just vibes”?
    • Or one that puts the risk of human extinction at the (metaphorical!)
    hands of a computer program at 5%, with zero methodology?
    • So, a serious question: why treat this seriously at all?


  8. OXIDE
    Reasons to treat this seriously
    • Fear of technology isn’t new – and isn’t always poorly founded!
    • New technologies often have unintended consequences and
    externalities that merit consideration and discussion
    • But in those who believe in AI-based extinction risk, the fear itself is
    alarming – in part because of the actions that it would justify
    • The “AI pause” – if implemented – would be brazenly authoritarian
    • The accompanying rhetoric is often disturbingly violent


  9. OXIDE
    Concrete extinction risk
    • Most AGI-based extinction risk fears – when made concrete – hinge on:
    ○ A computer program getting ahold of nuclear weapons
    ○ A computer program making a novel bioweapon
    ○ A computer program developing novel molecular nanotechnology
    • We are going to leave aside nuclear weapons, as indisputably serious
    people have been thinking about them since the dawn of the atomic age
    • But the latter two have something important in common…


  10. OXIDE
    Superintelligent engineering?
    • Whether stated explicitly or not, when we talk about the fear of a
    superintelligent AI actively killing not just some humans but all of them,
    we are talking about AI making weapons
    • Let us leave aside many questions about such scenarios (e.g., AI’s
    alignment, motivation, or means of production – and human adaptability,
    countermeasures, and resilience), and focus on one pillar…
    • It depends on AI applying the constraints of physical and
    mathematical reality to make new stuff – which is to say, engineering


  11. OXIDE
    Engineering and intelligence
    • If our very existence is threatened by a superintelligence engaged in
    engineering, it prompts an important question…
    • Is engineering an act of intelligence alone?
    • I can’t speak to building novel bioweapons or the significant challenges
    in reviving otherwise moribund molecular nanotechnology…
    • …but we do have a bunch of recent experience building something big
    and new that is surely simpler than these domains


  12. OXIDE
    What we built!


  13. OXIDE
    Building a computer
    • In case it needs to be said: building a new computer + new network
    switch + high-speed backplane + all software from lowest levels of
    firmware to highest levels of control plane is hard and complicated
    • It is still, however, engineering not science
    • Engineering is the act of learning from failure: even when building anew,
    there will be many occasions when the system does not, in fact, work!
    • It is worth exploring a tiny fraction of the failures that we endured in
    building, as they are instructive as to the nature of engineering…


  14. OXIDE
    Failure to bring CPU out of reset
    • Despite following the documented power sequencing to the CPU (AMD
    Milan), it was refusing to come out of reset, simply reinitiating the
    power-on sequence after 1.25 seconds of inactivity
    • Natural assumption was that power was marginal – but the power
    looked good (and making it extraordinary didn’t change anything)
    • Went down any number of blind alleys, performing directed experiments
    with respect to non-connected pins that shouldn’t make any difference
    • These experiments weren’t easy!


  15. OXIDE
    Failure to bring CPU out of reset


  16. OXIDE
    Failure to bring CPU out of reset
    • After several weeks of debugging, we discovered that our voltage
    regulator had a firmware bug: it adjusted voltage as requested by the
    CPU via SVI2 – but never sent a completion (VOTF Complete)
    • The CPU had no way of knowing that the power was in fact correct
    • AMD’s tool for verifying power (SDLE) did not check for this packet
    • Corrected regulator firmware resulted in the CPU coming out of reset!
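
The failure mode above can be sketched as a toy model: the regulator sets the requested voltage but never acknowledges it, so the CPU cannot tell that power is good and restarts its sequence. This is only an illustration, not the SVI2 protocol; the `Regulator` class, `power_on` function, voltage value, and attempt limit are all hypothetical.

```python
class Regulator:
    """Hypothetical regulator model: always sets the voltage, may never acknowledge."""

    def __init__(self, sends_completion):
        self.sends_completion = sends_completion
        self.voltage = 0.0

    def request_voltage(self, volts):
        # The voltage really is adjusted in both cases...
        self.voltage = volts
        # ...but only correct firmware reports completion back to the CPU.
        return self.sends_completion


def power_on(regulator, target_volts, max_attempts=3):
    """Toy CPU side: without a completion, restart the power-on sequence."""
    for attempt in range(1, max_attempts + 1):
        if regulator.request_voltage(target_volts):
            return f"out of reset on attempt {attempt}"
        # Power is correct here, but the CPU has no way of knowing it.
    return "stuck in reset"


buggy = Regulator(sends_completion=False)
fixed = Regulator(sends_completion=True)
print(power_on(buggy, 0.9), "| voltage was", buggy.voltage)
print(power_on(fixed, 0.9), "| voltage was", fixed.voltage)
```

In both runs the measured voltage is identical, which is why checking the rail alone (as in the blind alleys described earlier) could not expose the bug.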


  17. OXIDE
    Failure to bring NIC out of reset
    • We could not get the Chelsio NIC to come out of reset
    • Extensive validation did not reveal any signal that was out of spec
    • Attempting to take a working add-in card (AIC) and destroy it revealed
    that one of the pinstrap resistors (to select the clock source) was
    incorrectly specified
    • We had a 1K ohm pull-down resistor, but this was in fact too weak –
    and a 499 ohm resistor was required to overcome an internal pull-up
    • Reworking with the correct resistor resulted in the NIC correctly starting!
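
The pinstrap problem above is voltage-divider arithmetic: an external pull-down fights an internal pull-up, and the pin only reads low if the divided voltage falls below the input-low threshold. A minimal sketch follows. Only the 1K and 499 ohm values come from the talk; the 3.3 V rail, the 2 kΩ internal pull-up, and the 0.8 V threshold are assumptions chosen for illustration, not the part's actual figures.

```python
def strap_voltage(vdd, r_pullup, r_pulldown):
    """Voltage at a pin between an internal pull-up and an external pull-down."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def reads_low(vdd, r_pullup, r_pulldown, v_il_max):
    """True if the pin voltage is below the input-low threshold."""
    return strap_voltage(vdd, r_pullup, r_pulldown) < v_il_max


# Assumed values, for illustration only (not from the talk or a datasheet).
VDD = 3.3                   # volts
R_INTERNAL_PULLUP = 2000.0  # ohms
V_IL_MAX = 0.8              # volts

for r_pd in (1000.0, 499.0):
    v = strap_voltage(VDD, R_INTERNAL_PULLUP, r_pd)
    low = reads_low(VDD, R_INTERNAL_PULLUP, r_pd, V_IL_MAX)
    print(f"{r_pd:6.0f} ohm pull-down: {v:.2f} V, reads low: {low}")
```

Under these assumed numbers the 1K resistor leaves the pin at about 1.10 V (not a valid low), while 499 ohms pulls it to about 0.66 V.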


  18. OXIDE
    NIC transiently failing to train all PCIe lanes
    • We have our own platform enablement layer (i.e., no BIOS); we are
    responsible for initializing devices at the lowest layer
    • With disconcerting frequency, some number of Chelsio NIC links did not
    train correctly for some of their lanes on boot
    • Decoding the Link Training and Status State Machine (LTSSM) on the
    CPU allowed us to better understand where it was failing, but not why
    • Discovered that a second PERST resulted in correct training – and
    moreover that this second PERST is present on legacy firmware!
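
The workaround above amounts to "reset, check the trained width, reset again if lanes are missing". A rough sketch of that control flow, using a hypothetical `FakeLink` stand-in rather than real hardware access; the lane count of 8 and the "trains fully only from the second reset" behavior are assumptions that mimic the symptom described.

```python
class FakeLink:
    """Hypothetical PCIe link model: trains all lanes only from the second reset on."""

    def __init__(self, expected_lanes=8):
        self.expected_lanes = expected_lanes
        self.resets = 0
        self.trained_lanes = 0

    def pulse_perst(self):
        # Model the observed symptom: the first reset trains only some lanes.
        self.resets += 1
        if self.resets >= 2:
            self.trained_lanes = self.expected_lanes
        else:
            self.trained_lanes = self.expected_lanes // 2


def bring_up(link, max_resets=2):
    """Pulse reset until every expected lane trains, or give up."""
    for _ in range(max_resets):
        link.pulse_perst()
        if link.trained_lanes == link.expected_lanes:
            return True
    return False


link = FakeLink()
print("trained:", bring_up(link), "after", link.resets, "resets")
```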


  19. OXIDE
    Failure to connect to U.2 NVMe drives
    • In a revision of our PCIe-to-U.2 passthrough card (Sharkfin), we had I2C
    connectivity – but no PCIe connectivity whatsoever
    • A previous version of this card had worked, but little had changed in the
    schematic and the layout – why were the new ones broken?!
    • Physical inspection revealed that one of the parts was simply wrong!
    • The wrong reel of parts had been loaded into a pick-and-place machine,
    and an inverter had been laid down instead of an AND gate (!)
    • Reworked ~1200 cards in ~96 hours!


  20. OXIDE
    Random data corruption on software install
    • When installing OS boot images, sporadic (!) corruption was seen
    • Adding checksums to these images revealed corruption was rampant (!!)
    • Microprocessor was speculatively loading through a stowaway mapping
    from early boot, which was allocating in the TLB
    • If application address conflicted with address of stowaway mapping,
    kernel would incorrectly copy data from the wire to the wrong location
    • Eliminating stowaway mapping eliminated the corruption – but
    highlighted divergent perspectives on side-effects of speculative loads
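
Adding checksums is what turned "sporadic" into "rampant": a digest computed at the source and re-checked after the copy catches corruption that would otherwise pass silently. A minimal sketch, assuming SHA-256 (the talk does not say which checksum was used) and an in-memory stand-in for a boot image:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_image(image: bytes, expected_digest: str) -> bool:
    """True if the received image matches the digest computed at the source."""
    return sha256_hex(image) == expected_digest


# Stand-in for a boot image; real images would be read from disk or the wire.
original = bytes(range(256)) * 16
expected = sha256_hex(original)

# Simulate a single corrupted byte landing in the installed copy.
corrupted = bytearray(original)
corrupted[1234] ^= 0xFF

print("clean copy ok:    ", verify_image(original, expected))
print("corrupted copy ok:", verify_image(bytes(corrupted), expected))
```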


  21. OXIDE
    What do these have in common?
    • Each posed an existential risk for the artifact: without solving them, we
    wouldn’t have something that’s impaired – we would have nothing
    • Each revealed an emergent property, often at an interface boundary
    • The breakthrough was often something that “shouldn’t” have worked
    • Intelligence alone does not solve problems like this
    • In all cases, we summoned other elements of our character: our
    resilience, our teamwork, our rigor, our optimism, our curiosity


  22. OXIDE
    Values in engineering
    • These extra-intelligence values are so important to us that we have
    codified them – and use them very explicitly as a lens for hiring
    • To be clear, we are certainly seeking capable, intelligent people – but
    that intelligence is useless without these shared (human!) values
    • We may be more explicit about it than others, but many engineering
    teams are also implicitly hiring for shared values
    • Viz.: It is comical to think of an engineering team hiring based only on
    the results of a test – or any other linear measure of intelligence!


  23. OXIDE
    The humanity in engineering
    • This humanity necessary to understand and resolve failure – so essential
    in designing and building – is hidden in the final artifact
    • This is the soul in Tracy Kidder’s Soul of a New Machine – and the
    perspiration in Edison’s proverbial 99% perspiration
    • Computer programs lack this humanity: they do not have willpower,
    desire, or drive – let alone the deeper human qualities required
    • Which doesn’t mean that AI can’t be useful to engineers, merely that it
    cannot engineer autonomously


  24. OXIDE
    So, should we worry about AI?
    • Extinction risk due to AGI is de minimis – but we must not falsely
    dichotomize AI into posing existential risk or no risk whatsoever!
    • The risk that AI does pose may feel mundane – but it lies much more in
    how it will be abused (deliberately or accidentally) by existing structures
    • AI ethics is exceedingly important, especially when it is being used to
    inform decisions that affect people’s lives!
    • By acknowledging that AI is and will be an important tool, we can move
    beyond fear to focus on enforcing existing regulatory regimes


  25. OXIDE
    Further information (wells to fall down)
    • Richard Smalley/K. Eric Drexler debate on molecular nanotechnology
    • Lex Fridman interview with Marc Andreessen
    • Logan Bartlett interview with Eliezer Yudkowsky
    • Oxide and Friends podcast, especially Okay Doomer, Tales From the
    Bringup Lab and More Tales from the Bringup Lab
