Boeing 737 MAX and Software's Uncanny Valley

Airplane in the sky

On March 10th, a Boeing 737 MAX crashed, the second such crash in just under six months. The catastrophe has caused fear, speculation about the cause, and the grounding of the entire 737 MAX fleet in every country that flies them.

This is a tragic event that has caused many deaths, and we still don’t know the exact cause. If more details come out, they might completely change the narrative, but for now there’s one particular angle that I’m interested in: the computer/software plus human component of this story.

Boeing’s New Aircraft

This new aircraft model, the Boeing 737 MAX 8, is the successor to the previous 737, which first flew in 1967. One of the changes is the addition of a new system now suspected of having caused the first crash and possibly the latest one. The system is called MCAS, or, if you enjoy enterprise programming, the Maneuver Characteristics Augmentation System.

This new system was added because a change in the plane’s design altered how it handled in the air. The system, when enabled, hides this new flight behavior behind a layer of software abstraction that emulates the way the previous 737 flew. This effectively traded an upfront software cost for the cost of retraining pilots to handle the new behavior.

A few weeks after the first crash in November of last year, the Wall Street Journal reported:

Boeing marketed the MAX 8 partly by telling customers it wouldn’t need pilots to undergo additional simulator training beyond that already required for older versions, according to industry and government officials. One high-ranking Boeing official said the company had decided against disclosing more details to cockpit crews due to concerns about inundating average pilots with too much information—and significantly more technical data—than they needed or could digest.

It isn’t hard to see the problem here: in the event the system malfunctions, the pilot still needs to know that MCAS is malfunctioning and how to turn it off.

This is a familiar pattern at any tech company. You have automated scripts or systems that gather diagnostic information, logs, or anything else needed during an outage to debug what is happening. This works great, right up until your tools don’t work.

When that happens, you need to diagnose the problem manually, and if you don’t know where or how to find that information, you are left in pretty bad shape, with a system that is still broken.

This appears to be what happened with the first crash back in October 2018. It was determined afterward that the aircraft crashed because a malfunctioning sensor sent incorrect data to MCAS. The system was responding to this incorrect data, but the pilots didn’t know what the aircraft was doing, so they tried to counter its response and never turned off the system, which resulted in the crash.

An excellent article in Air Facts by Mac McClellan called Can Boeing Trust Pilots? looks at the latest news with way more detail and knowledge that McClellan, an experienced pilot, can provide. It is a very intriguing read.

These few things stood out to me from McClellan’s article:

What Boeing is doing is using the age-old concept of using the human pilots as a critical element of the system. Before fly-by-wire (FBW) came along, nearly all critical systems in all sizes of airplanes counted on the pilot to be a crucial part of the system operation.

Just for some background, without getting too far into the weeds: the difference between mechanical controls and fly-by-wire (FBW) is similar to the difference between analog and digital.

Before fly-by-wire, everything was mechanical; with it, cockpit inputs are converted into flight-control commands digitally. This means a computer can manipulate the controls and fly the plane without a human moving levers, wheels, pedals, etc.

And then a bit later:

FBW removes the pilot as a critical part of the system and relies on multiple computers to handle failures.

Boeing is now faced with the difficult task of explaining to the media why pilots must know how to intervene after a system failure. And also to explain that airplanes have been built and certified this way for many decades. Pilots have been the last line of defense when things go wrong.

What makes that such a tall order is that FBW airplanes – which include all the recent Airbus fleet, and the 777 and 787 from Boeing – don’t rely on the pilots to handle flight control system failures. FBW uses at least a triple redundant computer control system to interpret the inputs of the cockpit controls by pilots into movement of the airplane flight controls, including the trim. If part of the FBW system fails, the computer identifies the faulty elements and flies on without the human pilots needing to know how to disable the failed system.

So what this comes down to is that Boeing can already handle these failures on its other, more advanced models of aircraft. But since the original 737 required pilot intervention for this system, the 737 MAX was designed to require the pilot to intervene too.

This left us in this state:

  • The aircraft was improved with this new system, the MCAS, to retain flight characteristics of the original 737
  • This new system, which was meant to be a minor version bump, actually required additional training for the pilots. They introduced a backwards-incompatible change and broke the pilots’ API
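To put the analogy in code: here is a sketch of a “minor version bump” that silently changes behavior behind an unchanged interface. Every name and number here is invented for illustration and bears no relation to Boeing’s actual software.

```python
class Original737:
    """v1: the pilot's trim input maps directly to the stabilizer."""

    def trim(self, pilot_input: float) -> float:
        return pilot_input  # what the pilot commands is what happens


class Max737(Original737):
    """A supposed v1.1: same interface, but an augmentation layer now
    silently adjusts trim based on a sensor reading. Callers who don't
    know about disable_augmentation() can't restore the old behavior."""

    def __init__(self, sensor_angle: float):
        self.sensor_angle = sensor_angle
        self.augmentation_on = True

    def disable_augmentation(self) -> None:
        self.augmentation_on = False

    def trim(self, pilot_input: float) -> float:
        if self.augmentation_on and self.sensor_angle > 15.0:
            return pilot_input - 2.5  # quietly push the nose down
        return pilot_input


# The same "API call" now yields a different result when a sensor misreads:
old_result = Original737().trim(1.0)              # 1.0
new_result = Max737(sensor_angle=40.0).trim(1.0)  # -1.5
```

A true minor version bump would preserve the contract that `trim(x) == x`; quietly breaking it is exactly the “broke the pilots’ API” complaint.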

They made these choices to ease the transition to the new aircraft for the pilots, who are, by the way, professionals with extensive training and years of expertise.

This is where the issue of human and computer interaction starts to get fascinating.

Human & Computer Assistance

There was an interesting point that Sam Altman made on the Conversations with Tyler podcast:

You know, when AI started beating humans in chess, there was a short period of time where the very best thing of all was a human and an AI playing chess together. The AI would say, “Here’s six moves.” The human would pick the best one of those, and that was better. That could beat an unassisted… that merged version—it’s not really a merge—that teamed-up version could beat an unassisted AI.

I don’t know exactly how long that stayed true for, and people loved that fact, but it didn’t stay true for long. The humans started making the AI worse than the AI was playing alone as it got smarter. I think we will learn that we’re just not that smart. The size of a human brain has all of these biological limitations, but you can make a really big computer with very fast interconnects between the chips.

This brings us to the idea that as a thing improves, there may be pitfalls, or valleys, where the improvement can actually get worse depending on the human’s perception or interaction with it.
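The later phase Altman describes, once the AI’s move ranking is already reliable, can be caricatured with a toy Monte Carlo sketch. The error model and all numbers here are invented: the point is only that a human who sometimes overrides a trustworthy top choice can, on average, only subtract value.

```python
import random


def ai_alone(move_values):
    # The AI plays its top-ranked move.
    return max(move_values)


def centaur(move_values, human_error_rate, rng):
    # The human usually accepts the AI's top choice, but sometimes
    # overrides it with one of the other candidates.
    if rng.random() < human_error_rate:
        return rng.choice(move_values)
    return max(move_values)


rng = random.Random(42)
# Each "position" offers six candidate moves with noisy values,
# echoing the "here's six moves" setup in the quote.
positions = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(10_000)]

solo = sum(ai_alone(p) for p in positions) / len(positions)
team = sum(centaur(p, 0.3, rng) for p in positions) / len(positions)
# Once the AI's ranking is trustworthy, the team does worse on average.
```

This deliberately models only the later phase; in the earlier phase Altman mentions, the human’s independent judgment still carried signal the AI lacked, which is why the teamed version briefly won.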

This is well known in the field of robots and 3D animation where it is called the “uncanny valley.” It hypothesizes that as the appearance of an object gets closer and closer to a human, it goes through a valley where humans become repulsed or disturbed by its appearance.

This repulsion continues until the appearance improves enough and the unpleasant perception gives way to a very human-like appearance.

Chart of the Uncanny Valley of Robotics

At the core of the uncanny valley is human perception. This isn’t far from the reality we are in now with computer assistance. Given that humans can misperceive an object’s physical resemblance to a human, it follows that a human can also misperceive a computer’s or system’s intelligence relative to their own.

This can happen in two ways:

  • A human might design the computer to ask the human for input, and the computer might continue to ask even after its capability exceeds the human’s. In this case, the developers have designed the computer to babysit the human.
  • A human might override the decisions that a computer might make, even if the computer outperforms the human. In this case, the human thinks it is babysitting the computer.

In both of these cases, whether a computer is designed to ask for human input or a human assumes they know more than the computer, the pair can perform worse on a task than if the computer simply operated on its own.
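Both cases reduce to the same back-of-the-envelope arithmetic. The probabilities below are invented, and the model assumes each decision stands alone: whenever the weaker party’s judgment is substituted in, the blended accuracy drifts toward the weaker accuracy.

```python
def blended_accuracy(p_computer: float, p_human: float,
                     override_rate: float) -> float:
    """Expected accuracy when the human's judgment replaces the
    computer's on some fraction of decisions."""
    return (1 - override_rate) * p_computer + override_rate * p_human


# A computer that's right 95% of the time, a human right 80% of the time:
computer_alone = blended_accuracy(0.95, 0.80, override_rate=0.0)  # 0.95
heavy_nannying = blended_accuracy(0.95, 0.80, override_rate=0.5)  # 0.875
```

The same formula covers both bullets: it doesn’t matter whether the computer was built to defer or the human chose to override; every substitution by the less accurate party drags the blend downward.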

It’s this “nannying” that humans feel compelled to do with complex computer software, and it ends up trapping the outcome in a suboptimal local minimum.

This becomes the “nanny valley” and looks like the following:

Chart of the Nanny Valley of AI

I suspect that this valley occurs on the curve at different locations depending on the task at hand (chess, flying, driving) as well as the skill of the human involved.

If Magnus Carlsen used a computer to assist him with chess, it is a lot different than if I were to use a computer. I know the rules and can think a few moves ahead but don’t know any strategy or tactics. I would probably have the same performance if I were to instead roll a die and randomly pick one of the computer’s options.

In the case of the latest 737 MAX crash, we still don’t know exactly what happened. But it is clear that trade-offs were made in the design of MCAS: the aircraft handled differently, and that difference was abstracted away behind software that had to be disengaged if issues arose.

As we develop more advanced software that automates tasks humans used to do, such as flying planes, driving cars, and factory work, this is a problem we will keep facing.

This tragedy illustrates an important point about the interaction between humans and computers: our perception of our own skill relative to a computer’s can lead us to make worse decisions. Those decisions can leave us all in this “nanny valley,” and as a result, we are all less safe.

On Live Coding

Livestream coding

This past week Suz Hinton wrote about her Twitch streaming setup and it is a great article. She also links to a past article from 2017 that explains her motivation and inspiration. Live coding is a trend I’ve been following for a variety of reasons.

Before I get into it though, I want to talk about a book I’ve been reading and how it relates. I spent most of yesterday reading through a book by Martin Campbell-Kelly on the history of software.

It’s called From Airline Reservations to Sonic the Hedgehog: A History of the Software Industry and it starts from the very early days of the software industry in the 1950s and follows it up to 1995 when Microsoft was skyrocketing in value and dominance within the industry.

I’m in awe of Campbell-Kelly’s ability to synthesize all of this information and to tell such a compelling tale of software’s past. I highly recommend it as a read.

Here’s one part of the book that struck me when thinking of the live coding phenomenon:

In December 1955, the RAND Corporation created an autonomous Systems Development Division to undertake the programming work. At that time, the corporation reckoned that it employed about 10 percent of the top programmers in the United States—but this amounted to only 25 people. It was estimated that there were no more than 200 “journey-man” programmers—those capable of the highest class of development work—although there were probably 6 times that number of professional programmers working on relatively simple commercial applications.

In just 64 years (or should I say 2^6 years?), we’ve gone from a few hundred programmers in the world to a point where a person can live stream coding and quite easily amass over 200 viewers.

It is a testament to how “software is eating the world” and the increasing importance of software engineers.

Twitch’s Full Circle

When Twitch launched, it was a spin-out of the gaming section of Justin.tv which had become the largest category on the site.

Justin.tv tried to be general-purpose by providing a place for anyone to stream anything. But in doing so, it almost missed that the most engaged and lucrative part of the live streaming market was video games. By focusing on its most dedicated group of users, gamers, Twitch was able to build something that worked better for them. It’s a great example of monetizing the audience that cares the most rather than chasing mass-market appeal.

The best part of Twitch’s story is that in early 2017 it launched its IRL section, which is essentially Justin.tv reincarnated: it allows people to stream anything, coming full circle to what Justin.tv set out to be.

Live Coding

Live coding is fairly niche but has a very devoted fan base. At any given time you can find a few dozen people live coding in the Science & Technology category on Twitch. This category was spun out of IRL, which was broken up into a few different categories last year.

The most common type of live coding is game development, but occasionally there are people like Hinton who branch into other types of development as well.

There are a few really excellent reasons why people live code and why it is engaging to watch:

  1. It shows the coding/problem solving process and breaks down the myth of the genius programmer.
    • Programming is often glorified as something done in a dark basement by someone super smart who can solve a problem instantly. This couldn’t be any more wrong, and seeing people struggle with problems and solve them in real time does wonders to break this myth.
    • Just think of how many high schoolers might pick up coding after they stumble upon a live coder and see how accessible and fun it can be.
    • One particular streamer that is really great at this is Naysayer88 aka Jonathan Blow of Braid and The Witness fame. He documents his game development as well as more abstract computer science topics like building a compiler.
  2. It’s a new form of marketing for games and projects.
    • This is a very interesting angle on live coding that Adam13531 is focusing on. He has written extensively about this on his blog for the game he is developing, called Bot Land, and has so far live streamed 637 days of development.
    • His reasoning is that Twitch is full of gamers so by documenting his process of making the game, he’ll find people that can improve the gameplay by finding bugs as well as promote it when it is finished.
  3. It provides a community.
    • If you spend a few minutes in a stream, you start to notice that there’s often a real connection between the streamer and their audience. It is a place where coders can go to talk with other coders and discuss technical things.

Why It Is Important

People like Suz Hinton, Adam13531, and Jonathan Blow are showing what live coding can look like and what benefits it can have. They continue to push the boundary of this new way of engaging with people as a developer.

Live coding shows what the problem solving process looks like, the good and the bad. It also provides a warm and friendly environment for people.

This is especially important in software engineering, a field where diversity is a problem and the culture can be opaque and unwelcoming. Together, these qualities make live coding important to the field and a trend worth following.

Shades of Steve

Today it was announced that Rich Barton, one of the original founders of Zillow, is returning to run the company after the previous CEO, Spencer Rascoff, stepped down, having run it for the last 9 years.

Bloomberg ran an article on this and here are some key parts:

Chief Executive Officer Spencer Rascoff is handing the reins to Executive Chairman Rich Barton, 51, who co-founded the company in 2005. Zillow, whose shares are down 46 percent since June, announced the move along with a quarterly earnings report that shows short-term results below analysts’ expectations. Shares rose 8 percent in post-market trading after earlier declining as much as 9 percent following the announcement.

Changing the company’s leadership “so that my voice is out at front during the period of extreme evangelism seemed like the right thing,” Barton, who was also a founder of travel site Expedia Group Inc., said in an interview. “I’m hoping our investors think it’s the right thing. I know many of our investors, they’ve known me from the beginning, they’ve invested with me with Expedia and other companies as well. They’re used to me pointing at the moon and saying, ‘I want to go step on that thing.’”

This is fascinating to me for a couple of reasons:

  1. Zillow has been fundamentally changing its business model. There are more details in this excellent analysis by Ben Thompson of Stratechery.
  2. There has been significant pushback on the changes Zillow has made in the last year from institutional investors, visible in the stock price and perhaps even in the internal company culture.

I can’t help but make the connection to Steve Jobs’s return to Apple in 1997.

Given that Apple and Jobs went on to create the iPhone, the biggest technological product the world has ever seen, one that changed the course of technological progress, the comparison might not be fair. But the impact of housing prices on America is very real and carries huge costs for the economy. This represents a big opportunity for Zillow if it can pull it off, and it makes sense that the company is looking to Rich Barton.

When Steve Jobs returned, he was possibly the only person who could stand on stage at that 1997 Macworld Expo, with Bill Gates looming in the background, and announce the infamous deal with Microsoft that helped bring Apple back from the brink.

Few people other than founders have the ability to instill confidence in their companies and change how things work. Zillow must be thinking about this.

The most obvious good news is Zillow didn’t look to a villain like another company that has a presence in the Seattle area did.