I just read the latest news about the 737-Max jet that crashed in Ethiopia. Just like that other one in Indonesia, the damn thing started porpoising up and down minutes after takeoff while accelerating insanely, then nose-dived into the ground and killed all aboard. This is not a thing that jumbo jets just randomly do when they get hit by a bad gust of wind. And the chances of two of the same model of plane exhibiting the same possessed-by-the-devil behavior within 5 months purely by chance are infinitesimal. Something in the millions of lines of code controlling those planes went down a horribly wrong path. And of course most of the time it doesn't do that, but that wrong path is still in there somewhere, and it will keep getting executed periodically until someone tracks it down. It's one of those incredibly frustrating unreproducible bugs that are the bane of the software tester's existence.
But this bug didn't just corrupt a file, or open up a security hole in a data center, or even sabotage a medical device and accidentally kill a handful of patients. This one crashes jumbo jets nose-first into the ground with screaming pilots yanking futilely at the controls. This is why aerospace test cycles run 18 months to 4 years - because 99.9% defect-free isn't good enough. If there is a more damaging, terrifying place for a tiny little critical software defect to work its wiles, I can't think what it might be. My heart goes out not only to the families of the dead, but to the hundreds of programmers and testers who worked on that system and are lying awake wondering if it was their fault.
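To put rough numbers on that - made-up round figures for scale, not anyone's actual line counts or defect rates:

```
# Back-of-the-envelope arithmetic with invented round numbers, just to
# show the scale -- not actual figures for any real flight-control system.
lines_of_code = 10_000_000        # "millions of lines of code"
defect_free_rate = 0.999          # the hypothetical "99.9%" standard

buggy_lines = lines_of_code * (1 - defect_free_rate)
print(f"{buggy_lines:,.0f} lines still potentially wrong")  # 10,000
```

Ten thousand chances for one of them to be the line that pushes the nose down.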
And fuck you, Boeing. Yeah, I know you didn't do it on purpose but it's your fucking bug and you need to own it. None of this shit about how you had to *sigh* ground all those totally perfectly 100% safe planes "out of an abundance of caution and in order to reassure the flying public." The problem is not with the overly hysterical flying public, guys.
Yes, you guessed it, I was a software tester. Not in aerospace or medical devices, just in networking where lives do not usually hang on a single bug. But I did spend a whole weekend at a software conference in Oregon that was almost entirely devoted to the daunting task of figuring out how to test the humongous flying computer network that was the Boeing 777, then under development. It made an impression on me. As far as I know, not one of those jets ever went down due to a bug like this, so I guess they figured it out.
no subject
Date: 2019-03-15 11:30 pm (UTC)
Nuclear counterstrike.
I sincerely hope that no country has ever left counterstriking entirely up to an automated system, but I don't know if that's true. At least the Soviets didn't, because Stanislav Petrov had a chance to decide not to blow up the world (https://thebulletin.org/2018/09/a-posthumous-honor-for-the-man-who-saved-the-world/).
It's not a spurious parallel. In each case, the system got bad data indicating that an extreme action needed to be taken. A human needs to be in the loop when this happens, and as I understand the Max-8 situation, better-trained pilots knew how to get themselves into the loop (by disabling the autopilot), but that's not good enough.
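Something like this toy sketch is what I mean - not real avionics code, and not how MCAS is actually implemented; the sensor names, thresholds, and numbers are all invented - where the automation refuses to take an extreme action on suspect data and hands the decision back to the human:

```
# Toy illustration only: an automated-trim routine that cross-checks two
# angle-of-attack sensors and, on disagreement or pilot override, does
# nothing and leaves the decision to the humans. All names and numbers
# here are invented; this is not how any real flight-control law works.

AOA_DISAGREE_LIMIT_DEG = 5.0   # hypothetical "these can't both be right" threshold
STALL_ONSET_DEG = 14.0         # hypothetical angle where auto trim would kick in


class SensorDisagreeAlert(Exception):
    """Raised so the surrounding control loop can annunciate the fault
    and leave pitch trim entirely to the pilots."""


def auto_trim_command(aoa_left_deg, aoa_right_deg, pilot_override):
    """Return a nose-down trim increment in degrees, or None to do nothing."""
    if pilot_override:
        return None                      # human has taken over: stay out of the loop
    if abs(aoa_left_deg - aoa_right_deg) > AOA_DISAGREE_LIMIT_DEG:
        # Bad data. Don't guess, don't act -- tell the crew and stop.
        raise SensorDisagreeAlert(
            f"AoA disagree: L={aoa_left_deg:.1f} R={aoa_right_deg:.1f}")
    aoa = (aoa_left_deg + aoa_right_deg) / 2.0
    if aoa > STALL_ONSET_DEG:
        return min(0.25 * (aoa - STALL_ONSET_DEG), 2.5)   # bounded, not runaway
    return None
```

The point isn't the particular numbers; it's that the default response to data the system can't trust is "do nothing and tell the pilot," not "keep trimming the nose down."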
no subject
Date: 2019-03-16 03:14 am (UTC)
In some ways the amazing thing is that this hasn't happened before. The 777 that I had a passing relationship with was incredibly, immensely complicated - an order of magnitude more complicated than previous airplane software. But I looked it up - that plane has had a remarkable safety record since it deployed in 1995. A few planes have crashed, but no two for the same reason and none because of software malfunction (one was shot down by the Russians, which really shouldn't count). The first serious crash of a 777 was in 2013 as the fleet was starting to age. The 737-Max didn't make it to 6 months without two devastating failures. This is not a good trend.
I'm afraid the reason this is happening now is that the developers have gotten too cocky - they really believe their software is now so much smarter than human beings that they decided to leave humans out of the loop deliberately. Putting in a poorly documented workaround to disable the suicide bug when it occurs is not exactly a solution. This does not bode well for self-driving cars.
no subject
Date: 2019-03-16 05:06 pm (UTC)
The neutrino situation was that we have this super fancy deep neural net machine learning algorithm that classifies events. It does pretty well in general, but we also found that it accepted one incredibly obvious non-neutrino as a neutrino, and labeled it as such with high confidence. The reason? There wasn't anything like it in the training sample. Just like, as I hazily recall, how the self-driving car classified a pedestrian as some-junk-I-can-drive-into.
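You can see the failure mode in miniature with a toy logistic regression on made-up clusters - nothing to do with our actual network or the real event selection:

```
# Toy demo of out-of-distribution overconfidence. The "signal" and
# "background" clusters are invented; the point is only that a
# discriminative classifier will happily label something utterly unlike
# its training data, and do so with near-total confidence.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
signal = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2))      # "neutrinos"
background = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(500, 2))  # everything else
X = np.vstack([signal, background])
y = np.array([1] * 500 + [0] * 500)

clf = LogisticRegression().fit(X, y)

# An event nothing like anything in the training sample:
weird_event = np.array([[-50.0, -50.0]])
print(clf.predict_proba(weird_event))   # roughly [[0. 1.]] -- "definitely a neutrino"
```

The classifier isn't lying, exactly; its "confidence" only says which side of the decision boundary the point landed on, not whether it resembles anything in the training sample. That's the trap.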