Just last Wednesday, I posted a column reporting how our richest corporations, through sheer miserliness and profit-seeking, left millions of Americans vulnerable to technological attacks on their privacy and welfare.
I failed to raise one important question: What if the attacks come from inside the house?
That’s exactly what happened Friday. An ineptly designed update to a program made by the cybersecurity company CrowdStrike was installed automatically on users’ machines. It instantly crashed millions of computers running Microsoft Windows and left them disabled until manual fixes could be undertaken. Some haven’t been fixed yet.
The fallout reached worldwide and affected people across the modern technological landscape. Thousands of flights were canceled. Doctors couldn’t perform surgeries. Banking transactions were frozen. Emergency 911 lines went silent.
The affected computers displayed what Microsoft Windows users know as the dreaded “blue screen of death.” Typically, this is a baby-blue screen bearing the message that Microsoft’s operating system hadn’t loaded correctly and the machine should be restarted.
That didn’t work this time: The errant CrowdStrike application was burrowed so deep within the Microsoft operating system — as it’s designed to do — that every time a machine restarted, it ran into the same glitch and went dead again in an infinite doom loop.
The CrowdStrike program — irony of ironies — is an anti-hacking application that identifies hacking attempts and fights them off. In the cat-and-mouse game pitting computer users against hackers, such applications have to be updated regularly. They reside in the bowels of the operating system, because in order to be effective, they have to load before almost any other function.
In this case, a flaw in the update caused the program to try to read a part of the computer’s memory it shouldn’t have touched. Because the program runs at the deepest level of the system, Windows responded by crashing.
That’s a simplified explanation of what happened. Now let’s look at the lessons this episode teaches us — if we’re willing to learn them.
They have to do with our complacency about our dependence on digital systems, including those distributed by developers we’ve never heard of (CrowdStrike, for instance).
What few people realize as they go about their lives is how much crucial digital infrastructure is based on Microsoft programs and applications, and how many of those are supplemented by third-party programs and applications.
All of these pieces must mesh to work smoothly, or at least to appear to. Here and there something goes wrong, but its ramifications are sufficiently contained that it can be fixed quickly, even invisibly.
A great deal of it, furthermore, is automated; it’s designed to run with a minimum of human intervention. In the view of the IT departments expected to monitor all this, humans are perpetual money pits: they need days off, get sick, demand raises, quit and must be replaced by newbies needing training, etc., etc. By comparison, machines look like a one-time capital expense. The goal is to set it and forget it.
Microsoft is the hub of these networks because Microsoft made them its business. It created an open architecture for third-party developers to piggyback on; the fundamental idea was that by extending the system’s capabilities, those other developers made Microsoft’s central system more valuable. Microsoft either outsourced some functions to independent developers or allowed them to design applications that competed with Microsoft’s own versions, though those applications were still designed to work with Microsoft’s systems.
Among those developers is Austin, Texas-based CrowdStrike, one of countless firms offering cybersecurity services to Windows users. (Microsoft’s own cybersecurity suite is known as Defender.)
Apple computers and devices don’t have the same vulnerabilities because that company develops almost all its extensions in-house and keeps a very close eye on what it allows to interact with its software and hardware; the company doesn’t allow outside applications to interact with its operating system at the fundamental level available with Microsoft’s systems.
But Apple doesn’t have anywhere near as large a footprint in enterprise services as Microsoft. A report issued in March by the government’s Cyber Safety Review Board about a major hacking intrusion into Microsoft’s cloud system in the summer of 2023 asserted that the company’s “ubiquitous and critical products … underpin essential services that support national security, the foundations of our economy, and public health and safety.”
Anyone living in the modern world has to confront the drawbacks of our reliance on digital technology on almost a daily basis. In prehistoric days, back when our household appliances were mechanical or electric, not electronic, a breakdown was easy to diagnose and fix — switch out a tube or tighten a screw.
When a device ceases to function today, it’s often impossible to pinpoint the fault — did my TV go bad, or did the internet go down, or was it just the channel I was watching?
Yet many of us rely on a single company for multiple services. For example, I get my home phone service, broadband internet, and television/video (broadcast and cable channels and streaming) from a single provider. I don’t have much choice, since for most of these it’s the only provider in my neighborhood. But when it goes down, everything goes down.
That provider, Spectrum, has tried to sell me on its mobile phone service too. I’ve refused, because I figure I need at least one thread of access to the outside world that isn’t dependent on its all-in-one monopoly.
Microsoft’s near-dominance of cloud computing — the ecosystem through which all those enterprise computers that went dead last week communicate with each other and with the outside world — should make all of us queasy, because the company’s cybersafety record is atrocious.
The Cyber Safety Review Board investigation concluded that the 2023 hack occurred because “Microsoft’s security culture was inadequate and requires an overhaul, particularly in light of the company’s centrality in the technology ecosystem and the level of trust customers place in the company to protect their data and operations.”
The board cited, among other things, a “cascade of … avoidable errors” in the company’s cybersecurity program and its failure to detect that hackers had compromised its own “cryptographic crown jewels.” Microsoft acted only after a customer, the U.S. State Department, discovered the incursion itself.
The board found that Microsoft’s security practices were inferior to those of “other cloud service providers.” The report mentioned Amazon, Google and Oracle as Microsoft rivals in cloud services with better security systems.
Microsoft pledged to “adopt a new culture of engineering security in our own networks” and said it had “mobilized our engineering teams to identify and mitigate legacy infrastructure, improve processes, and enforce security benchmarks.”
The CrowdStrike crash suggests that those efforts are still works in progress. It’s fair to say that much of the blame belongs to CrowdStrike, which allowed an update to a crucial application to be sent to users for automatic installation without doing the testing necessary to ensure that the update was operationally bulletproof.
Technology blogger Ed Zitron properly tied the disaster to the financialization of Big Business generally, in which pumping ever higher profits to shareholders becomes a higher priority than ensuring that one’s products meet quality standards.
“Crowdstrike seemingly borrowed Boeing’s approach to quality control,” Zitron wrote, “except instead of building planes where the doors fly off at the most inopportune times (specifically, when you’re cruising at 35,000ft), it released a piece of software that blew up the transportation and banking sectors, to name just a few.”
CrowdStrike Chief Executive George Kurtz moved promptly to “sincerely apologize” to all affected users, via a statement and an appearance on the NBC “Today” show. “We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority,” Kurtz said in a posting on the company’s website.
Microsoft placed the blame chiefly on CrowdStrike. “Although this was not a Microsoft incident, given it impacts our ecosystem, we want to provide an update on the steps we’ve taken with CrowdStrike and others to remediate and support our customers,” David Weston, a vice president for enterprise and security, wrote on the company’s website.
But Microsoft, plainly, failed to take on board the necessity of vetting every piece of third-party software that could have an effect on its own customers — before it blew up their computer systems.
No software system is immune from errors, especially now that such systems are so complex and multilayered that not even their developers may know all their weak spots. (An error at Amazon’s cloud service incapacitated as many as 150,000 websites for several hours in February 2017. That was a major problem, but not nearly on the scale of the CrowdStrike crash.)
But as these systems play an ever-expanding role in modern life and grow ever more complex, their providers must make security and safety their top priorities, not merely mouth the concept in marketing material without actually taking it seriously.
Cloud clients also need to pay more attention to what is getting automatically inserted into their systems. Who has the right to gloat over escaping the CrowdStrike meltdown last week? Amusingly, it’s Southwest Airlines. For decades, Southwest resisted Microsoft’s urgings that it upgrade its systems to the latest versions of Windows, relying on Windows 3.1, which is 32 years old — so antique that the CrowdStrike update wouldn’t even work on the airline’s systems.
So while affected carriers such as Delta, United and American had canceled nearly 2,400 flights by 6 p.m. Friday, Southwest had canceled three. (By midday Monday, the number of canceled flights exceeded 12,300.) That doesn’t mean that Southwest gets everything right. After all, the airline suffered more than its competitors from the ferocious storm in December 2022 that snarled air traffic nationwide, precisely because it had not paid enough attention to keeping its computer systems updated.
In this case, however, Southwest’s cheapskate culture was its savior. That may only put it on the same level as the proverbial blind squirrel that occasionally finds a nut. But it shows that all of our Big Business squirrels need to keep their eyes open, and focused on the perils of inattention.