MacTalk
August 2024
What Should Apple Users Take Away from the CrowdStrike Debacle?
Apple and Linux machines were not affected by the CrowdStrike software update.
Even while we sympathize with those directly and indirectly affected, it’s hard not to be a little smug. The larger question is, could a similar kind of problem affect Macs? That would be bad for us Mac users but less so for the world, given that Macs are used in fewer mission-critical situations than Windows-based PCs. Macs may not even be as relied upon as iPads for vertical market tasks like point-of-sale applications, medical record tracking, and education management. What about iPhones? I have less of a sense of how mission-critical they are to businesses and other organizations, but there are certainly millions of individuals whose lives would be upended if their iPhones were suddenly bricked. They would have trouble communicating with others, making purchases, navigating to unfamiliar destinations, taking public transit, and much more.
At The Eclectic Light Company blog, Howard Oakley examines the possibility of Macs being affected by something similar. He concludes that the likelihood is quite small overall and no longer significant for Apple silicon Macs. On Windows, CrowdStrike’s Falcon sensor code runs as a kernel-mode driver with elevated privileges, which is why a bug in it could prevent PCs from booting successfully. On the Mac, the equivalent approach would require a kernel extension (kext), but Apple deprecated kexts starting in macOS 10.15 Catalina in 2019, pushing developers to use System Extensions instead. Kernel extensions can run on Apple silicon Macs only if the user drops system security to Reduced Security and explicitly allows third-party kexts to load. Don’t do that unless you have a really good reason.
In fact, the Mac version of CrowdStrike’s Falcon sensor reportedly used a kext on Intel-based Macs prior to macOS 11 Big Sur but has since switched to an EndpointSecurity System Extension. System Extensions run in user space rather than in the kernel, so even if one suffered from a critical bug, it shouldn’t be able to cause a kernel panic.
What about iOS and iPadOS? They’re even more secure than macOS because they have never allowed kernel extensions and don’t support anything like macOS System Extensions. All iOS and iPadOS apps are sandboxed, so they generally can’t affect the system or any other app. That’s not to say that iOS and iPadOS are perfectly secure or reliable, but they certainly rank highly among consumer-grade operating systems.
Apple devices may not be as vulnerable to a bug in an update to third-party software like CrowdStrike’s Falcon sensor, but that doesn’t mean we can be complacent. Apple itself regularly releases updates, and while it’s essential to install them to patch security vulnerabilities, Apple’s engineers could make a mistake that would cause problems for millions. Howard Oakley’s article reminded me of when an Apple update inadvertently disabled Ethernet (see “El Capitan System Integrity Protection Update Breaks Ethernet,” 29 February 2016). Apple quickly addressed the problem, but the lack of Ethernet prevented some Macs from getting the revised update, requiring manual intervention.
What could be done to reduce the chances of an outage like this happening again?
- Punish CrowdStrike? It’s tempting to say that CrowdStrike should somehow be held liable for the potential costs, estimated to exceed $1 billion. However, in most cases, CrowdStrike’s standard terms limit the company’s liability to a refund for fees paid. There may still be shareholder lawsuits—CrowdStrike’s stock fell nearly 20%—and SEC scrutiny. Overall, it was a terrible, horrible, no good, very bad day for CrowdStrike, but it almost certainly doesn’t mean the end of the company. Other firms will probably be more careful for a while, but if the mistake doesn’t prove hugely expensive for CrowdStrike, everyone may stick with current bad behaviors.
- Write better code? The easy answer is that the team in charge of developing the update should never have made the mistake in the first place. We don’t know much about CrowdStrike’s development environment and policies, but there are well-established programming practices that reduce the likelihood of such errors. In an ideal world, more attention would be paid to code quality, but it can be difficult for management to prioritize code quality over shipping more quickly.
- Do better testing? Even if we give CrowdStrike the benefit of the doubt and say that the bug was a subtle mistake that could have slipped by any developer, I can’t see any excuse for why it wasn’t caught in testing. Either CrowdStrike wasn’t doing real-world testing—the company constantly releases updates like this—or someone messed up big time. As with writing better code, better testing is something everyone can agree should happen, but test teams may not be given the time or resources they need to do a good job.
- Use a staged rollout? Companies that release updates to very large numbers of users don’t usually do so all at once. Instead, they release to small groups before expanding to the entire user base. That way, even if a bug has been introduced in development and slipped through testing, it won’t affect too many people before being discovered. Either CrowdStrike didn’t do this, or it did and the problem affected only a subset of CrowdStrike users, in which case things could have been much worse. The only reason I can see why a company wouldn’t use staged rollouts is if it was patching a zero-day security vulnerability and felt that it was crucial to distribute to everyone as quickly as possible.
- Switch to Macs, iPads, and iPhones? It’s good to have an active fantasy life. The kinds of inexpensive workhorse computers and servers affected by the CrowdStrike bug are exactly what Apple isn’t interested in building.
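The staged-rollout idea from the list can be sketched in a few lines. This is a minimal illustration under stated assumptions. It is not CrowdStrike’s, Apple’s, or anyone else’s actual update system. The wave percentages, the 2% failure threshold, and the `health_check` callback are all hypothetical. Each device is hashed into a stable bucket from 0 to 99, so the same machines always land in the early waves. The rollout halts as soon as a wave reports too many failures.

```python
import hashlib
from collections.abc import Callable, Iterable

# Hypothetical rollout plan: cumulative percentage of the fleet covered by each wave.
STAGES = (1, 10, 50, 100)
# Hypothetical threshold: halt if more than 2% of a wave's devices fail.
MAX_FAILURE_RATE = 0.02


def bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in 0..99, so waves are deterministic."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def staged_rollout(
    devices: Iterable[str],
    health_check: Callable[[str], bool],
    stages: tuple[int, ...] = STAGES,
) -> tuple[set[str], bool]:
    """Push an update wave by wave, halting if a wave's failure rate is too high.

    `health_check` is a hypothetical stand-in for "install the update on this
    device and report whether it came back healthy". Returns the devices that
    received the update and whether every wave passed its health check.
    """
    fleet = list(devices)
    updated: set[str] = set()
    for percent in stages:
        # Devices newly covered by this wave's cumulative percentage.
        wave = [d for d in fleet if bucket(d) < percent and d not in updated]
        if not wave:
            continue
        failures = sum(1 for d in wave if not health_check(d))
        updated.update(wave)
        if failures / len(wave) > MAX_FAILURE_RATE:
            # Stop here: devices in later waves never receive the bad update.
            return updated, False
    return updated, True


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(10_000)]
    # Simulate a bad update that breaks every machine that installs it.
    updated, completed = staged_rollout(fleet, health_check=lambda d: False)
    # completed is False, and only the first wave (roughly 1% of the fleet) was hit.
    print(completed, len(updated))
```

Hashing the device ID, rather than picking random machines on each push, keeps the early waves stable from one update to the next. A real system would also pause between waves and watch telemetry, which this sketch omits. Passing `stages=(100,)` models the all-at-once push that might be justified only for an urgent zero-day fix.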
Plenty of other lessons could be taken away from the CrowdStrike debacle, but I worry that it will fall out of the headlines too soon for other companies to learn from CrowdStrike’s mistakes.