In last week’s IT chaos, caused by a bug in an anti-hacking software package, some people thought only PCs running Windows were being hit with the BSOD (Blue Screen of Death). It turns out that CrowdStrike’s Falcon program has been crashing Linux systems too, taking down client and server machines.
Considering there were a lot of smug posts going around on Friday from a bunch of folks on their Linux systems, the fact that it’s not just a Windows thing is certainly worth noting.
The news that CrowdStrike’s woes are not limited to Windows installations was reported by The Register, and it confirms what was already suspected about last week’s globe-spanning IT outages: it wasn’t a Windows problem at all but entirely down to a separate piece of software. The application in question, CrowdStrike Falcon, is basically an anti-hacking/malware package used by countless businesses, large and small, as well as government institutions and services.
A bugged update to the program caused Windows PCs to hit a stop error, better known as a BSOD (Blue Screen of Death), that just kept on recurring with each boot attempt. Microsoft has swung into action and created a recovery tool to help repair the affected computers, and CrowdStrike’s CEO, George Kurtz, was very apologetic about the whole incident.
But behind the news headlines, all displaying endless pictures of BSODs, was the less-reported fact that Linux systems were also being affected by Falcon bugs, though in one instance the problem predates last week’s issue by a month. Red Hat identified CrowdStrike’s software as being the source of a kernel panic (the Linux equivalent of a Windows stop error), and The Register notes that earlier Falcon updates have done the same in Debian and Rocky Linux.
Software bugs are so common that anyone using a computer just accepts them as being part and parcel of the modern IT world. But there is a big difference between an application having a few glitches and one that causes the operating system’s kernel to bail out. And given how widely used CrowdStrike’s software is, that difference is even more important.
I’ve never been in the position of having to manage a huge network of computers providing a mission-critical service, but I have looked after a few small ones in the days when the stability of Windows and its updates was really flaky. For those, I only ever pushed an update onto one test machine first, leaving the rest of the network on a previously tested update, to make sure no change would leave the whole system unusable.
I should imagine that this is common practice but after seeing the level of impact that the Falcon update had on Friday, it’s perhaps not as common as I think. I’m not suggesting that the problem was, in part, the fault of IT system managers (the finger of blame is firmly pointing at CrowdStrike), but I can’t help but feel that if you’re managing a system that cannot go down for any reason, then you never let an update get rolled out without testing it first.
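The test-one-machine-first approach described above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not how any real deployment tool works: the function name `staged_rollout`, the `apply_update` and `is_healthy` callbacks, and the machine names are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    machines: Iterable[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update one test machine first; touch the rest only if it stays healthy.

    Returns the list of machines that actually received the update.
    All names here are hypothetical placeholders for illustration.
    """
    machines = list(machines)
    if not machines:
        return []

    # The first machine acts as the sacrificial test box.
    test_machine, rest = machines[0], machines[1:]
    apply_update(test_machine)

    # If the test machine falls over, stop: the rest of the network
    # stays on the previously tested update.
    if not is_healthy(test_machine):
        return [test_machine]

    # The test machine survived, so roll the update out to everyone else.
    for machine in rest:
        apply_update(machine)
    return machines
```

In practice the health check is the hard part: it would need to cover at least a reboot, since a bug like last week’s only shows itself when the machine tries to start up again.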
Whether the CrowdStrike outage goes down as the worst in history is yet to be determined, but I’m pretty sure about a couple of things. CrowdStrike’s market value is going to tank hard and IT managers are going to be very wary about its software in future.