Issue #55: The CrowdStrike Outage Explained — Jason Michael Perry

Howdy👋🏾. On Friday, my friends and family started texting me questions. This usually happens when a large-scale technology event occurs and everyone suddenly needs clarity on what is going on. I figured I would use this week’s newsletter to explain what happened, where things stand now, and why it’s such a big deal.

CrowdStrike is a cybersecurity company, and a big one. Its platform is used by most of the Fortune 500 and thousands of other businesses. One of its products, the Falcon platform, is built to stop attacks by detecting threats before they can do damage. It does this by running a low-level agent on each computer that is constantly updated with new information about threats to look for. I picture it as CrowdStrike HQ sending memos to every computer it manages with a most-wanted poster listing the newest attack vectors or threats.

On Friday, a routine update pushed to millions of computers contained a glitch that caused Windows machines running version 7.11 or later of CrowdStrike’s Falcon sensor to crash and fail to boot back up, landing on what is fondly referred to as the Blue Screen of Death (BSOD).

image credit: The Verge

For CrowdStrike to provide such powerful endpoint protection (an endpoint is any computer or device connected to a company’s network), it must run its software on everything. For many enterprises, that means every computer in the organization potentially has this software installed. That includes user workstations, but also all the computers you might forget exist: machines that power the screens at our airports, the computers in our ATMs, machines embedded in medical devices, production servers that run internal and external applications, courtroom servers, and machines that power robots in warehouses.

As Microsoft put it, an estimated 8.5 million Windows machines were impacted worldwide, which, crazy enough, is less than 1% of all Windows machines.

In a perfect world, CrowdStrike would fix the glitch it shipped and push a corrected update to these computers. The problem is that the error prevents these machines from fully booting into Windows, so they never get far enough to receive that fix. Instead, someone has to manually reboot each machine into a special Safe Mode, find and delete the faulty file, and then reboot it again. For large companies with thousands of impacted machines, that process can take days.
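For the technically curious: CrowdStrike’s published workaround boiled down to booting into Safe Mode or the Windows Recovery Environment, opening the `%WINDIR%\System32\drivers\CrowdStrike` folder, and deleting any file matching `C-00000291*.sys`. Technicians typically did this by hand from a command prompt. The Python below is only a rough sketch of that find-and-delete step, assuming the affected drive has been mounted somewhere Python can reach it (for example, attached to a working machine and already unlocked if it is encrypted). The function name is made up for illustration; this is not a CrowdStrike tool.

```python
from pathlib import Path

# File-name pattern from CrowdStrike's published workaround for the faulty update.
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(driver_dir):
    """Hypothetical helper: delete files in driver_dir matching FAULTY_PATTERN.

    Returns the names of the files it removed, in sorted order.
    Other files in the folder are left untouched.
    """
    removed = []
    for path in sorted(Path(driver_dir).glob(FAULTY_PATTERN)):
        path.unlink()  # delete the matching file
        removed.append(path.name)
    return removed
```

For example, `remove_faulty_channel_files(r"E:\Windows\System32\drivers\CrowdStrike")`, where `E:` is a hypothetical drive letter for the mounted disk. The deleting itself is trivial; the hard part in real life was physically reaching each machine.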

Look no further than Delta. It’s Wednesday, and the company is still struggling to get its systems back online and continues to cancel flights while, I’m sure, an army of IT workers and contractors works overnight to get these computers up and running.

image credit: Reddit

The irony is that software like CrowdStrike’s exists to prevent exactly the type of outage it is now responsible for. It’s also a big reminder of just how vulnerable we are to the vendors we use. More on this in a moment, but first, my sponsors and thoughts on tech & things:


🤝 This week’s newsletter issue is proudly sponsored by:

If you’re looking for qualified candidates, contact Baird Consulting.


🚀 How Will Global Copyright Rules Shape AI Development? Ever wonder how copyright rules might impact AI innovation across the globe? Japan’s approach could be a game-changer, standing in stark contrast to the EU’s heavy-handed regulations. Dive into the nuances and implications.

⚙️ CrowdStrike Reminds Us That Dependency Management Is a Major Attack Vector – When a cybersecurity giant like CrowdStrike stumbles, it’s a harsh reminder of the fragility of our dependency management. Explore the impact of its latest glitch and what it means for businesses worldwide.

🔍 OpenAI Introduces GPT-4o Mini – OpenAI’s latest foray into small language models, GPT-4o mini, promises capable AI with faster response times at a lower cost. What could this mean for the future of AI-powered devices?

🔮 Is Meta’s Multi-Token Prediction Model a Game Changer? Meta’s new Multi-Token Prediction Model could revolutionize how we interact with AI. Could this be the breakthrough that propels us into a new era of AI capabilities?

🔐 Should Governments Have Backdoor Access to Encrypted Devices? In the wake of another high-profile incident, the debate over government backdoor access to encrypted devices resurfaces. Weigh in on the balance between security and privacy.

🧠 OpenAI Reveals the 5 Steps to AGI – OpenAI’s roadmap to Artificial General Intelligence outlines five critical levels. Discover what each level entails and how ChatGPT is progressing through them.

CrowdStrike is doing its best to make this manual process easier on its clients. On Monday, it released a tool that IT teams can load onto a USB drive to help speed up the recovery.

One unexpected issue is drive encryption. It is now normal for companies to encrypt their computers’ hard drives (on Windows, typically with BitLocker), which means a recovery key is often needed to unlock a drive from Safe Mode, and those keys are frequently stored on a separate server. Without the key, IT can’t access or delete the faulty file. For these businesses, systems that might otherwise seem less critical, like the servers holding those keys, have to be prioritized, and in the worst cases rebuilt or restored from backups, before IT can retrieve the keys needed to fix every other machine.

The issue did not impact Linux and macOS machines because the faulty update was only sent to Windows machines. On Windows, CrowdStrike’s agent runs inside the kernel, the very low-level core of the operating system, so when it crashes, it takes the whole machine down with it. Apple has largely pushed security vendors out of the macOS kernel, and some have debated whether Microsoft should keep allowing this kind of access.

What is obvious is that once the dust settles, many CIOs and security professionals will need to step back and examine how they handle threat detection, and whether they’re willing to let outside vendors push updates directly to mission-critical or production systems in the name of maybe safeguarding them from a new intrusion.

Needless to say, fixing an outage this big, with such a broad footprint, will take weeks. So go easy on your IT friends and family for the next month or two; they’re going through it right now and burning the midnight oil to help us.

-jason

p.s. If you like CrowdStrike Falcon, then you will love this company that offers No Uptime Hosting! They’ll do everything to keep your site down and make sure you get the worst customer service while doing it! Kinda sad that the website is actually up.

No Uptime Hosting – Guaranteed Server Downtime!

p.s.2 If you needed an excuse to skip work, well, you have one.

https://xkcd.com/2961