
How to prevent your software update from being the next CrowdStrike

Ron Miller

Updated July 24, 2024 at 8:10 AM

CrowdStrike released a relatively minor patch on Friday, and somehow it wreaked havoc on large swaths of the IT world running Microsoft Windows, bringing down airports, healthcare facilities and 911 call centers. While we know a faulty update caused the problem, we don’t know how it got released in the first place. A company like CrowdStrike very likely has a sophisticated DevOps pipeline with release policies in place, but even with that, the buggy code somehow slipped through.

In this case it was perhaps the mother of all buggy code. The company has suffered a steep hit to its reputation, and the stock price plunged from $345.10 on Thursday evening to $263.10 by Monday afternoon. It has since recovered slightly.

In a statement on Friday, the company acknowledged the consequences of the faulty update: "All of CrowdStrike understands the gravity and impact of the situation. We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority."

Further, it explained the root cause of the outage, although not how it happened. That’s a postmortem process that will likely go on inside the company for some time as it looks to prevent such a thing from happening again.

Dan Rogers, CEO at LaunchDarkly, a firm that uses a concept called feature flags to deploy software in a highly controlled way, couldn’t speak directly to the CrowdStrike deployment problem, but he could speak to software deployment issues more broadly.

“Software bugs happen, but most of the software experience issues that someone would experience are actually not because of infrastructure issues,” he told TechCrunch. “They’re because someone rolled out a piece of software that doesn’t work, and those in general are very controllable.” With feature flags, you can control how quickly new features are deployed and turn a feature off if something goes wrong, which keeps the problem from spreading widely.
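To make the idea concrete, here is a minimal sketch of a feature flag acting as a kill switch. It is not LaunchDarkly's API; the FlagStore class, the parse_update function and the "new_parser" flag name are all hypothetical, and a real system would read flags from a remote service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """In-memory stand-in for a feature flag service (hypothetical)."""
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code stays dark until enabled.
        return self.flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self.flags[name] = enabled


def parse_update(store: FlagStore, payload: str) -> str:
    # The new code path runs only while its flag is on.
    if store.is_enabled("new_parser"):
        return f"new:{payload}"
    return f"old:{payload}"


store = FlagStore()
before = parse_update(store, "x")   # flag off: old path
store.set("new_parser", True)
during = parse_update(store, "x")   # flag on: new path
store.set("new_parser", False)      # kill switch: no redeploy needed
after = parse_update(store, "x")    # back on the old path
```

The point of the design is that turning the feature off is a data change, not a new release, so it can happen in seconds.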

It is important to note, however, that in this case the problem was at the operating system kernel level, and once code at that level has run amok, it’s harder to fix than, say, a web application. Still, a slower deployment could have alerted the company to the problem a lot sooner.

What happened at CrowdStrike could potentially happen to any software company, even one with good software release practices in place, said Jyoti Bansal, founder and CEO at Harness, a maker of DevOps pipeline developer tools. While he also couldn’t say precisely what happened at CrowdStrike, he talked generally about how buggy code can slip through the cracks.

Typically, there is a process in place where code gets tested thoroughly before it gets deployed, but sometimes an engineering team, especially in a large engineering group, may cut corners. “It's possible for something like this to happen when you skip the DevOps testing pipeline, which is pretty common with minor updates,” Bansal told TechCrunch.

He says this often happens at larger organizations where there isn’t a single approach to software releases. “Let's say you have 5,000 engineers, which probably will be divided into 100 teams of 50 or so different developers. These teams adopt different practices,” he said. And without standardization, it’s easier for bad code to slip through the cracks.

How to prevent bugs from slipping through

Both CEOs acknowledge that bugs get through sometimes, but there are ways to minimize the risk, including perhaps the most obvious one: practicing standard software release hygiene. That involves testing before deploying and then deploying in a controlled way.

Rogers points to his company’s software and notes that progressive rollouts are the place to start. Instead of delivering the change to every user all at once, you release it to a small subset and see what happens before expanding the rollout. Along the same lines, if you have controlled rollouts and something goes wrong, you can roll back. “This idea of feature management or feature control lets you roll back features that aren’t working and get people back to the prior version if things are not working.”
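One common way to implement a progressive rollout is to hash each user into a stable bucket and compare the bucket against the current rollout percentage. The sketch below assumes that approach; the function names, the "new_update" feature name and the user IDs are hypothetical, and it does not describe how any particular vendor does it.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    # A user is included when their bucket falls below the current percentage.
    return bucket(user_id, feature) < percent


users = [f"user-{i}" for i in range(1000)]
at_0 = {u for u in users if in_rollout(u, "new_update", 0)}
at_5 = {u for u in users if in_rollout(u, "new_update", 5)}
at_50 = {u for u in users if in_rollout(u, "new_update", 50)}
at_100 = {u for u in users if in_rollout(u, "new_update", 100)}
```

Because the bucket is derived from the user ID rather than chosen at random on each request, anyone included at 5% is still included at 50%, and dropping the percentage back to zero removes everyone at once.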

Bansal, whose company just bought feature flag startup Split.io in May, also recommends what he calls “canary deployments,” which are small, controlled test deployments. The name harks back to canaries being sent into coal mines to detect carbon monoxide. Once the test rollout looks good, you can move to the progressive rollout that Rogers alluded to.
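The canary-then-expand sequence can be sketched as a loop over rollout stages with a health check at each one. Everything here is an assumption made for illustration: the stage percentages, the 1% error threshold and the error_rate_at callback, which stands in for whatever monitoring a real pipeline would query.

```python
from typing import Callable

STAGES = [1, 5, 25, 50, 100]  # percent of hosts; hypothetical values


def staged_rollout(
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> tuple[int, bool]:
    """Advance through STAGES, stopping at the first unhealthy stage.

    Returns (last healthy percentage, whether the rollout completed).
    """
    healthy = 0
    for percent in STAGES:
        if error_rate_at(percent) > max_error_rate:
            # Halt and fall back to the last healthy stage (0 means full rollback).
            return healthy, False
        healthy = percent
    return healthy, True


good = staged_rollout(lambda percent: 0.001)
bad_canary = staged_rollout(lambda percent: 0.5)
late_failure = staged_rollout(lambda percent: 0.2 if percent >= 25 else 0.0)
```

In this sketch the first stage plays the role of the canary: if it fails, the rollout stops having touched only 1% of hosts.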

As Bansal says, code can look fine in testing, but a lab test doesn’t always catch everything, which is why you have to combine good DevOps testing with controlled deployment to catch what lab tests miss.

Rogers suggests that when analyzing your software testing regimen, you look at three key areas: platform, people and processes. In his view, all three work together. “It’s not sufficient to just have a great software platform. It’s not sufficient to have highly enabled developers. It’s also not sufficient to just have predefined workflows and governance. All three of those have to come together,” he said.

One way to prevent individual engineers or teams from circumventing the pipeline is to require the same approach for everyone, but in a way that doesn’t slow the teams down. “If you build a pipeline that slows down developers, they will at some point find ways to get their job done outside of it because they will think that the process is going to add another two weeks or a month before we can ship the code that we wrote,” Bansal said.

Rogers agrees that it’s important not to put rigid systems in place in response to one bad incident. “What you don't want to have happen now is that you're so worried about making software changes that you have a very long and protracted testing cycle and you end up stifling software innovation,” he said.

Bansal says a thoughtful automated approach can actually be helpful, especially with larger engineering groups. But there is always going to be some tension between security and governance and the need for release velocity, and it’s hard to find the right balance.

We might not know what happened at CrowdStrike for some time, but we do know that certain approaches help minimize the risks around software deployment. Bad code is going to slip through from time to time, but if you follow best practices, it probably won’t be as catastrophic as what happened last week.
