CrowdStrike, Cloud Strike?

On Friday, July 19th, a section of the world experienced an IT meltdown. Businesses that depended heavily on Microsoft Windows machines experienced what in the local parlance is known as the "Blue Screen of Death", or BSOD. This is when your Windows machine crashes and a blue screen stares out at you from your hapless monitor. I don't know if Mac systems behave the same way, but on Windows it is a regular occurrence at the workplace.

So what happened on Friday that affected 8.5 million Microsoft Windows machines, hitting airports, hotels and hospitals, among other services, in most parts of the Western world? And what can we learn from it so that it does not happen again? This is what this article plans to address.

So what actually happened on the 19th? According to the BBC, "A single update pushed out from an anti-virus company in the US has managed to cause global havoc today". The anti-virus company here is CrowdStrike, a vendor whose security software provides cybersecurity defense on connected Windows machines. We have come to understand that updates are regularly pushed to these machines, but this latest one unfortunately turned out to be a rogue. In its defense, Microsoft quickly pointed affected users in the direction of a third-party provider, but would this be enough to assuage their grief?

Where did CrowdStrike drop the ball?

"A single update pushed out"! It is amusing and also disappointing to realise that in trying to keep the arsonist out of the barn, CrowdStrike actually threw a flare into it. Yes, Microsoft was quick to distance itself from the disaster, but I think they should learn from this and put preventive measures in place. Next time the impact could be even greater!

Without beating about the bush, this problem looks like a classic case of not testing the harmful update thoroughly in development before pushing it out to production. The DevOps team would need to throw more light on this. Product testing is a very detailed process in software development, and it is expected that the slightest flaw in the product is identified in the pipeline and fixed before it gets to market. I'm not sure this was followed through on the 19th.

In retrospect, Microsoft should also have a system in place to check updates from third-party providers. Windows users know Microsoft and hardly know CrowdStrike. I have been in IT support for years now and only got to know of them on the 19th.

CrowdStrike's CEO, George Kurtz, has been invited to testify before the US Congress. I look forward to learning what explanation he will provide on this matter. I also think that Microsoft should be invited to give its own side of the story. They are the vehicle, aren't they?
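To make the point about testing before production a little more concrete, here is a minimal sketch in Python of a staged rollout, where an update goes to a small group of machines first and only continues if that group stays healthy. This is purely illustrative and is not how CrowdStrike or Microsoft actually ship updates; the names `Ring`, `staged_rollout`, `apply_update` and `max_failure_rate` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Ring:
    """A hypothetical group of machines that receives the update together."""
    name: str
    hosts: List[str]


def staged_rollout(
    rings: List[Ring],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> Dict[str, object]:
    """Push an update ring by ring and halt at the first unhealthy ring.

    apply_update is an assumed callback that installs the update on one
    host and returns True if the host is still healthy afterwards.
    """
    completed: List[str] = []
    for ring in rings:
        failures = sum(1 for host in ring.hosts if not apply_update(host))
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            # Stop here: later (larger) rings never receive the update.
            return {
                "status": "halted",
                "failed_ring": ring.name,
                "completed": completed,
            }
        completed.append(ring.name)
    return {"status": "done", "failed_ring": None, "completed": completed}
```

The idea is simply that a bad update should break a handful of test machines, not millions of production ones. With rings ordered from smallest to largest (say a lab, then a small canary group, then the whole fleet), a crash in the canary group stops the rollout before the fleet is ever touched.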

Last Words

In conclusion, I have learnt a lot from this July 19th incident.

First, a system is only as strong as its weakest link. "A single update" was all that was needed to bring down the operations of major businesses around the world in a matter of minutes. CIOs should pay serious attention to this fact and take measures to ensure that all the moving parts of the machine are well oiled. Yes, it was a mistake this time. It could be malicious next time.

Secondly, IT operations should not be in a hurry to embrace fancy new technologies to the detriment of functional, tested ones. Today the buzzword is AI. The people running the existing systems are still very important. How is their state of mind? They will still drive the processes. Continue to invest in them while you are tinkering with the next big thing.

Thirdly, the BBC reported that most Chinese systems were not affected by the hitch. This was so because, over time, Beijing has encouraged the development of local cloud platforms like Alibaba Cloud and Tencent. Businesses in China that rely on US- and Europe-based platforms, such as Western hotel chains like Hilton, were affected, but overall most networks were spared. This spared the connected world a lot of headache; things would have been far worse had everywhere been affected.

Whatever the impact, it is a good thing that this was just an error, and I hope that the world has learnt a useful lesson from it.

Cheers and thanks for stopping by.