Date: July 30, 2024
Preliminary investigations revealed that the outage resulted from a cyber-attack and a failure to adequately defend against it. The incident lasted nearly 10 hours, affecting thousands of users worldwide. Interestingly, this outage occurred less than two weeks after another major global outage caused by a flawed software update from cybersecurity firm CrowdStrike. That incident left around 8.5 million computers using Microsoft systems inaccessible, impacting critical sectors like healthcare and travel.
The initial trigger was a Distributed Denial-of-Service (DDoS) attack. Unfortunately, an error in the implementation of Microsoft's defenses amplified the impact of the attack rather than mitigating it. This vulnerability surprised experts, given Microsoft's robust network infrastructure¹.
The outage impacted Microsoft Azure (the cloud computing platform behind many services), Microsoft 365 (including systems like Microsoft Office and Outlook), Intune, and Entra. Utilities, Courts and Tribunal Services, and financial institutions worldwide were among those that experienced disruptions due to their reliance on these Microsoft platforms¹.
Microsoft has apologized for the inconvenience caused by this outage. While the issue has been resolved, it again highlights the vulnerability of users who rely on these services daily and depend on their stability.
Mitigation Statement - 07/30/2024 - Azure Front Door - Issues accessing a subset of Microsoft services
Tracking ID: KTY1-HW8 https://azure.status.microsoft/en-us/status/history/
What happened?
Between approximately 11:45 UTC and 19:43 UTC on 30 July 2024, a subset of customers may have experienced issues connecting to a subset of Microsoft services globally. Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, as well as the Azure portal itself and a subset of Microsoft 365 and Microsoft Purview services.
What do we know so far?
An unexpected usage spike resulted in Azure Front Door (AFD) and Azure Content Delivery Network (CDN) components performing below acceptable thresholds, leading to intermittent errors, timeouts, and latency spikes. While the initial trigger event was a Distributed Denial-of-Service (DDoS) attack, which activated our DDoS protection mechanisms, initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it.
How did we respond?
Customer impact began at 11:45 UTC and we started investigating. Once the nature of the usage spike was understood, we implemented networking configuration changes to support our DDoS protection efforts, and performed failovers to alternate networking paths to provide relief. Our initial network configuration changes successfully mitigated the majority of the impact by 14:10 UTC. Some customers reported less than 100% availability, which we began mitigating at around 18:00 UTC. We proceeded with an updated mitigation approach, first rolling this out across regions in Asia Pacific and Europe. After validating that this revised approach successfully eliminated the side effect impacts of the initial mitigation, we rolled it out to regions in the Americas. Failure rates returned to pre-incident levels by 19:43 UTC. After monitoring traffic and services to ensure that the issue was fully mitigated, we declared the incident mitigated at 20:48 UTC. Some downstream services took longer to recover, depending on how they were configured to use AFD and/or CDN.
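The timestamps in the response timeline above can be turned into elapsed durations with a few lines of arithmetic. The following is a minimal sketch only, not part of Microsoft's statement: the milestone labels and the helper names `minutes_since_start` and `format_duration` are hypothetical and chosen for illustration, while the times themselves are those quoted in the statement.

```python
from datetime import datetime, timezone

# Times (UTC, 30 July 2024) quoted in the mitigation statement.
# The milestone labels are illustrative names, not Microsoft terminology.
MILESTONES = {
    "impact_began": "11:45",
    "initial_mitigation": "14:10",
    "failure_rates_normal": "19:43",
    "declared_mitigated": "20:48",
}


def minutes_since_start(milestones, start_key="impact_began"):
    """Return minutes elapsed from the start milestone to each milestone."""

    def parse(hhmm):
        # All milestones fall on the same calendar day in UTC.
        stamp = datetime.strptime(f"2024-07-30 {hhmm}", "%Y-%m-%d %H:%M")
        return stamp.replace(tzinfo=timezone.utc)

    start = parse(milestones[start_key])
    return {
        name: int((parse(hhmm) - start).total_seconds() // 60)
        for name, hhmm in milestones.items()
    }


def format_duration(minutes):
    """Format a minute count as hours and minutes, e.g. 145 -> '2h 25m'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


if __name__ == "__main__":
    for name, mins in minutes_since_start(MILESTONES).items():
        print(f"{name}: +{format_duration(mins)}")
```

On these figures, the bulk of the impact was mitigated about two and a half hours after it began, failure rates took just under eight hours to return to normal, and the incident was formally declared mitigated a little over nine hours after impact began.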
What happens next?
Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours, to share more details on what happened and how we responded. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings.
Other References
(1) Microsoft apologises after thousands report new outage. https://www.msn.com/en-us/news/technology/microsoft-apologises-after-thousands-report-new-outage/ar-BB1qTqfw.
(2) Microsoft investigating new outages of services after global CrowdStrike chaos. https://www.yahoo.com/news/microsoft-investigating-outages-services-global-172728482.html.
(3) DDoS Attack Triggers New Microsoft Global Outage. https://www.infosecurity-magazine.com/news/ddos-microsoft-global-outage/.
(4) Microsoft apologises after thousands report new outage - BBC. https://www.bbc.com/news/articles/c903e793w74o.