Service disruption at the London (UK) data center

Incident Report for Kinsta

Postmortem

The DDoS attack between August 11 and August 18, 2020 was the longest-lasting and most sophisticated our company has endured. In this post, we will recap the incident and explain the steps we're taking in response. However, we want to start by apologizing for the disruption this event has caused for many of our customers. When you picked Kinsta as your hosting provider, your decision was an indication of the trust you were placing in us. We don't take that trust for granted, and we understand that this incident may have shaken your trust in us.

Recap of the DDoS Attack

Between August 11 and August 18, 2020, we experienced an intermittent and evolving DDoS attack directed against our infrastructure in the London (UK) data center.

The initial attack was a standard DDoS attack which attempted to overwhelm our infrastructure with cache-bypassing requests. We were able to mitigate that activity comparatively quickly with minimal impact to our customers.
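For readers unfamiliar with the term, a cache-bypassing request typically varies the query string so that each request misses the cache and reaches the origin server. The sketch below is purely illustrative and is not the detection logic our Engineering team used: the function name `flag_cache_bypassers`, the threshold, and the sample addresses are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def flag_cache_bypassers(requests, threshold=3):
    """Return clients that hit one path with many distinct query strings.

    `requests` is an iterable of (client_ip, url) pairs. A client is flagged
    when it requests the same path with at least `threshold` different
    query strings, a common way to force cache misses.
    The threshold is a hypothetical value chosen for illustration.
    """
    seen = defaultdict(set)  # (client_ip, path) -> set of query strings
    for client_ip, url in requests:
        parts = urlsplit(url)
        if parts.query:
            seen[(client_ip, parts.path)].add(parts.query)
    return sorted({ip for (ip, _path), queries in seen.items()
                   if len(queries) >= threshold})


# Hypothetical sample traffic using documentation-reserved IP ranges.
sample = [
    ("203.0.113.7", "https://example.com/?x=1"),
    ("203.0.113.7", "https://example.com/?x=2"),
    ("203.0.113.7", "https://example.com/?x=3"),
    ("198.51.100.4", "https://example.com/?page=2"),
    ("198.51.100.4", "https://example.com/?page=2"),
]
print(flag_cache_bypassers(sample))
```

In this sample, only the first client is flagged, because the second client repeats one query string that a cache can serve. A real mitigation would also need to account for legitimate query-string variation, such as search and pagination.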

On August 13, the attack shifted to an evolving series of sophisticated strategies. Repeatedly, our Engineering team analyzed the attack and deployed mitigating measures. However, it became apparent that the attacker was monitoring the effectiveness of their efforts, and as we mitigated one aspect of the attack, the attack strategy would change.

As time went by, our mitigation efforts were increasingly successful. Toward the conclusion of this incident, our mitigation efforts largely muted the impact of the DDoS attack. For example, between August 17 and 18, we detected many hours of DDoS activity. However, due to our mitigation efforts, our customers were unaffected by the attack during approximately 75% of that time. In addition, when the DDoS activity resumed on August 24, our Engineering team was able to mitigate that attack without significant impact to customer sites.

What We're Doing in Response

This incident has highlighted the need for two significant changes.

The first change is the creation of a specialized team with responsibility for security within our Engineering team. In the past, security has been the responsibility of our entire Engineering team. While our entire Engineering team will continue to have responsibility for monitoring and responding to security events, we've also made the decision to dedicate multiple full-time specialized personnel to that effort. This will result in deeper expertise, clearer assignment of responsibility, and a more effective response the next time we face a similar incident.

The second change we're making is the development of a Crisis Communication Workflow. During the course of this incident, it became clear to us that we were not communicating frequently enough or providing sufficient information to our customers. Our intentions in this regard were good. In the past, when dealing with short-lived incidents, providing details only after an event had been fully resolved allowed our team to focus on solving the problem. When our team is fully focused on solving the problem, the problem gets solved faster. However, it became clear to us that this approach is not appropriate for long-lasting incidents, such as this DDoS event. During long-lasting events, it's critical that you, our customers, hear from us much more regularly and that you receive more detailed information. Toward that end, we have created a new Crisis Communication Workflow that will get information out to our customers much faster during long-lasting incidents in the future.

We Will Work to Regain Your Trust

Kinsta exists to serve our customers. We've been blown away by the patience, kindness, and understanding our customers have shown us during this incident. We are committed to learning and growing as a team with the aim of living up to the trust you continue to place in us.

Posted Aug 26, 2020 - 15:56 UTC

Resolved

Our monitoring systems have not detected any malicious traffic during the past 48 hours. At this time, we are marking this incident as resolved. Our Engineering team will continue to monitor activity and remain prepared to respond promptly in the event that the issue returns.

Posted Aug 20, 2020 - 18:25 UTC

Update

We are still closely monitoring activity in the London data center. However, our monitoring systems have not detected any malicious activity since our last update.

Posted Aug 20, 2020 - 12:14 UTC

Update

We continue to monitor activity in the London data center closely. Our monitoring systems have not detected any malicious activity since our last update.

Posted Aug 20, 2020 - 02:15 UTC

Update

We are continuing to closely monitor activity in our London data center. While our monitoring systems are not detecting malicious traffic at this time, we will keep watching the situation and provide additional updates.

Posted Aug 19, 2020 - 17:58 UTC

Update

The systems are remaining stable. Our Engineering team continues to closely monitor activity in the London data center. Our mitigation efforts continue to be effective at this time.

Posted Aug 19, 2020 - 09:55 UTC

Update

Our Engineering team continues to closely monitor activity in the London data center. Our prior mitigation efforts continue to be effective at this time.

Posted Aug 19, 2020 - 02:00 UTC

Update

Our London data center continues to experience an ongoing sophisticated DDoS attack.

This attack started more than a week ago as a straightforward DDoS attack targeting a single site. We see these types of attacks routinely and we easily mitigated the original attack.

However, the attacker changed tack about a week ago. The attacker shifted from attacking one site to attacking our infrastructure directly. In addition, the attacker shifted from a simple DDoS attack to a series of sophisticated attacks that evolve in response to our mitigation efforts.

Fully resolving this incident is the top priority at Kinsta, across all departments. Our entire Engineering team as well as key members of our Executive and Development teams are completely focused on this issue.

As of this writing, the attack is ongoing, but we have taken steps to mitigate its impact. We do not consider this incident resolved, and we're focused on identifying and implementing additional mitigation techniques.

We recognize the seriousness of this incident and the way it may have shaken your trust in us. We are committed to fully resolving this incident as well as learning and growing as a company by having gone through it. We know this has been an incredibly trying time for our customers and we appreciate the patience, support, and understanding our customers have shown us. It's an honor to continue to serve you.

Posted Aug 18, 2020 - 17:51 UTC

Monitoring

Engineers have been monitoring the London data center and services have remained stable. We will continue to monitor the situation.

Posted Aug 18, 2020 - 13:25 UTC

Investigating

Our London Data Center is facing a renewed DDoS attack. Our Engineers are actively working to mitigate the attack.

Posted Aug 18, 2020 - 11:05 UTC

Update

Engineers have been monitoring the London data center and services have remained stable. We will continue to monitor the situation.

Posted Aug 17, 2020 - 19:29 UTC

Update

The target of the DDoS attack has been successfully transferred to a separate, dedicated LoadBalancer, away from our standard LoadBalancer infrastructure. We're continuing to monitor the situation closely.

Posted Aug 17, 2020 - 15:37 UTC

Update

We have been closely monitoring this situation and service has remained stable. We will continue monitoring and update again if the situation changes.

Posted Aug 17, 2020 - 15:33 UTC

Monitoring

Engineers have been monitoring the London data center and services have remained stable since the last update. We will continue to monitor the situation.

Posted Aug 17, 2020 - 14:16 UTC

Update

At this time services are responding as normal though the attack is ongoing. Our Engineers continue to work on mitigating the attack.

Posted Aug 17, 2020 - 13:41 UTC

Identified

Our London Data Center is facing a renewed DDoS attack. Our Engineers are actively working to mitigate the attack.

Posted Aug 17, 2020 - 12:29 UTC

Update

Engineers have been monitoring the London data center and services have remained stable since the last update. We will continue to monitor the situation.

Posted Aug 16, 2020 - 16:46 UTC

Monitoring

Engineers have mitigated the attack, and we're continuing to monitor.

Posted Aug 16, 2020 - 00:46 UTC

Identified

Our London Data Center is facing a renewed DDoS attack. Our Engineers are actively working to mitigate the attack.

Posted Aug 15, 2020 - 23:39 UTC

Monitoring

Our engineers believe they have mitigated the recurrence of the attack but are continuing to monitor the situation closely.

Posted Aug 15, 2020 - 13:53 UTC

Identified

Our London Data Center is facing a renewed DDoS attack. Our Engineers are working to mitigate the attack.

Posted Aug 15, 2020 - 12:51 UTC

Update

We have been closely monitoring this situation and service has remained stable. We will continue to monitor throughout the weekend.

Posted Aug 14, 2020 - 21:42 UTC

Monitoring

Our Engineers have been in discussions with Google Engineers, and we believe the attack to be mitigated at this time. We are watching the situation closely before we consider this resolved.

Posted Aug 14, 2020 - 17:33 UTC

Update

Systems Engineers continue to work diligently to mitigate the ongoing attack. Further updates will be provided as new information becomes available.

Posted Aug 14, 2020 - 15:29 UTC

Identified

Thank you for your continued patience.

Our London data center has been the target of a DDoS attack, and our systems engineers have been working diligently to mitigate the attack in conjunction with Google Cloud Platform engineers.

We continue working towards complete mitigation, and additional updates will be provided as they become available.

Posted Aug 14, 2020 - 13:32 UTC

Monitoring

Our systems engineers have applied a fix for the problem. We are working on fully restoring all services and are continuing to monitor the situation.

Posted Aug 14, 2020 - 12:57 UTC

Identified

Our engineers are continuing to work on stabilizing services as quickly as possible.

Posted Aug 14, 2020 - 10:47 UTC

Investigating

Our Engineers are investigating reports of a service disruption affecting sites in the London (UK) data center. We are working to restore service to affected sites as quickly as possible.

Posted Aug 14, 2020 - 10:29 UTC

This incident affected: London, United Kingdom data centers.