Root Cause Analysis of Outage for Week of January 19, 2015

The following is a root cause analysis and postmortem for the outage experienced during the week of January 19th, 2015 at Pressable.

Root Cause Analysis. What caused this outage?

Ultimately, the reason for this outage was a well-crafted attack on our systems. The attack was a variant of the “slowloris” attack discovered in 2009. It went undetected for so long because of the way it was executed. Since discovering this variant, we have been working with various security professionals and teams to learn more about it and about ways to mitigate it faster.

What was so special?

Two things contributed to the success of this attack. The first was the manner in which the IP address for the two clusters was configured on our F5. Due to the nature of our needs, the F5 was configured in what’s commonly known as “Bridge Mode”. This means the F5 was not set to inspect any HTTP or HTTPS traffic.

The second thing was how the requests were made. This attack came from multiple IP addresses and targeted multiple sites hosted on our systems. Because the requests were made to active domains and paths on the system, they looked like legitimate traffic that we were failing to respond to.

All of our monitoring systems and graphs seemed to indicate that we were underpowered, or had reached capacity, but it wasn’t the case.
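To illustrate why this kind of traffic can be mistaken for a capacity problem, here is a minimal sketch in Python. It is not the detection we or our F5 actually use. It assumes the usual description of slowloris (clients hold connections open by sending request headers very slowly), and the `Connection` record, the function names, and the thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical snapshot of one open client connection."""
    client_ip: str
    age_seconds: float       # how long the connection has been open
    bytes_received: int      # request bytes received so far
    headers_complete: bool   # True once the full request headers have arrived


def is_slow_request(conn, min_age=10.0, min_rate=50.0):
    """Flag a connection that is old, still mid-headers, and trickling bytes.

    min_age (seconds) and min_rate (bytes per second) are illustrative
    thresholds, not tuned values.
    """
    if conn.headers_complete or conn.age_seconds < min_age:
        return False
    return conn.bytes_received / conn.age_seconds < min_rate


def slow_share(connections, **thresholds):
    """Fraction of open connections that look like slow, unfinished requests."""
    if not connections:
        return 0.0
    slow = sum(1 for c in connections if is_slow_request(c, **thresholds))
    return slow / len(connections)


if __name__ == "__main__":
    sample = [
        Connection("198.51.100.7", 60.0, 120, False),   # old and trickling
        Connection("203.0.113.20", 0.5, 400, True),     # normal, complete request
        Connection("198.51.100.9", 30.0, 90, False),    # old and trickling
        Connection("203.0.113.41", 2.0, 10, False),     # too new to judge
    ]
    print(f"share of slow connections: {slow_share(sample):.0%}")
```

A high share of slow, unfinished requests alongside exhausted connection slots points to held-open connections rather than a real shortage of capacity, which is a distinction a plain count of busy connections cannot show.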

We will continue to work on our infrastructure and architecture. We still have some big announcements to make on that front, but we’ll save those for later.

Who was responsible?

We are still investigating who the perpetrators are and whether any legal recourse is available to us.

What’s Next?

First, we’ve patched all our systems and have made some changes to our F5 so that this attack, or variants of it, will be stopped before they do too much damage.

Secondly, it has been made abundantly clear that our communication channels and processes are not as effective or transparent as they could be. Many of our customers simply wanted to know that we were aware of the situation and were working to fix it. To allay concerns, we opened up a chatroom where we gave real-time updates. Many of you welcomed this room, so we’ve decided to keep it open all the time. For the security and privacy of our customers, we’ll be making some changes, and then we’ll be inviting all of you to join us next week.

Our Sincerest Apologies

We’d like to extend our sincerest apologies to everyone affected by this outage. Our entire team worked tirelessly through the last couple of weeks to handle the situation as best we could given the extreme circumstances.

We would like to thank all of the customers and WordPress community members who reached out to us and showed their support for our team. Your encouragement and kindness were truly appreciated and helped remind us of the kindness in our community.
