Database Outage

Incident Report for Retreaver

Resolved

We have identified the root cause and deployed a fix. At 14:10 EST, a sudden surge of thousands of calls exposed a deficiency in our call-handling code that caused our database to lock. We are now conducting a thorough code review to ensure that the same deficiency does not exist elsewhere in the codebase.

Posted Feb 24, 2016 - 16:50 EST

Update

At approximately 14:10 EST, we were alerted to a surge in CPU usage on our primary database server. Unable to identify the cause, we manually failed over to our backup server at 14:16 EST. The failover succeeded, and operations returned to normal four minutes later. We're currently working to identify the root cause and will update this incident within the next 2 hours.

Posted Feb 24, 2016 - 15:06 EST

Investigating

We're investigating an outage affecting our primary database server. We will provide updates shortly.

Posted Feb 24, 2016 - 14:27 EST