For a few hours on Monday, several Google services came crashing to a halt. Users all over the world were unable to send messages via Hangouts, engage in video chats, or check Google Voice. Some people trying to create spreadsheets with Sheets were met with 502 errors, and people taking advantage of the multiplayer aspect of Google Play Games were also affected. All of this apparently resulted from an oops during a routine hardware maintenance event, in which the company miscalculated available capacity.

During these maintenance events, Google redirects traffic away from certain backend servers to a new set while it performs the work. Due to this slip-up, the new servers lacked enough capacity to handle the redirected traffic. Google engineers started running the maintenance procedure at 8:25 AM and realized something was up roughly twenty minutes later.
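
To make the capacity problem concrete, here is a minimal sketch of the kind of pre-check that could run before traffic is redirected. It is not Google's tooling: the function name, the numbers, and the 20 percent headroom figure are all assumptions made up for illustration.

```python
def can_absorb_redirect(
    redirected_load: float,
    target_capacity: float,
    target_load: float,
    headroom: float = 0.2,
) -> bool:
    """Return True if the new servers can take the redirected traffic.

    All parameters are hypothetical and use the same unit (for example,
    requests per second). `headroom` is the fraction of capacity kept
    in reserve as a safety margin.
    """
    usable_capacity = target_capacity * (1.0 - headroom)
    return target_load + redirected_load <= usable_capacity


# Hypothetical numbers: the new set can serve 1000 units and already serves 300.
print(can_absorb_redirect(400, 1000, 300))  # fits within the reserved margin
print(can_absorb_redirect(600, 1000, 300))  # would exceed it
```

If a check like this is fed the wrong capacity figure, it passes when it should fail, which is roughly the kind of miscalculation described above.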

The team then brought in additional capacity, halted the maintenance process, and started bringing users back online in waves to avoid overwhelming the system.
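
Bringing users back in waves can be pictured with a short sketch like the one below. It only shows the batching idea, under the assumption that users are a simple list and that waves have a fixed size. The function name and wave size are hypothetical, and a real rollout would presumably also wait and check system health between waves.

```python
from typing import Iterator, List, Sequence


def readmit_in_waves(users: Sequence[str], wave_size: int) -> Iterator[List[str]]:
    """Yield users in fixed-size batches instead of all at once.

    `users` and `wave_size` are illustrative; nothing here reflects
    how Google actually grouped or ordered the affected accounts.
    """
    if wave_size <= 0:
        raise ValueError("wave_size must be positive")
    for start in range(0, len(users), wave_size):
        yield list(users[start:start + wave_size])


# Hypothetical usage: five users restored two at a time.
for wave in readmit_in_waves(["ann", "bob", "cam", "dee", "eli"], 2):
    print(wave)
```

Restoring everyone at once would send the full reconnect load to servers that had only just recovered, so smaller batches give the system room to catch up.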

These things happen, but if you need the reassurance that Google's learned its lesson, here is a dry list of bullet points the company's provided to show what it's taking away from the experience.

You can read the entire incident report for yourself at the link below.

Google Apps Incident Report - March 17, 2014