Answers To Your Questions

What Is This?

The systems we work in eventually have problems.

They are built, maintained, and supported by technologists such as yourself. And when an issue inevitably occurs, someone needs to take action to restore services.

Responding to those problems helps maintain the functionality and operational capability of an organization’s IT services, for both internal and external users.

Many organizations don’t currently have an incident response plan in place. In fact, efforts to recover from service disruptions rarely follow any kind of repeatable and measured framework at all. Engineers react rather than respond.

With the increased reliance on digital services and their underlying technology, it’s more important than ever to establish an explicit response plan. There are small steps that you can take immediately so that when the next problem occurs, everyone knows what to do. The incident itself can be viewed not just as an outage but as an opportunity to learn.

On-call Life is dedicated to providing foundational concepts and information related to being on-call, including monitoring, incident response, and the post-incident review process.

This is a live site with new information added regularly. Much of the content is syndicated from presentations created for and delivered during Microsoft’s Ignite the Tour.

Throughout these articles, demonstrations and resources specific to Azure will be used, but the foundations of monitoring, incident response, and retrospectives are agnostic to tooling. Demonstrations on Azure are meant to illustrate rather than to suggest “best practice” implementations.

How Do I Use This?

Begin by examining why the responsibilities of on-call have become so critical to nearly every business, group, and government.

Monitoring For Reliability

Who Is Behind This?

Jason Hand - @jasonhand

Jason Hand
Azure Cloud Advocate

Jason Hand is an author and speaker on the subjects of site reliability engineering, incident management, post-incident analysis, and ChatOps. Co-organizer and supporter of several tech communities including DevOpsDays Rockies, Jason enjoys connecting storytellers and actionable ideas with those who are hungry to learn. Jason also loves to bring together ideas and expertise around building communities in tech through his podcast “Community Pulse”.