Interactive reports, also referred to as Workbooks in Azure Monitor, can be built quickly to give first responders critical information as they begin to investigate and take early remediation steps. Guides can include data from sources such as logs and metrics, presented through visualizations like charts, grids, graphs, and text. Everything displayed in the troubleshooting guide can be customized on the fly through parameters, which is what makes it interactive. The name Troubleshooting Guide is specific to the workbooks made available through Azure’s Application Insights.
For the most part, incidents are unique, so the lessons learned will vary from problem to problem. However, it’s helpful to spot trends in response efforts to identify both what is working and what needs improvement. It’s also helpful for engineering teams to have a sense of how frequently problems arise and how quickly they are addressed and resolved. When tracking incidents using Azure Boards, it’s quite simple to build reports and charts that provide a high-level snapshot of incident management efforts.
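To make "how frequently" and "how quickly" concrete, here is a minimal Python sketch that computes incidents per month and mean time to resolve. The records and their field names (`opened`, `resolved`) are hypothetical stand-ins for whatever you export from your tracking tool, not an Azure Boards schema.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical incident records; the field names are illustrative assumptions,
# not the actual Azure Boards work item fields.
incidents = [
    {"id": 1, "opened": datetime(2020, 1, 3, 9, 0), "resolved": datetime(2020, 1, 3, 10, 30)},
    {"id": 2, "opened": datetime(2020, 1, 17, 22, 15), "resolved": datetime(2020, 1, 18, 0, 15)},
    {"id": 3, "opened": datetime(2020, 2, 5, 14, 0), "resolved": datetime(2020, 2, 5, 14, 30)},
]


def incidents_per_month(records):
    """Count incidents by the (year, month) in which they were opened."""
    return Counter((r["opened"].year, r["opened"].month) for r in records)


def mean_time_to_resolve(records):
    """Average duration between an incident being opened and resolved."""
    durations = [r["resolved"] - r["opened"] for r in records]
    return sum(durations, timedelta()) / len(durations)


frequency = incidents_per_month(incidents)
mttr = mean_time_to_resolve(incidents)
print(dict(frequency))
print(mttr)
```

Numbers like these are only a starting point for spotting trends; they say nothing about why an individual incident unfolded the way it did.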
On-call rosters allow teams to identify who is responsible for acknowledging and addressing incidents as they occur. They are made up of the names and contact information of everyone expected to take part in the response and remediation of service disruptions.

On-call Roster

Name | Email | Service | On-call
Jason Hand | firstname.lastname@example.org | API | Yes
Chris Smith | email@example.com | API | No
Lauren Jones | lauren@xyz.
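A roster like this can be treated as a small lookup structure. The Python sketch below is one possible representation, not the storage format used in the tutorial; the `RosterEntry` class and `who_is_on_call` helper are hypothetical names, and only the two complete rows are included.

```python
from dataclasses import dataclass


@dataclass
class RosterEntry:
    """One row of the on-call roster (hypothetical structure)."""
    name: str
    email: str
    service: str
    on_call: bool


# Only the rows that appear in full in the roster above.
roster = [
    RosterEntry("Jason Hand", "firstname.lastname@example.org", "API", True),
    RosterEntry("Chris Smith", "email@example.com", "API", False),
]


def who_is_on_call(entries, service):
    """Return the entries currently on call for the given service."""
    return [e for e in entries if e.service == service and e.on_call]


for entry in who_is_on_call(roster, "API"):
    print(entry.name, entry.email)
```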
Now that we have a tool to track the incident details, we need to ensure we are tracking all of the important aspects, such as when we first knew about the problem, and more. Let’s now take a look at how we can customize Azure Boards to track additional incident details.
When did we know?
When a new record (or incident) is created in Azure Boards, we automatically have the date and time as well as a change log throughout the incident’s lifecycle…
What Is This?
The systems we work in eventually have problems. They are built, maintained, and supported by technologists such as yourself. And when an issue inevitably occurs, someone needs to take action to restore services. Responding to those problems helps maintain the functionality and operational abilities of an organization’s IT services, serving both internal and external users. Many organizations don’t currently have an incident response plan in place. In fact, efforts to recover from service disruptions rarely follow any kind of repeatable and measured framework at all.
You now know what a post-incident review is, its role in the incident response process, and when it should be conducted. In this unit, you’ll dive a little deeper into the details of what makes a post-incident review most effective. Because incidents differ, the exact makeup of post-incident reviews can be different, too. But there are some common characteristics and components of a good review that can provide you with a solid foundation for carrying out the process.
Tech Used
The brains behind this solution is an Azure Function (running Node.js) that is triggered via an outgoing webhook from Microsoft Teams. The function modifies an index.html file stored in a “web server” served from a serverless SMB file share in Azure Storage. Users can open, update, and close “status updates” by invoking them from within a chat channel. The text that follows the command is stored and displayed on the site below the colored (Red or Green) header.
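The core logic of such a function might look roughly like the sketch below. It is written in Python for illustration only (the function described here runs Node.js) and makes several assumptions: the command words are `open`, `update`, and `close`; open and updated incidents get a red header while closed ones get green; and the bot mention that Teams includes in the message has already been stripped. The names `parse_command` and `render_status_page` are hypothetical.

```python
import html

# Assumed mapping of command to header color; the post only says the header
# is Red or Green, not which command produces which.
HEADER_COLORS = {"open": "red", "update": "red", "close": "green"}


def parse_command(message):
    """Split a chat message into (command, text); raise ValueError if unknown."""
    parts = message.strip().split(maxsplit=1)
    if not parts or parts[0].lower() not in HEADER_COLORS:
        raise ValueError(f"unrecognized command: {message!r}")
    command = parts[0].lower()
    text = parts[1] if len(parts) > 1 else ""
    return command, text


def render_status_page(command, text):
    """Build the HTML for the status page: colored header, then the update text."""
    color = HEADER_COLORS[command]
    safe_text = html.escape(text)  # never write raw chat input into the page
    return (
        "<html><body>"
        f'<h1 style="background-color: {color}">Service Status</h1>'
        f"<p>{safe_text}</p>"
        "</body></html>"
    )


command, text = parse_command("open Investigating elevated API errors")
page = render_status_page(command, text)
print(page)
```

In a real deployment the resulting string would be written to index.html on the file share; that step is left out here because it depends on how the share is mounted or accessed.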
Tracking incidents is as easy as setting up a datastore, like the table storage used for the on-call roster. However, why reinvent the wheel? Why not use something already available, customizable, extensible, and free? Azure Boards is my tool of choice in this tutorial, but honestly this could probably be done with any popular project management tool that has an API. First, log in or create a free Azure DevOps account. 1.