Understanding Service Level Indicators

Site Reliability Engineering uses SLIs and SLOs to measure the aspects of reliability that you learned about in Unit 2 (availability, latency, throughput, coverage, correctness, fidelity, freshness, and durability) and to determine whether you are meeting expectations in each applicable area.

What to measure

For each aspect of reliability you want to track, the first question to ask is what, specifically, to measure.

Example #1: Measure availability

How would you determine the availability of a web server?
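One common way to answer the availability question is to express it as a ratio: successful requests divided by total requests. The following is a minimal sketch of that idea, not a definitive implementation. The function name `availability_sli` and the rule that any HTTP status below 500 counts as a success are assumptions made for illustration; your own service may define success differently.

```python
from typing import Iterable


def availability_sli(status_codes: Iterable[int]) -> float:
    """Return the fraction of requests that succeeded (0.0 to 1.0)."""
    codes = list(status_codes)
    if not codes:
        # With no traffic there is nothing to divide by.
        raise ValueError("no requests observed; availability is undefined")
    # Assumption: any response with a status below 500 is a success.
    successful = sum(1 for code in codes if code < 500)
    return successful / len(codes)


# Four hypothetical requests, two of which failed with server errors.
sample = [200, 200, 503, 500]
print(f"{availability_sli(sample):.0%}")  # prints 50%
```

Expressing the indicator as a ratio keeps it comparable across time periods with different traffic volumes.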

Understanding Service Level Objectives

Now you know how to measure reliability using SLIs, but the ratios and percentages that you’ve calculated only get you halfway toward fulfilling the goal of site reliability engineering. You can now say the web server in our example is 50% reliable, but is that the appropriate level of reliability as discussed in our definition of SRE? It’s also useful to know the period of time to which that reliability level applied.
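To make the link between a measurement, a target, and a time period concrete, here is a small sketch that compares an availability ratio against an objective over a fixed window. The `Request` record, the `evaluate_slo` function, and the 90% target over 30 days are all hypothetical choices for illustration, not values taken from this unit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one request and whether it succeeded."""
    timestamp: datetime
    succeeded: bool


def evaluate_slo(
    requests: List[Request],
    target: float,
    window: timedelta,
    now: datetime,
) -> Tuple[float, bool]:
    """Return (measured SLI, whether it meets the target) for the window."""
    cutoff = now - window
    # Only requests inside the measurement window count toward the SLI.
    recent = [r for r in requests if cutoff <= r.timestamp <= now]
    if not recent:
        raise ValueError("no requests in the window; SLI is undefined")
    sli = sum(1 for r in recent if r.succeeded) / len(recent)
    return sli, sli >= target


now = datetime(2024, 1, 31)
history = [
    Request(now - timedelta(days=40), False),  # outside the 30-day window
    Request(now - timedelta(days=3), False),
    Request(now - timedelta(days=2), True),
    Request(now - timedelta(days=1), True),
]
sli, met = evaluate_slo(history, target=0.90, window=timedelta(days=30), now=now)
print(f"SLI={sli:.1%}, objective met: {met}")
```

The same measurements can pass or fail depending on the target and the window you choose, which is why an objective should state both.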

Understanding the Measurements of Reliability

Now that we have a better idea of the different things we could measure, let’s talk about what to do with the data. Service level indicators (SLIs) and service level objectives (SLOs) are operational practices that create reliability feedback loops, and they help transform the reliability discussion. Service level objectives are the goals you set for reliability. Service level indicators are the measurements that you use to determine whether you have reached those goals; in other words, they indicate whether your service is behaving reliably.