How I got past the first wall I hit after being assigned to the SRE team: "What should I start with?"

This article is the day 20 entry of the Motivation Cloud Series Advent Calendar 2020.

Introduction

For about a year after joining the engineering organization, I worked on web application development. In September of this year I joined the SRE team, with no prior SRE experience.

The SRE team covers a wide scope and handles a wide range of tasks. Right after joining, I went through a period of confusion because I had no idea how to prioritize them: "What should I start with?"

The book "Site Reliability Engineering" states that an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of its services. In addition to these, our team is also responsible for security risk, cost, and development productivity.

| Item | Contents |
| --- | --- |
| Security risk | Reduce security risks so that customers can use our services with peace of mind. |
| Cost | Optimize the cost of system operation, and manage it so that it does not exceed the set budget. |
| Development productivity | Improve development productivity by eliminating toil and improving the deployment flow. |

This article summarizes how our team prioritizes the wide variety of daily SRE work described above. I hope you find it helpful.

About organizing tasks

In order to prioritize tasks, we first organized the tasks from the following perspectives.

| Item | Contents | Remarks |
| --- | --- | --- |
| Target of influence | Who is affected? | Customer impact / Internal impact |
| Impact | How large is the impact on the organization? | Large degree of influence / Normal degree of influence / Small degree of influence |
| Frequency | How often does the problem occur? | Happening right now / May occur / Almost no possibility of occurring |
| Cost | How long does it take to respond? | Within 1 day / Within a few days / 1 week or more |

How to prioritize

We assigned a weighted score to each perspective and combined the scores into a priority score, which we use to decide priority. Below I explain the scores and the reasoning behind each one.

Target of influence

Our company puts the customer first, so we believe tasks that affect customers should be resolved before tasks that only affect us internally. We set the scores as follows.

| Item | Score |
| --- | --- |
| Customer impact | 5 |
| Internal impact | 3 |

Impact

Impact can be felt in many places, from the field to the development organization to customers, and dividing it too finely makes it impossible to settle on a priority. We kept the granularity coarse so that priorities are easier to decide. Because the degree of influence matters a great deal, we weighted it heavily and assigned the scores as follows.

| Item | Score |
| --- | --- |
| Large degree of influence | 5 |
| Normal degree of influence | 3 |
| Small degree of influence | 1 |

Frequency

If a task is not handled quickly, a problem that is happening right now can keep consuming time. We therefore set the score based on whether the problem is occurring right now.

| Item | Score |
| --- | --- |
| A problem is occurring right now | 3 |
| A problem may occur | 2 |
| Almost no possibility of a problem | 1 |

Cost

Cost refers to the time it takes to complete the task. When deciding the score, do not estimate on your own; agree within the team on how long the task will take. Assign the score so that the deadline will not slip no matter who takes the task.

| Item | Score |
| --- | --- |
| Within 1 day | 3 |
| Within a few days | 2 |
| 1 week or more | 1 |

Logical formula

We have defined the priority score as follows:

Priority score = Target of influence × Impact × Frequency × Cost

We focus on tasks with a priority score of 70 or higher and build our schedule from those, which made it clear what to work on.
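The scoring above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our team's actual tooling. The dictionary keys, the `Task` class, the `schedule_candidates` helper, and the example tasks are all hypothetical names invented for this sketch; only the score values and the threshold of 70 come from the article. With these tables the score ranges from 3 (3 × 1 × 1 × 1) to 225 (5 × 5 × 3 × 3).

```python
from dataclasses import dataclass

# Score tables taken from the sections above (the key names are illustrative).
TARGET_SCORES = {"customer": 5, "internal": 3}
IMPACT_SCORES = {"large": 5, "normal": 3, "small": 1}
FREQUENCY_SCORES = {"happening_now": 3, "may_occur": 2, "unlikely": 1}
COST_SCORES = {"within_1_day": 3, "within_a_few_days": 2, "1_week_or_more": 1}

PRIORITY_THRESHOLD = 70  # tasks at or above this score are scheduling candidates


@dataclass(frozen=True)
class Task:
    """Hypothetical task record holding one label per perspective."""
    name: str
    target: str
    impact: str
    frequency: str
    cost: str

    def priority_score(self) -> int:
        # Priority score = Target of influence x Impact x Frequency x Cost
        return (
            TARGET_SCORES[self.target]
            * IMPACT_SCORES[self.impact]
            * FREQUENCY_SCORES[self.frequency]
            * COST_SCORES[self.cost]
        )


def schedule_candidates(tasks):
    """Return tasks scoring at or above the threshold, highest score first."""
    selected = [t for t in tasks if t.priority_score() >= PRIORITY_THRESHOLD]
    return sorted(selected, key=lambda t: t.priority_score(), reverse=True)


# Made-up example tasks, for illustration only.
tasks = [
    Task("Fix login errors", "customer", "large", "happening_now", "within_1_day"),
    Task("Speed up CI pipeline", "internal", "normal", "may_occur", "within_a_few_days"),
    Task("Renew expiring certificate", "customer", "large", "may_occur", "within_a_few_days"),
]

for task in schedule_candidates(tasks):
    print(f"{task.priority_score():>3}  {task.name}")
```

In this made-up example the three tasks score 225, 36, and 100, so only the first and third clear the threshold and become scheduling candidates. Because the score is a product, a low value on any single perspective pulls the whole score down sharply, which is why the internal, normal-impact task falls well below 70.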

Outcomes

Until now, each team member had their own sense of priority, and these differed somewhat. Now the team's priorities are aligned. As a result, I no longer wonder what to do today within the short horizon of a week, and my performance has improved.

Finally

When you are unsure what to do as an SRE, try organizing and prioritizing your daily tasks. Once it is clear what to focus on today, the confusion goes away.

It may be tempting to neglect doing everything from task organization to prioritization every day, but it is important to do this work deliberately.
