The site reliability engineer (SRE) role has been around for over 15 years. As distributed systems become more ubiquitous, demand for this role will continue to grow. However, many companies and technologists have had no exposure to the tenets of SRE, and the role is often misunderstood. Unlike traditional operations roles, the site reliability engineer puts additional focus on reducing human intervention by designing and implementing automation.
This position combines elements of both operations and software engineering to automate, monitor, troubleshoot, and improve systems. More specifically, the site reliability engineer works on the following aspects of your applications and services: availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
This three-day course will walk through the book Site Reliability Engineering: How Google Runs Production Systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy. During the course, you will learn about Google’s approach to service management, gain an understanding of the basics of site reliability engineering, and get an introduction to advanced topics.
You’ll look at real-world examples and code samples of how companies are using SRE to ensure that their services are exactly as reliable as they need to be. Finally, we’ll cover the cultural and human aspects of site reliability that drive successful implementation.
In this Site Reliability Engineering Training Course, you will:
This course is also available publicly via Live Virtual Classroom:
Contact us here.