How SRE Teams Use Service Level Objectives to Drive Reliability

In this episode of The Site Reliability Podcast, Lucas and Luna discuss the practical use of Service Level Objectives (SLOs) in site reliability engineering. They describe how a major European bank reduced pager fatigue by 40% by shifting from alert-based monitoring to SLO-based error budgets. Lucas explains the difference between Service Level Indicators (SLIs), SLOs, and Service Level Agreements (SLAs), and why measuring user-facing latency is more actionable than measuring CPU utilization. Luna shares a story about a gaming company that used SLOs to prevent a catastrophic launch-day outage. They also cover common pitfalls, such as setting too many SLOs or setting targets that are too tight. Tune in for a focused, actionable conversation on making SLOs work in real production environments.

Keep every episode free: buymeacoffee.com/fexingo