The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering

By: Fexingo

Lucas and Luna cut through the noise around site reliability engineering to examine how real-world SRE teams balance uptime, incident response, and production change. Each episode takes a single concept (error budgets, toil automation, postmortem culture, capacity planning) and grounds it in a specific case: how a major streaming service reduced paging noise, how a payments platform rebuilt its incident command structure, or how a cloud provider manages multi-region failover. Lucas brings the numbers (latency percentiles, MTTR trends, SLO burn rates), while Luna pushes on the human and organizational trade-offs: What does a junior SRE need to know about on-call? How do you measure reliability without crushing innovation? Why do some blameless postmortems actually work? Together they treat SRE not as a certification topic but as a living practice, citing real outages, open-source tools, and engineering blogs.

This show is for engineers, ops leads, and platform teams who already know the basics and want to debate the hard edges: Is 99.999% uptime always worth the cost? When should you deliberately degrade service to improve reliability? How do you design for resilience when your system is already in production? Lucas and Luna don't pretend to have final answers; they build the conversation so you can draw your own conclusions. If you've ever argued about whether a page was necessary or whether an SLO should be tightened, this is your show.

#SiteReliabilityEngineering #SRE #Uptime #ProductionEngineering #IncidentResponse #ErrorBudgets #SLOs #Postmortem #ToilAutomation #CapacityPlanning #Observability #DevOps #PlatformEngineering #Resilience #OnCall #FexingoBusiness #BusinessPodcast #Technology

Keep every episode free: buymeacoffee.com/fexingo

© 2026 Fexingo. All rights reserved.

Economics

Episodes
  • How SRE Teams Use Canary Deployments to Reduce Risk
    Jun 27 2026
    Episode 77 of The Site Reliability Podcast dives into canary deployments: rolling out code changes gradually to a small subset of users before a full release. Lucas and Luna explain how companies like Netflix and Etsy use canary analysis to catch regressions early, using real traffic and metrics. They walk through the mechanics: routing a fraction of traffic to the new version, comparing key indicators such as latency and error rates, and deciding whether to roll forward or roll back. The hosts discuss the difference between canary and blue-green deployments, how to choose the right canary size, and what happens when a canary fails. They also cover the human side: developer anxiety during canary windows, the importance of automated rollback triggers, and how mature SRE teams integrate canary results into their deployment pipeline. By the end, listeners will understand why canary releases are a cornerstone of safe, high-velocity deployment.
    #CanaryDeployments #SRE #SiteReliability #ProductionEngineering #IncidentResponse #Uptime #DeploymentStrategies #Netflix #Etsy #Spinnaker #ContinuousDelivery #DevOps #Automation #Rollback #Latency #ErrorBudget #FexingoBusiness #Technology
    Keep every episode free: buymeacoffee.com/fexingo
    11 mins
  • How SRE Teams Use DORA Metrics to Measure DevOps Performance
    Jun 27 2026
    In this episode of The Site Reliability Podcast, Lucas and Luna dive into DORA metrics, the four key DevOps Research and Assessment measures that elite SRE teams use to quantify software delivery and operational performance. They break down each metric: deployment frequency, lead time for changes, mean time to restore (MTTR), and change failure rate. The hosts explain how Google's 2019 Accelerate State of DevOps report found that elite performers deploy 208 times more frequently than low performers, with lead times 106 times faster. Lucas and Luna discuss why measuring these metrics matters for SRE teams, common pitfalls like vanity metrics and local maxima, and practical steps to start tracking DORA without overhead. They also touch on how DORA complements the more complex SPACE framework for developer productivity, developed by researchers at GitHub, Microsoft, and the University of Victoria. Real examples such as Etsy's continuous deployment and Netflix's Simian Army illustrate the concepts. The conversation is grounded in the tech environment as of June 27, 2026, where platform engineering teams are increasingly adopting DORA dashboards to drive reliability improvements.
    #DORA #DevOps #SRE #SiteReliabilityEngineering #GoogleCloud #AccelerateBook #DeploymentFrequency #LeadTime #MTTR #ChangeFailureRate #DevOpsMetrics #SoftwareDelivery #ContinuousDeployment #SPACEFramework #Etsy #Netflix #FexingoBusiness #BusinessPodcast
    Keep every episode free: buymeacoffee.com/fexingo
    10 mins
  • How SRE Teams Use Service Level Objectives to Drive Reliability
    Jun 26 2026
    In this episode of The Site Reliability Podcast, Lucas and Luna dive into the practical use of Service Level Objectives (SLOs) in site reliability engineering. They discuss how a major European bank reduced pager fatigue by 40% by shifting from alert-based monitoring to SLO-based error budgets. Lucas explains the difference between SLIs, SLOs, and SLAs, and why measuring user-facing latency is more actionable than measuring CPU utilization. Luna shares a story about a gaming company that used SLOs to prevent a catastrophic launch-day outage. They also cover common pitfalls, like setting too many SLOs or setting targets that are too tight. Tune in for a focused, actionable conversation on making SLOs work in real production environments.
    #SRE #SiteReliabilityEngineering #ServiceLevelObjectives #ErrorBudgets #SLI #SLA #Alerting #IncidentResponse #ProductionEngineering #Uptime #ReliabilityEngineering #Monitoring #Observability #TechPodcast #FexingoBusiness #BusinessPodcast #Technology #DevOps
    Keep every episode free: buymeacoffee.com/fexingo
    11 mins