Alexa's Input (AI)

By: Alexa Griffith

Alexa’s Input is a podcast about how technology actually moves forward. Hosted by Alexa Griffith, it features conversations with engineers, founders, CEOs, and leaders shaping today’s tech landscape. Each episode digs into the decisions behind the systems: what’s being built, what’s being questioned, and why it matters now. Opinions are my own.

Linktree: https://linktr.ee/alexagriffith
Website: https://alexagriffith.com/
LinkedIn: https://www.linkedin.com/in/alexa-griffith/
X: @lexal0u

Episodes
  • Systems, Scale, and SRE with Vlad Leyberov
    Jun 29 2026

    Most engineers think reliability means avoiding outages. Vlad Leyberov learned the opposite lesson: sometimes you have to intentionally cause a 100% outage to fix the system faster.

    Vlad is a Site Reliability Engineer (SRE) at Google, running systems that handle billions of requests per second. Before Google, he kept critical infrastructure running at Meta (billions of events a day) and Amazon (millions of Alexa devices).

    In this conversation, we dig into cascading failures, incident responses, why consistency beats speed, how AI changes reliability engineering, and the philosophy behind running systems where downtime doesn't feel like an option.

    Topics Discussed:

    • How cascading failures propagate unpredictably in distributed systems (like nature, not machines)
    • Incident responses: virtual panic rooms, on-call, paging procedures, and how to narrow down failure points
    • The Alexa incident: why dropping an entire DynamoDB table was the right call
    • Critical User Journeys (CUJ): measuring end-to-end customer experience vs individual SLOs
    • Career journey from the USSR to maritime academy to business degree in Australia to SRE at Amazon, Meta, and Google
    • Why consistency in API response times beats raw speed
    • How AI makes it dangerously easy to create complex systems with poorly understood interactions
    • Science fiction, the Borg as a distributed system, and the Three Body Problem trilogy
    • Hot takes on reliability: all software development is maintenance, overrated 9s, underrated global failure modes


    General Podcast Links

    Watch: https://www.youtube.com/@alexa_griffith

    Read: https://alexasinput.substack.com/

    Listen: https://creators.spotify.com/pod/profile/alexagriffith/

    More: https://linktr.ee/alexagriffith


    Learn more about the host

    Website: https://alexagriffith.com/

    LinkedIn: https://www.linkedin.com/in/alexa-griffith/


    Find out more about Vlad Leyberov

    LinkedIn: https://www.linkedin.com/in/vladleyberov/

    Google SRE NYC Tech Talks

    Resources

    Google SRE Resources:

    • Google SRE Book: https://sre.google/books/
    • Google Cloud Platform: https://cloud.google.com/
    • Google Cloud Build: https://cloud.google.com/build (service discussed in outage story)
    • Google Cloud Pub/Sub: https://cloud.google.com/pubsub (Vlad's previous role, billions of requests/second)

    Sci-Fi Books Mentioned:

    • Three Body Problem trilogy by Liu Cixin (Vlad's current favorite)
    • Foundation series by Isaac Asimov
    • Left Hand of Darkness by Ursula K. Le Guin
    • Snow Crash by Neal Stephenson

    Systems Referenced:

    • Borg: Google's internal cluster management system (Kubernetes predecessor), named after the Borg from Star Trek
    • DynamoDB: AWS distributed key-value store (used in the Alexa poison pill incident)


    Intro Music:PR1BVOV7R4F1ASZC

    58 mins
  • David Aronchick on Distributed Data Orchestration with Expanso
    Jun 15 2026
    In this episode of Alexa's Input (AI), I sit down with David Aronchick, co-founder and CEO of Expanso and former product lead for Kubernetes at Google.

    Data is growing everywhere outside your data center. Solar panels in remote locations across a country. Security cameras at retail stores. IoT sensors across factory floors. And moving that data to the cloud for processing? It's expensive, slow, and often restricted by compliance.

    David is an expert when it comes to solving distribution problems. He led Kubernetes product at Google, co-founded Kubeflow to bring ML to production, and now he's building Expanso to tackle a difficult constraint: when your data can't move, how do you process it where it lives?

    We discuss:

    • The need for distributed data orchestration
    • Upstream data control: filtering and transforming at the source
    • Three forces making edge computing inevitable (physics, regulations, economics)
    • How to build successful open source infrastructure projects
    • Customer discovery and finding real pain points
    • His transition from Protocol Labs to founding Expanso
    • ETL pipelines: moving the first four steps closer to the data
    • Context loss and lineage in distributed systems
    • Processing 400,000 signals per second with 150MB agents
    • AI observability: attaching source metadata to training data
    • Running ML pipelines at the edge
    • Real-world deployment challenges (bandwidth, regulations, cost)

    Expanso is rethinking how we process data in an AI-native world: moving compute to data instead of data to compute. If you want to understand where distributed systems and edge computing are heading, this is a deep dive into the infrastructure layer beneath modern AI applications.

    General Podcast Links

    Watch: https://www.youtube.com/@alexa_griffith

    Read: https://alexasinput.substack.com/

    Listen: https://creators.spotify.com/pod/profile/alexagriffith/

    More: https://linktr.ee/alexagriffith

    Learn more about the host

    Website: https://alexagriffith.com/

    LinkedIn: https://www.linkedin.com/in/alexa-griffith/

    Find out more about the guest

    LinkedIn: https://www.linkedin.com/in/aronchick/

    Twitter/X: https://x.com/aronchick

    GitHub: https://github.com/aronchick

    Expanso Website: https://expanso.io/

    Resources

    • Expanso Website: https://expanso.io/
    • Kubernetes: https://kubernetes.io/
    • Kubeflow: https://www.kubeflow.org/
    • CNCF (Cloud Native Computing Foundation): https://www.cncf.io/
    • Protocol Labs: https://protocol.ai/

    Keywords

    David Aronchick, Expanso, Kubernetes, Kubeflow, distributed systems, edge computing, data pipelines, ETL, upstream data control, Google Kubernetes Engine, open source, CNCF, observability, log processing, data lineage, provenance, schema enforcement, IoT, edge AI, distributed data, machine learning infrastructure, Protocol Labs, IPFS, Filecoin, data governance, compliance, GDPR, bandwidth optimization, data aggregation, AI infrastructure, multi-cloud, hybrid cloud, real-time processing
    1 hr and 18 mins
  • How vLLM and llm-d Changed AI Inference with Rob Shaw
    Jun 3 2026
    In this episode of Alexa’s Input (AI), I sat down with Rob Shaw from Red Hat to talk about how AI inference evolved from a simple model serving problem into a large-scale distributed systems problem.

    We explored the infrastructure shifts behind modern LLM serving, including how vLLM and PagedAttention changed the economics and efficiency of inference, why KV cache management became one of the most important bottlenecks in production AI systems, and how orchestration layers like llm-d are emerging to coordinate distributed inference.

    We also discuss:

    • How LLM inference differs from traditional model serving runtimes
    • KV cache, prefix caching, and cache-aware routing
    • Why throughput and latency became major infrastructure challenges
    • Long-context agents and repeated inference calls
    • Distributed inference on Kubernetes
    • Intelligent routing, flow control, and load balancing
    • Prefill/decode disaggregation
    • Enterprise AI deployment realities

    vLLM has become one of the most important open-source projects in AI infrastructure, and llm-d represents a newer shift toward treating inference as a coordinated distributed system rather than just a single runtime problem.

    If you want to better understand the systems layer beneath modern AI applications, this episode is a deep dive into where inference infrastructure is heading next.

    General Podcast Links

    Watch: https://www.youtube.com/@alexa_griffith

    Read: https://alexasinput.substack.com/

    Listen: https://creators.spotify.com/pod/profile/alexagriffith/

    More: https://linktr.ee/alexagriffith

    Learn more about the host

    Website: https://alexagriffith.com/

    LinkedIn: https://www.linkedin.com/in/alexa-griffith/

    Find out more about the guest

    LinkedIn: https://www.linkedin.com/in/robert-shaw-1a01399a/

    Red Hat Articles: https://developers.redhat.com/author/robert-shaw

    GitHub: https://github.com/robertgshaw2-redhat

    Resources

    • vLLM Website: https://vllm.ai/
    • vLLM GitHub Repository: https://github.com/vllm-project/vllm
    • llm-d Website: https://llm-d.ai/
    • llm-d GitHub Repository: https://github.com/llm-d/llm-d

    Keywords

    AI inference, vLLM, llm-d, distributed inference, GPU optimization, open source AI, Kubernetes, multi-cluster deployment, AI infrastructure, enterprise AI, model optimization, speculative decoding, mixture of experts, AI deployment, performance tuning, AI systems, neural network scaling

    Key Topics

    • Evolution of vLLM and llm-d
    • Distributed inference and routing
    • GPU utilization and performance optimization
    • Open source AI infrastructure
    • Enterprise deployment challenges and solutions
    • Standardization in Kubernetes for NIC exposure
    • Performance optimizations: quantization and speculative decoding
    • Mixture of experts architecture and parallelism strategies
    • Flow control and request scheduling in AI systems
    • Emerging hardware for AI inference (Cerebras processor)
    • Reinforcement learning and AI system support
    • Modular architecture of vLLM and ecosystem projects
    1 hr and 43 mins