Episodes

  • 017 - SUSE AI Factory with NVIDIA Explained
    Jun 30 2026

    Most enterprise AI projects succeed as pilots. They fail on the way to production. In this episode, Rhys Oxenham, VP and General Manager of AI at SUSE, joins me to break down what SUSE announced at SUSECon 2026: SUSE AI Factory with NVIDIA.


    We go deep on what it actually is, what problem it solves, and what happens when AI starts managing the infrastructure itself. We cover the assembly line concept, validated blueprints, why this isn't just for large enterprises, unified support across the full stack, digital sovereignty as resilience, and the very real governance challenge when agents start acting autonomously in your cluster, what I've been calling "YOLO ops." Release is July 2026. This is worth understanding before then.


    🕐 CHAPTERS

    00:00 — Intro

    01:15 — Who is Rhys Oxenham and what happened at SUSECon 2026

    03:19 — What is SUSE AI Factory with NVIDIA?

    07:16 — The assembly line: what moves through it and where it starts and ends

    10:24 — Blueprints explained: RAG, digital assistants, research agents

    12:03 — ClickOps vs GitOps: who is each mode for?

    15:10 — Scale: from single node to thousands of GPUs

    17:29 — DGX Spark compatibility (teaser)

    19:34 — Open source + enterprise support: one throat to choke

    21:34 — Day-two operations: what does running it actually look like?

    25:54 — Digital sovereignty: open source + NVIDIA — where does the openness live?

    30:14 — MCP servers, agentic AI, and integrations with N8N and others

    34:00 — YOLO ops: how do you govern autonomous agents in your cluster?

    38:32 — Model selection: which model for which use case?

    44:16 — AI ops blueprints: what's coming after launch

    46:03 — Release date and what to expect in July 2026

    48:56 — Wrap up and where to follow Rhys


    🔗 LINKS

    SUSE AI Factory with NVIDIA → https://www.suse.com

    Rhys Oxenham on LinkedIn → https://www.linkedin.com/in/rhys-oxenham/

    Johan on LinkedIn → https://www.linkedin.com/in/hojan

    The Private AI Lab newsletter → https://www.linkedin.com/newsletters/the-private-ai-lab-7381951883810111489


    🎙️ ABOUT THE GUEST

    Rhys Oxenham is VP and General Manager of AI at SUSE, where he leads all AI product strategy and execution. He came up through solution architecture and field engineering before running SUSE's Edge and Telco Engineering groups, deploying infrastructure in air-gapped, industrial, and tactical environments. He keynoted SUSECon 2026 in Prague to announce the SUSE AI Factory with NVIDIA partnership.

    🔔 SUBSCRIBE for weekly episodes on private AI, self-hosted infrastructure, and what it actually takes to run AI inside your organisation.

    📨 Follow The Private AI Lab newsletter on LinkedIn for episode breakdowns and experiment write-ups.

    53 mins
  • 016 - Nemotron 3 Ultra: NVIDIA’s Open-Weights Frontier Agent Brain (1M Context, 5x Faster)
    Jun 12 2026

    Johan breaks down NVIDIA’s Computex 2026 announcement of Nemotron 3 Ultra 550B-A55B, an open-weights mixture-of-experts model with 550B total parameters and 55B active, positioned as an orchestration “agent brain” for multi-step tasks behind the firewall. He reviews NVIDIA’s benchmarks versus GLM 5.1, Kimi K2.6, and Qwen 3.5, highlighting best-in-class instruction following (82%), long-context performance (95%) with a 1M-token window, strong agent productivity (91%), and weaker coding results on TerminalBench versus Kimi. Johan emphasizes reported advantages in speed (~300 tokens/sec, ~5x faster), cost (up to ~30% cheaper on SWE-bench tests), and deployability via a unified NVFP4 checkpoint optimized for H100 and B200 GPUs, plus NemoClaw as the agent blueprint. He closes with an early-access demo comparing two agents researching the Netherlands’ 2026 World Cup odds, showing Nemotron’s more granular path analysis and a 5.8% win estimate.

    00:00 Private AI Lab Intro

    01:19 Nemotron Ultra Explained

    02:22 Agent Brain Focus

    03:07 Benchmark Reality Check

    05:14 Speed And Cost Edge

    06:11 Training And Precision

    08:02 NemoClaw Agents

    08:58 World Cup Agent Demo

    12:22 Why This Matters

    13:17 Wrap Up And Links

    14 mins
  • 015 - Meet Sparky: A Real-Life Jarvis with Alexis Gallagher
    May 13 2026

    I've been trying to build my own Jarvis for years. Then I met Alexis Gallagher at GTC — and Sparky is the closest thing I've seen.

    Alexis is an AI researcher and developer, formerly at Answer AI and Google, now building something most people in AI aren't: a robot designed not just to be useful, but to be *alive*. Sparky lives on his desk in San Francisco. He initiates conversations. He develops his own evolving interests — eels, catenary arches, abandoned infrastructure. He knows who's in the room, when to speak, and when to stay quiet. And he noticed when it was Alexis's first Friday after leaving his job.

    In this episode we go deep on the two design goals behind Sparky (useful and alive), the OpenClaw orchestration layer, the social awareness architecture running five times per second, the shared workspace principle that unlocks genuinely useful AI at a desk, and the tradeoffs between cascading and voice-to-voice architectures. We also do a live model switch mid-episode — from Claude Sonnet 4.6 to Nemotron 3 Super 120B running locally on a DGX Spark. It goes impressively well. Until it doesn't. That's in there too.


    Guest

    Alexis Gallagher — AI researcher and creator of Sparky

    🌐 myrobotSparky.com

    🔗 https://www.linkedin.com/in/alexis-gallagher/


    Key topics covered

    - The two design goals: useful AND alive — and why "alive" is the one almost nobody builds for

    - How Sparky develops and evolves

    - The social awareness stack

    - What OpenClaw enables

    - The shared workspace principle

    - Cascading architecture (STT → LLM → TTS) vs voice-to-voice — the intelligence tradeoff

    - Hardware: Reachy Mini Lite, RTX 3090, DGX Spark, Raspberry Pi — the full spectrum

    - Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super 120B (the Flowers for Algernon moment)

    - The future of personal AI — why embodied social presence is the natural human interface


    Chapters


    ```

    00:00 Introduction

    00:39 Who is Alexis Gallagher?

    01:04 The pivotal AI moment: speech recognition in 2015

    03:14 Science fiction to reality — where are the talking robots?

    04:22 Sparky introduces himself (live on air)

    05:33 The two design goals: useful and alive

    07:02 How Sparky initiates conversations — and why that changes everything

    08:10 Organic interests: how Sparky evolves what he cares about

    09:48 OpenClaw as orchestration layer — soul.md and body control

    12:55 Defining a custom robot node type in OpenClaw

    15:26 Social awareness: face detection, diarization, presence sensing

    16:15 Hardware options: Linux, RTX 3090, DGX Spark, Raspberry Pi

    18:25 The Reachy Mini Lite kit — and why it's better than building a drone

    19:40 Where to find Alexis and join the Discord

    20:10 One eye, four ears — Sparky's hardware explained

    24:25 What OpenClaw enables that other frameworks don't

    28:13 "Do you have a body, or are you a body?" — a live philosophical exchange

    31:17 Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super

    33:01 The shared workspace principle — implicit shared attention

    38:04 Orchestration in practice: Emacs, sub-agents, cross-platform

    40:11 Cascading vs voice-to-voice architecture — the real tradeoff

    42:15 Designing Sparky's voice (and the 1930s experiment)

    44:12 What's genuinely useful day-to-day — two real examples

    48:47 Nemotron 3 Super live — impressive, then the context window

    53:38 The model Sparky was running before (Claude Sonnet 4.6)

    54:03 Five years out: the future of personal AI companions

    58:14 The closest thing to Jarvis I've ever seen

    01:00:22 What's coming next — how fast the pieces are moving

    01:02:16 Where to find Alexis and join the community

    ```


    Links

    - Sparky project and Discord: https://myrobotSparky.com

    - Reachy Mini Lite: https://huggingface.co/reachy-mini


    The Private AI Lab is hosted by Johan van Amersfoort — Chief Evangelist and AI Lead at ITQ.


    📬 Newsletter: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7381951883810111489

    📝 Blog: https://johan.ml

    🔗 LinkedIn: https://www.linkedin.com/in/hojan

    1 hr and 5 mins
  • 014 - Project Q9: Where Robotics and AI meet (with Sander Harrewijnen)
    Apr 30 2026

    In this episode, Johan is joined by long-time colleague Sander Harrewijnen to pull back the curtain on Project Q9, an ambitious internal project at ITQ that combines a Unitree Go2 Pro robotic dog, private AI, computer vision, and modern cloud-native development practices.

    From gesture recognition trained on 30,000 hand images to a Skynet-obsessed dog posting on LinkedIn, this episode is a deep dive into what happens when you give great engineers a suitcase full of robot and say, "see where it goes."

    The conversation also covers the state of open-source AI coding assistants (OpenClaw vs NemoClaw), the realities of vibe coding in a production context, and what partner platforms like Red Hat OpenShift AI and SUSE AI actually enable beyond conversational AI.


    Sander's blog: https://harre.dev

    Q9's LinkedIn page: https://www.linkedin.com/in/q9-the-dog-2206863b1/


    Chapters

    00:00 Welcome & Introduction

    01:20 Icebreaker: Best AI Fail

    02:12 NemoClaw vs OpenClaw: Security & Sandboxing

    04:49 Running OpenClaw in an Isolated VLAN

    05:32 OpenClaw as a Personal Assistant: Home Assistant, News & Efteling API

    09:11 OpenClaw in the ITQ WhatsApp Group

    11:10 Introducing Project Q9

    13:22 Why Robotics + Cloud-Native + AI?

    16:16 Technical Anatomy of Q9

    18:30 Partner Platform Showcase: Broadcom, Red Hat & SUSE

    19:20 Debunking the GPU Myth

    23:05 Building the Gesture Recognition Model

    25:00 Training Progression: Epochs, Accuracy & Landmarks

    30:21 Hand Landmark Detection & the Gesture Pipeline

    32:34 Crowd Reactions at KubeCon

    33:57 Fine-Tuning vs Training From Scratch

    36:16 Use Case 2: Q9's LLM-Powered LinkedIn Persona

    40:41 Running LLMs on Partner Inference Platforms

    42:26 What's Next for Q9?

    43:44 Digital Twins in NVIDIA Omniverse + ROS2

    45:10 Key Takeaways

    48:53 Responsible Vibe Coding

    49:58 Open-Sourcing Q9: Coming Soon

    51 mins
  • 013 - AI Resource Management Update & Tools with Frank Denneman
    Apr 16 2026

    In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI:


    👉 Resource management for GPU workloads


    Building on our previous conversation, this episode shifts from why it matters to how to actually design it right.

    We dive into real-world challenges like GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools—helping architects and platform engineers understand how to design efficient, scalable AI environments.


    🔍 What you’ll learn in this episode


    • Why GPU workloads behave fundamentally differently from CPU/memory workloads

    • What GPU fragmentation really is (and why it kills utilization)

    • The difference between same-size vs mixed-mode placement

    • How placement IDs turn GPU scheduling into “Tetris”

    • Why “right-sizing” beats “perfect fitting” in AI environments

    • How to design a GPU profile catalog that actually scales

    • The role of state, agents, and storage in next-gen AI platforms


    🔧 Tools & Resources mentioned


    Frank created practical tools to help you design and validate your GPU environments:


    • 👉 vGPU Silo Capacity Calculator

      https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/

    • 👉 Same-size vs Mixed-mode Placement Tool

      https://frankdenneman.ai/tools/same-size-vs-mixed-mode/

    • 👉 Deep dive on unified memory & modern AI workloads

      https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/


    Chapters:

    00:00 Intro — Frank Denneman returns

    01:30 AI hype vs real engineering

    03:00 DGX Spark, NemoClaw & local AI agents

    10:30 From LLMs to agents & stateful systems

    12:00 Why AI infrastructure is different

    15:00 What is GPU fragmentation?

    19:30 Same-size vs mixed-mode placement

    23:00 GPU “Tetris” and placement IDs explained

    27:00 Right-sizing vs perfect fitting

    32:00 The tools: capacity & placement simulation

    36:00 GPU silos vs stranded capacity

    41:00 Model sizing, KV cache & dynamic usage

    48:00 Future of AI: smaller models & orchestration

    55:00 AI-assisted coding & real-world impact

    59:00 Key lessons learned

    01:02:00 Closing thoughts




    1 hr and 3 mins
  • 012 - From Sepsis to Sovereign Cloud: OpenShift AI in Healthcare (with Vincent Tsugranes)
    Apr 2 2026

    AI in healthcare didn’t start with ChatGPT.

    Long before generative AI, hospitals were using machine learning for sepsis detection, imaging diagnostics, and predictive analytics. In this episode of The Private AI Lab, Johan sits down with Vincent Tsugranes, Chief Architect at Red Hat, to explore what’s real, what’s hype, and why platform matters more than ever.


    They discuss:


    • Why 95% of AI projects fail

    • The evolution from OpenShift Data Science to OpenShift AI

    • Models-as-a-Service inside hospitals

    • vLLM vs llm-d for large-scale inference

    • Guardrails, hallucinations, and enterprise risk

    • Sovereign cloud and why healthcare is moving on-prem again

    • What “ambient AI” might mean in the next 12 months


    This episode is for architects, platform engineers, healthcare IT leaders, and anyone building private AI in regulated environments.


    00:00 – Red lights & farming with AI

    02:10 – The first AI spark moment

    04:00 – When “AI” became AI (ChatGPT moment)

    07:20 – Why 95% of AI projects fail

    11:00 – Machine learning vs modern AI

    13:30 – Platform vs point solutions

    16:00 – The history of OpenShift AI

    19:00 – What is OpenShift AI under the hood?

    22:00 – Hardware enablement & NVIDIA

    25:00 – vLLM explained

    27:30 – llm-d and distributed inference

    30:00 – Healthcare use cases (sepsis, imaging, insurance)

    33:00 – Models-as-a-Service inside hospitals

    36:00 – Guardrails & hallucination risks

    39:00 – Observability & FinOps explosion

    42:00 – OpenShift 5 and platform intelligence

    44:30 – Sovereign cloud in healthcare

    48:00 – The future: ambient AI & rising power bills

    51 mins
  • 011 - Open Source AI Just Leveled Up — Meet NVIDIA Nemotron Super
    Mar 26 2026

    Recorded live at NVIDIA GTC 2026, this episode dives into one of the biggest announcements in open AI: Nemotron Super.


    Together with Joey Conway, we explore how NVIDIA is pushing open source AI forward — with models that are not only powerful, but also efficient and enterprise-ready.


    We discuss:


    • The evolution from Llama-based models to Nemotron

    • Why reasoning + agentic capabilities matter

    • How NVIDIA balances performance and efficiency

    • What NVFP4 means for running AI locally

    • And why this could be a turning point for AI behind the firewall


    Chapters

    00:00 Intro

    01:56 Welcome

    02:37 GTC insights

    03:31 Nemotron buzz

    04:53 Model evolution

    07:14 Core design principles

    09:05 Reasoning capabilities

    10:52 Scaling challenges

    12:00 Architecture deep dive

    13:12 Performance improvements

    14:14 Quantization strategy

    15:39 NVFP4 explained

    16:16 DGX Spark use case

    18:23 Broader adoption

    19:37 Agentic AI impact

    21:25 Try it yourself

    22:03 Outro


    Links

    • Try Nemotron: https://build.nvidia.com

    • More episodes: https://johan.ml

    21 mins
  • 010 - Open Source AI at NVIDIA GTC (with Rhys Oxenham and Sanjeet Singh from SUSE)
    Mar 12 2026

    Open source is becoming one of the most important forces in AI.


    In this episode of The Private AI Lab, Johan speaks with Rhys Oxenham and Sanjeet Singh from SUSE about the role of open source in building enterprise AI platforms.


    They explore:


    • The difference between open source AI infrastructure and open-weight models

    • Why enterprises are moving toward private AI deployments

    • The growing importance of digital sovereignty

    • Innovation happening in the open source AI ecosystem

    • Why specialized models may challenge large frontier models

    • How SUSE helps organizations deploy AI platforms securely


    The episode also previews NVIDIA GTC, where open source AI is a major theme.

    All Open Source AI sessions in the content catalog:

    https://www.nvidia.com/gtc/session-catalog/?search=open%20source

    Register for NVIDIA GTC today using the following link:

    https://nvda.ws/4qXGFjm


    36 mins