The Private AI Lab

Episodes

017 - SUSE AI Factory with NVIDIA Explained

Jun 30 2026

Most enterprise AI projects succeed as pilots. They fail on the way to production. In this episode, Rhys Oxenham, VP and General Manager of AI at SUSE, joins me to break down what SUSE announced at SUSECon 2026: SUSE AI Factory with NVIDIA.

We go deep on what it actually is, what problem it solves, and what happens when AI starts managing the infrastructure itself. We cover the assembly line concept, validated blueprints, why this isn't just for large enterprises, unified support across the full stack, digital sovereignty as resilience, and the very real governance challenge when agents start acting autonomously in your cluster, what I've been calling "YOLO ops." Release is July 2026. This is worth understanding before then.

🕐 CHAPTERS
00:00 — Intro
01:15 — Who is Rhys Oxenham and what happened at SUSECon 2026
03:19 — What is SUSE AI Factory with NVIDIA?
07:16 — The assembly line: what moves through it and where it starts and ends
10:24 — Blueprints explained: RAG, digital assistants, research agents
12:03 — ClickOps vs GitOps: who is each mode for?
15:10 — Scale: from single node to thousands of GPUs
17:29 — DGX Spark compatibility (teaser)
19:34 — Open source + enterprise support: one throat to choke
21:34 — Day-two operations: what does running it actually look like?
25:54 — Digital sovereignty: open source + NVIDIA — where does the openness live?
30:14 — MCP servers, agentic AI, and integrations with N8N and others
34:00 — YOLO ops: how do you govern autonomous agents in your cluster?
38:32 — Model selection: which model for which use case?
44:16 — AI ops blueprints: what's coming after launch
46:03 — Release date and what to expect in July 2026
48:56 — Wrap up and where to follow Rhys

🔗 LINKS
SUSE AI Factory with NVIDIA → https://www.suse.com
Rhys Oxenham on LinkedIn → https://www.linkedin.com/in/rhys-oxenham/
Johan on LinkedIn → https://www.linkedin.com/in/hojan
The Private AI Lab newsletter → https://www.linkedin.com/newsletters/the-private-ai-lab-7381951883810111489

🎙️ ABOUT THE GUEST
Rhys Oxenham is VP and General Manager of AI at SUSE, where he leads all AI product strategy and execution. He came up through solution architecture and field engineering before running SUSE's Edge and Telco Engineering groups, deploying infrastructure in air-gapped, industrial, and tactical environments. He keynoted SUSECon 2026 in Prague to announce the SUSE AI Factory with NVIDIA partnership.
🔔 SUBSCRIBE for weekly episodes on private AI, self-hosted infrastructure, and what it actually takes to run AI inside your organisation.
📨 Follow The Private AI Lab newsletter on LinkedIn for episode breakdowns and experiment write-ups.

53 mins

016 - Nemotron 3 Ultra: NVIDIA’s Open-Weights Frontier Agent Brain (1M Context, 5x Faster)

Jun 12 2026

Johan breaks down NVIDIA’s Computex 2026 announcement of Nemotron 3 Ultra 550B-A55B, an open-weights mixture-of-experts model with 550B total parameters and 55B active, positioned as an orchestration “agent brain” for multi-step tasks behind the firewall. He reviews NVIDIA’s benchmarks versus GLM 5.1, Kimi K 2.6, and Qwen 3.5, highlighting best-in-class instruction following (82%), long-context performance (95%) with a 1M-token window, strong agent productivity (91%), and weaker coding results on TerminalBench versus Kimi. Johan emphasizes reported advantages in speed (~300 tokens/sec, ~5x faster), cost (up to ~30% cheaper on SWE-bench tests), and deployability via a unified NVFP4 checkpoint optimized for H100 and B200 GPUs, plus NemoClaw as the agent blueprint. He closes with an early-access demo comparing two agents researching the Netherlands’ 2026 World Cup odds, showing Nemotron’s more granular path analysis and a 5.8% win estimate.

Chapters
00:00 Private AI Lab Intro
01:19 Nemotron Ultra Explained
02:22 Agent Brain Focus
03:07 Benchmark Reality Check
05:14 Speed And Cost Edge
06:11 Training And Precision
08:02 NemoClaw Agents
08:58 World Cup Agent Demo
12:22 Why This Matters
13:17 Wrap Up And Links

14 mins

015 - Meet Sparky: A Real-Life Jarvis with Alexis Gallagher

May 13 2026

I've been trying to build my own Jarvis for years. Then I met Alexis Gallagher at GTC — and Sparky is the closest thing I've seen.
Alexis is an AI researcher and developer, formerly at Answer AI and Google, now building something most people in AI aren't: a robot designed not just to be useful, but to be *alive*. Sparky lives on his desk in San Francisco. He initiates conversations. He develops his own evolving interests — eels, catenary arches, abandoned infrastructure. He knows who's in the room, when to speak, and when to stay quiet. And he noticed when it was Alexis's first Friday after leaving his job.
In this episode we go deep on the two design goals behind Sparky (useful and alive), the OpenClaw orchestration layer, the social awareness architecture running five times per second, the shared workspace principle that unlocks genuinely useful AI at a desk, and the tradeoffs between cascading and voice-to-voice architectures. We also do a live model switch mid-episode — from Claude Sonnet 4.6 to Nemotron 3 Super 120B running locally on a DGX Spark. It goes impressively well. Until it doesn't. That's in there too.

Guest
Alexis Gallagher — AI researcher and creator of Sparky
🌐 myrobotSparky.com
🔗 https://www.linkedin.com/in/alexis-gallagher/

Key topics covered
- The two design goals: useful AND alive — and why "alive" is the one almost nobody builds for
- How Sparky develops and evolves
- The social awareness stack
- What OpenClaw enables
- The shared workspace principle
- Cascading architecture (STT → LLM → TTS) vs voice-to-voice — the intelligence tradeoff
- Hardware: Reachy Mini Lite, RTX 3090, DGX Spark, Raspberry Pi — the full spectrum
- Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super 120B (the Flowers for Algernon moment)
- The future of personal AI — why embodied social presence is the natural human interface

Chapters

```
00:00 Introduction
00:39 Who is Alexis Gallagher?
01:04 The pivotal AI moment: speech recognition in 2015
03:14 Science fiction to reality — where are the talking robots?
04:22 Sparky introduces himself (live on air)
05:33 The two design goals: useful and alive
07:02 How Sparky initiates conversations — and why that changes everything
08:10 Organic interests: how Sparky evolves what he cares about
09:48 OpenClaw as orchestration layer — soul.md and body control
12:55 Defining a custom robot node type in OpenClaw
15:26 Social awareness: face detection, diarization, presence sensing
16:15 Hardware options: Linux, RTX 3090, DGX Spark, Raspberry Pi
18:25 The Reachy Mini Lite kit — and why it's better than building a drone
19:40 Where to find Alexis and join the Discord
20:10 One eye, four ears — Sparky's hardware explained
24:25 What OpenClaw enables that other frameworks don't
28:13 "Do you have a body, or are you a body?" — a live philosophical exchange
31:17 Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super
33:01 The shared workspace principle — implicit shared attention
38:04 Orchestration in practice: Emacs, sub-agents, cross-platform
40:11 Cascading vs voice-to-voice architecture — the real tradeoff
42:15 Designing Sparky's voice (and the 1930s experiment)
44:12 What's genuinely useful day-to-day — two real examples
48:47 Nemotron 3 Super live — impressive, then the context window
53:38 The model Sparky was running before (Claude Sonnet 4.6)
54:03 Five years out: the future of personal AI companions
58:14 The closest thing to Jarvis I've ever seen
01:00:22 What's coming next — how fast the pieces are moving
01:02:16 Where to find Alexis and join the community
```

Links
- Sparky project and Discord: https://myrobotSparky.com
- Reachy Mini Lite: https://huggingface.co/reachy-mini

The Private AI Lab is hosted by Johan van Amersfoort — Chief Evangelist and AI Lead at ITQ.

📬 Newsletter: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7381951883810111489
📝 Blog: https://johan.ml
🔗 LinkedIn: https://www.linkedin.com/in/hojan

1 hr and 5 mins

014 - Project Q9: Where Robotics and AI meet (with Sander Harrewijnen)

Apr 30 2026

In this episode, Johan is joined by long-time colleague Sander Harrewijnen to pull back the curtain on Project Q9, an ambitious internal project at ITQ that combines a Unitree Go2 Pro robotic dog, private AI, computer vision, and modern cloud-native development practices.
From gesture recognition trained on 30,000 hand images to a Skynet-obsessed dog posting on LinkedIn, this episode is a deep dive into what happens when you give great engineers a suitcase full of robot and say, "see where it goes."
The conversation also covers the state of open-source AI coding assistants (OpenClaw vs NemoClaw), the realities of vibe coding in a production context, and what partner platforms like Red Hat OpenShift AI and SUSE AI actually enable beyond conversational AI.

Sander's blog: https://harre.dev
Q9's LinkedIn page: https://www.linkedin.com/in/q9-the-dog-2206863b1/

Chapters
00:00 Welcome & Introduction
01:20 Icebreaker: Best AI Fail
02:12 NemoClaw vs OpenClaw: Security & Sandboxing
04:49 Running OpenClaw in an Isolated VLAN
05:32 OpenClaw as a Personal Assistant: Home Assistant, News & Efteling API
09:11 OpenClaw in the ITQ WhatsApp Group
11:10 Introducing Project Q9
13:22 Why Robotics + Cloud-Native + AI?
16:16 Technical Anatomy of Q9
18:30 Partner Platform Showcase: Broadcom, Red Hat & SUSE
19:20 Debunking the GPU Myth
23:05 Building the Gesture Recognition Model
25:00 Training Progression: Epochs, Accuracy & Landmarks
30:21 Hand Landmark Detection & the Gesture Pipeline
32:34 Crowd Reactions at KubeCon
33:57 Fine-Tuning vs Training From Scratch
36:16 Use Case 2: Q9's LLM-Powered LinkedIn Persona
40:41 Running LLMs on Partner Inference Platforms
42:26 What's Next for Q9?
43:44 Digital Twins in NVIDIA Omniverse + ROS2
45:10 Key Takeaways
48:53 Responsible Vibe Coding
49:58 Open-Sourcing Q9: Coming Soon

51 mins

013 - AI Resource Management Update & Tools with Frank Denneman

Apr 16 2026
In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI:

👉 Resource management for GPU workloads

Building on our previous conversation, this episode shifts from why it matters to how to actually design it right.
We dive into real-world challenges like GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools—helping architects and platform engineers understand how to design efficient, scalable AI environments.

🔍 What you’ll learn in this episode

Why GPU workloads behave fundamentally differently from CPU/memory workloads
What GPU fragmentation really is (and why it kills utilization)
The difference between same-size vs mixed-mode placement
How placement IDs turn GPU scheduling into “Tetris”
Why “right-sizing” beats “perfect fitting” in AI environments
How to design a GPU profile catalog that actually scales
The role of state, agents, and storage in next-gen AI platforms

🔧 Tools & Resources mentioned

Frank created practical tools to help you design and validate your GPU environments:

👉 vGPU Silo Capacity Calculator
https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/
👉 Same-size vs Mixed-mode Placement Tool
https://frankdenneman.ai/tools/same-size-vs-mixed-mode/
👉 Deep dive on unified memory & modern AI workloads
https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/

Chapters:
00:00 Intro — Frank Denneman returns
01:30 AI hype vs real engineering
03:00 DGX Spark, NemoClaw & local AI agents
10:30 From LLMs to agents & stateful systems
12:00 Why AI infrastructure is different
15:00 What is GPU fragmentation?
19:30 Same-size vs mixed-mode placement
23:00 GPU “Tetris” and placement IDs explained
27:00 Right-sizing vs perfect fitting
32:00 The tools: capacity & placement simulation
36:00 GPU silos vs stranded capacity
41:00 Model sizing, KV cache & dynamic usage
48:00 Future of AI: smaller models & orchestration
55:00 AI-assisted coding & real-world impact
59:00 Key lessons learned
01:02:00 Closing thoughts
1 hr and 3 mins

012 - From Sepsis to Sovereign Cloud: OpenShift AI in Healthcare (with Vincent Tsugranes)

Apr 2 2026
AI in healthcare didn’t start with ChatGPT.
Long before generative AI, hospitals were using machine learning for sepsis detection, imaging diagnostics, and predictive analytics. In this episode of The Private AI Lab, Johan sits down with Vincent Tsugranes, Chief Architect at Red Hat, to explore what’s real, what’s hype, and why platform matters more than ever.

They discuss:

Why 95% of AI projects fail
The evolution from OpenShift Data Science to OpenShift AI
Models-as-a-Service inside hospitals
vLLM vs llm-d for large-scale inference
Guardrails, hallucinations, and enterprise risk
Sovereign cloud and why healthcare is moving on-prem again
What “ambient AI” might mean in the next 12 months

This episode is for architects, platform engineers, healthcare IT leaders, and anyone building private AI in regulated environments.

00:00 – Red lights & farming with AI
02:10 – The first AI spark moment
04:00 – When “AI” became AI (ChatGPT moment)
07:20 – Why 95% of AI projects fail
11:00 – Machine learning vs modern AI
13:30 – Platform vs point solutions
16:00 – The history of OpenShift AI
19:00 – What is OpenShift AI under the hood?
22:00 – Hardware enablement & NVIDIA
25:00 – vLLM explained
27:30 – llm-d and distributed inference
30:00 – Healthcare use cases (sepsis, imaging, insurance)
33:00 – Models-as-a-Service inside hospitals
36:00 – Guardrails & hallucination risks
39:00 – Observability & FinOps explosion
42:00 – OpenShift 5 and platform intelligence
44:30 – Sovereign cloud in healthcare
48:00 – The future: ambient AI & rising power bills
51 mins

011 - Open Source AI Just Leveled Up — Meet NVIDIA Nemotron Super

Mar 26 2026
Recorded live at NVIDIA GTC 2026, this episode dives into one of the biggest announcements in open AI: Nemotron Super.

Together with Joey Conway, we explore how NVIDIA is pushing open source AI forward — with models that are not only powerful, but also efficient and enterprise-ready.

We discuss:

The evolution from Llama-based models to Nemotron
Why reasoning + agentic capabilities matter
How NVIDIA balances performance and efficiency
What NVFP4 means for running AI locally
And why this could be a turning point for AI behind the firewall

Chapters
00:00 Intro
01:56 Welcome
02:37 GTC insights
03:31 Nemotron buzz
04:53 Model evolution
07:14 Core design principles
09:05 Reasoning capabilities
10:52 Scaling challenges
12:00 Architecture deep dive
13:12 Performance improvements
14:14 Quantization strategy
15:39 NVFP4 explained
16:16 DGX Spark use case
18:23 Broader adoption
19:37 Agentic AI impact
21:25 Try it yourself
22:03 Outro

Links
Try Nemotron: https://build.nvidia.com
More episodes: https://johan.ml
21 mins

010 - Open Source AI at NVIDIA GTC (with Rhys Oxenham and Sanjeet Singh from SUSE)

Mar 12 2026
Open source is becoming one of the most important forces in AI.

In this episode of The Private AI Lab, Johan speaks with Rhys Oxenham and Sanjeet Singh from SUSE about the role of open source in building enterprise AI platforms.

They explore:

The difference between open source AI infrastructure and open-weight models
Why enterprises are moving toward private AI deployments
The growing importance of digital sovereignty
Innovation happening in the open source AI ecosystem
Why specialized models may challenge large frontier models
How SUSE helps organizations deploy AI platforms securely

The episode also previews NVIDIA GTC, where open source AI is a major theme.
All Open Source AI sessions in the content catalog:
https://www.nvidia.com/gtc/session-catalog/?search=open%20source
Register for NVIDIA GTC today using the following link:
https://nvda.ws/4qXGFjm
36 mins
