#38: How Apache Sedona Solved Big Data’s Hardest Problem with Jia Yu
About this listen
Large Language Models can write poetry and debug code, but they still don't understand the fundamental physics of the real world. Ask an AI to find the "nearest restaurant" to a specific coordinate, and it struggles because it lacks Spatial Intelligence.
In this episode, we sit down with Jia Yu, the co-creator of Apache Sedona and co-founder of Wherobots, to discuss why geospatial data breaks standard big data engines and how he built the solution that is now downloaded over 2 million times a month.
We trace the 10-year journey from a PhD research paper to a top-level Apache project, diving into the deep technical challenges of distributed computing. Jia explains why spatial data requires a completely different architecture than standard text or numbers and how the industry is finally moving toward a "Spatial Lakehouse" to break down data silos.
In this episode, we explore:
- The "Multimodality" Trap: Why mixing vector, raster, and LiDAR data crashes traditional systems.
- How SedonaDB is bringing massive scale to single-node machines (so you don't always need a cluster).
- The hardest problem in distributed computing: how to split a map across 1,000 servers without breaking the data.
- The multi-year fight to get native geometry support into Apache Iceberg.
- Why the next generation of models must evolve from text-based to spatially intelligent.
✅ Sign Up for Wherobots: https://wherobots.com/
✅ Learn more about Apache Sedona: https://wherobots.com/apache-sedona/
✅ What is Apache Sedona: https://wherobots.com/blog/what-is-apache-sedona/
✅ Test out SedonaDB: https://sedona.apache.org/sedonadb/latest/
✅ Connect with Jia on LinkedIn: https://www.linkedin.com/in/dr-jia-yu/
00:00:00 - Intro & Welcome
00:00:51 - The Origin Story: From GeoSpark to Apache Sedona
00:06:03 - Why Geospatial Data is "Special" (The Multimodality Problem)
00:09:47 - When to Move to Distributed Computing?
00:13:21 - The Secret to Maintaining a Vibrant Open Source Community
00:18:11 - The Features That Drove Adoption: Spatial SQL & Python
00:22:35 - Deep Dive: How Spatial Partitioning Works
00:28:57 - Why Build a Cloud-Native Platform?
00:33:05 - The Rise of the Spatial Lakehouse & Apache Iceberg
00:40:17 - Introducing SedonaDB: A Single-Node Engine
00:45:10 - The Future: Why AI Needs Spatial Intelligence
00:48:44 - Advice for Getting Started with Spatial Engineering
📰 Daily modern GIS insights: https://forrest.nyc
CONNECT WITH ME
📸 Instagram: https://www.instagram.com/matt_forrest/
💼 LinkedIn: https://www.linkedin.com/in/mbforr/
📧 Newsletter: https://forrest.nyc
🌐 Website: https://forrest.nyc