[REDACTED] Episode 4: We Stopped Using Claude Code Mid-Build. Here's What We Built Instead


Redacted is the show that doesn't clean things up before hitting record. Episode 4 is a double build session: Taylor Cotner walks through the multi-agent HubSpot cleanup pipeline he's been iterating on for weeks, now running on the Anthropic SDK with Claude Code out of the loop, and David Shaner demos how he used Claude Design and Claude Code to rebuild Offline's partner landing page from scratch. Most of the episode is screen sharing, so pull it up on YouTube.

What We Cover

No Claude Code in the loop: Taylor stopped using Claude Code as the agent orchestrator in his HubSpot pipeline. He still builds the app with Claude Code as his coding tool; what he removed is its role as a decision-maker in the middle of the workflow. Removing it gave him full control over inputs and outputs at every step.

Custom eval system built from scratch: Taylor built an eval page that looks like an Excel grid, with models as columns and test cases as rows, to measure Haiku, Sonnet, GPT-5, and GPT-5 Mini against real messy HubSpot data. Each cell shows pass, fail, and cost.

GPT-5 Mini at 10–20× less cost: For the lead qualifier agent, Sonnet evals cost $1.00 per run and GPT-5 Mini evals cost $0.05. “I can live with that. 10X less the cost.” For the core cleanup evals: $1.50 for Sonnet versus $0.14 on GPT-5 Mini.

$20 for 133 million tokens overnight: Using the Vercel AI Gateway, which lets you swap any model without changing your code, Taylor ran 200 HubSpot restaurant cleanups in a single night for $20 total.

Self-grading pipeline: The pipeline grades its own output after every cleanup run. If a job comes back below an A, it automatically spawns a new run with Sonnet, no human catch required. A B grade on 101 Craft Kitchen auto-escalated and came back with an A.

Real mess-ups make the best evals: Almost every eval case came from a real HubSpot error. The system once tried to create a “Kim company” to link a group of unrelated restaurants, so Taylor added an eval to teach it that being linked by an owner contact is not the same as being linked by company structure.

The conveyor belt metaphor: David’s landing page pipeline starts with live sales transcripts from Steve (Offline’s seller) and is designed to end with a generated, voice-of-customer partner landing page. “In an ideal world, I’ve got a black box in the middle.”

Claude Design → Claude Code handoff: Claude Design’s share feature generates a markdown handoff document with a file map, token contract, and panel build notes. When Claude Code picks up the project, it reads this file first, bridging design intent to implementation.

One person, 7–8 hats replaced: David processed customer reviews, tightened company positioning, built wireframes, designed mobile experiences, wrote code, and is about to ship a pull request, all without a designer, copywriter, or front-end developer.

GPT-5 “overthinks”: Their working theory is that GPT-5 (not Mini) gets weird things wrong because it goes too abstract. The temperature/Myers-Briggs analogy (literal versus creative thinking) might explain why Mini outperforms the full model on structured cleanup tasks.

The iceberg: Once the cleanup and landing page are done, the plan is to surface above the water: automated emails, Instagram DMs, and a fully AI-run lead generation function operating on top of the clean data.

Time Stamps

0:00 Cold open: AI temperature, Myers-Briggs, and model thinking styles
0:47 Welcome to Redacted: Episode 4
1:39 What is Redacted? Show premise and audience
2:30 Offline: $1M ARR, 2 full-time employees
3:41 Why watch on YouTube (screen-share heavy)
4:51 Taylor's segment: the Offline HubSpot AI cleanup project
6:31 Ditching Claude Code as orchestrator, using the SDK directly
9:21 Building a custom eval system from scratch
10:37 Vercel AI Gateway: comparing Sonnet, Haiku, and GPT-5 Mini
11:28 GPT-5 Mini at 10x less cost: "I can live with that"
14:26 The agent pipeline: planner → reconciler → grader
17:49 Evals built from real HubSpot failures
23:09 $20 for 133 million tokens overnight
24:07 Self-escalation: B grades trigger an automatic Sonnet re-run
25:02 Lead gen expander and qualifier agents
26:09 The iceberg: what comes after cleanup
27:35 David's segment: transcripts-to-landing-page pipeline
29:29 Claude Design: wireframing panel by panel
33:13 Mobile wireframe inside an iPhone frame + handoff to Claude Code
35:21 Live demo: localhost with real Offline API data
38:47 One person, 7–8 hats replaced
39:46 Wrap up and where to find David & Taylor

Show notes from the episode: https://github.com/instanttaylor/redacted-podcast

Where to Find David:
LinkedIn: https://www.linkedin.com/in/davidshaner/

Where to Find Taylor:
LinkedIn: https://www.linkedin.com/in/taylorcotner/

More about Offline: https://www.linkedin.com/company/offline-media-inc-/

---

This episode of Redacted is hosted by David Shaner and Taylor Cotner, and presented and produced by NC Tweener Fund.

We couldn’t share posts like this without our amazing sponsors: Platinum: NC ...
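
For readers who want a concrete picture of the self-grading step described in the notes above (a run that grades below an A triggers a Sonnet re-run), here is a minimal Python sketch. It is not Taylor's code: the function names, the `Attempt` type, the full letter-grade scale, and the model name strings are all hypothetical placeholders, and the actual model call and grader are passed in as plain callables so nothing here depends on a specific SDK.

```python
from dataclasses import dataclass
from typing import Callable, List

# Letter grades from worst to best. The episode only mentions A and B;
# the rest of the scale is an assumption for illustration.
GRADE_ORDER = ["F", "D", "C", "B", "A"]


@dataclass
class Attempt:
    model: str   # which model produced this output
    output: str  # the cleanup result (opaque text in this sketch)
    grade: str   # letter grade assigned by the grader


def cleanup_with_escalation(
    record: dict,
    run_cleanup: Callable[[str, dict], str],
    grade_output: Callable[[dict, str], str],
    first_model: str = "cheap-model",   # hypothetical placeholder name
    escalation_model: str = "sonnet",   # hypothetical placeholder name
    passing_grade: str = "A",
) -> List[Attempt]:
    """Run a cleanup, grade it, and re-run once on the stronger model if needed."""
    output = run_cleanup(first_model, record)
    attempts = [Attempt(first_model, output, grade_output(record, output))]

    # Escalate only when the first grade ranks below the passing grade.
    if GRADE_ORDER.index(attempts[0].grade) < GRADE_ORDER.index(passing_grade):
        retry = run_cleanup(escalation_model, record)
        attempts.append(Attempt(escalation_model, retry, grade_output(record, retry)))

    return attempts
```

The function returns every attempt rather than only the last one, so a caller can log both the original grade and the escalated grade for the same record.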
