ninjaai.com
If you've spent any time with modern Large Language Models (LLMs) like ChatGPT, you've likely experienced a mix of awe and frustration. One moment, it's generating brilliant code or a perfect email; the next, it's confidently making up facts ("hallucinating") or getting stuck on a task that requires multiple steps. We've been conditioned to look for the next, bigger model—GPT-5, GPT-6, and beyond—as the solution to these problems.
But while we're watching for a bigger AI brain, a quieter, more fundamental revolution is already underway. The most significant gains in AI performance are coming not from raw model power, but from a radical shift in how we ask models to work. We are moving away from asking an AI for a single, perfect answer and toward giving it a smarter, more human-like process to find that answer.
This is the rise of "agentic workflows." This post distills three powerful takeaways about this shift, drawing from insights by AI leader Andrew Ng, a deep-dive into Saarthi, a pioneering AI Formal Verification Engineer, and OpenAI's leaked strategic roadmap.
Smarter Process, Stronger Performance
The core difference is between a "non-agentic" (or zero-shot) workflow and an "agentic" one. A non-agentic workflow is what most of us use today: we give the LLM a prompt, and it generates an answer in one pass. As the authors of the Saarthi paper put it, this is like asking someone to "type an essay from start to finish without ever using backspace." LLMs are remarkably good at this, but the quality has a ceiling.
An agentic workflow, by contrast, mimics how a human actually works. It breaks a task down: outlining, researching, drafting, and revising. The AI doesn't just give a single answer; it follows a process of iterative refinement to get to a much better answer.
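To make the contrast concrete, here is a minimal sketch of that draft–critique–revise loop. The `call_llm` function is a hypothetical placeholder stub (so the example runs without any API key and is not any particular vendor's API); in practice you would swap in a real model call.

```python
# A minimal sketch of the loop an agentic workflow adds on top of a
# plain one-shot prompt: draft, critique, revise, repeat until the
# critic approves or a round limit is hit.

def call_llm(prompt: str) -> str:
    """Placeholder stub standing in for a real chat-completion call.
    It pretends each revision improves the draft, approving at rev3."""
    if prompt.startswith("Critique:"):
        return "APPROVED" if "rev3" in prompt else "Tighten the argument."
    if prompt.startswith("Revise:"):
        n = int(prompt.split("rev")[1][0]) + 1  # bump the revision number
        return f"essay rev{n}"
    return "essay rev1"  # initial draft

def agentic_answer(task: str, max_rounds: int = 5) -> str:
    # One-shot would stop after this first call; the agentic version
    # keeps iterating until the critique passes.
    draft = call_llm(f"Draft: {task}")
    for _ in range(max_rounds):
        feedback = call_llm(f"Critique: {draft}")
        if feedback == "APPROVED":
            break
        draft = call_llm(f"Revise: {draft} given '{feedback}'")
    return draft

print(agentic_answer("Explain agentic workflows"))  # -> essay rev3
```

The loop itself is the whole idea: the extra calls buy the model room to inspect and correct its own output, which is exactly what a single-pass prompt denies it.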
The most counterintuitive evidence for this comes from benchmarks. The lift is so significant that, as the Saarthi paper highlights, a less powerful model like GPT-3.5 wrapped in an agentic workflow can outperform the more powerful GPT-4 running a standard, one-shot prompt.
This isn't just theoretical. The "Saarthi" paper, which details an AI formal verification engineer, provides a concrete example. When tasked with formally verifying a synchronous FIFO design, the results were stark:
- Non-agentic (zero-shot) approach: Proved only 42.85% of assertions.
- Agentic (few-shot) approach: Proved 100% of assertions.
This is a profound insight. It means the future of AI progress isn't just the expensive, slow process of building ever-larger models. It's also about designing smarter systems around them—systems that give AI the room to think, iterate, correct itself, and reason through problems. That performance leap raises the question: what does a "smarter system" actually look like? The answer isn't a single, monolithic AI, but a team of them.
From Soloist to Symphony: AI Works in Teams
This new paradigm relies on specific design patterns that mimic a high-functioning human team and address the core weaknesses of a single LLM. Four primary patterns are emerging:
- Reflection: An AI "coder" generates work while an AI "critic" reviews it, providing feedback for iterative improvement. This creates a built-in quality control loop.
- Tool Use: The AI agent is given the ability to call on external, specialized tools. This could be as simple as making an API call to search the web or as complex as leveraging specialized computer vision models.
- Planning: Before executing, the AI first breaks a complex task down into a logical sequence of smaller, manageable steps. This structured, chain-of-thought-style decomposition keeps the model from getting lost and gives it a clearer path to a solution.