Angela Jiang, Head of Product for the Claude platform, and Katelyn Lesse, Head of Engineering, join Dan Shipper to explore the foundational work behind Anthropic's AI infrastructure. They discuss the groundbreaking Claude Managed Agents, features designed to allow users to provide an AI agent with an outcome and a budget, trusting it to independently achieve the specified goal.
The conversation highlights the intricate journey of evolving the Claude platform from a simple API to a robust system capable of orchestrating sophisticated agents around the clock. Jiang and Lesse reveal the engineering challenges and solutions involved in building an AI infrastructure that can support scalable, always-on agent operations, addressing the infrastructure walls that typically hinder most agent projects in production.
This development is pivotal for the future of AI, as it abstracts away complex engineering, making powerful, autonomous agents accessible and practical for a wider range of applications. By tackling the infrastructure bottleneck, Anthropic is enabling a paradigm shift where AI agents are goal-oriented, self-managing, and capable of handling intricate tasks, moving closer to a future where Claude can even write its own operational harness.
Key takeaways
- The core driver for platform evolution is the need to provide higher-order abstractions that enable users to achieve better outcomes and simplify the development of AI applications.
- Claude managed agents are founded on core primitives like the messages API, which includes tool use, sandboxed code execution, and web search.
- Developers must decide whether to build custom agent solutions from scratch or utilize managed agent platforms to save development time, facing a "time deflation" dynamic.
- Building agents for products at scale presents significant infrastructure challenges that a managed solution from the platform provider can alleviate.
- A primary concern with adopting cloud-managed agent solutions is the perceived risk of losing flexibility and control, potentially leading to vendor lock-in.
- The industry is moving from generic model harnesses to deeply integrated, model-specific harnesses designed to maximize individual model performance.
- "Harness engineering," which involves tailoring the harness to a specific model's nuances, can significantly improve its performance beyond what generic setups offer.
- This tight coupling creates path dependencies, potentially limiting a model's generalizability and specializing its capabilities based on the initial harness design.
- The real bottleneck for putting AI agents into production and scaling them is the underlying infrastructure.
- Infrastructure challenges include managing servers, storing transcript data, and implementing security.
- The platform aims to abstract away the technical infrastructure, enabling users to deploy agents without managing backend complexities.
- Companies are leveraging internal agents to build highly customized end-to-end development platforms that integrate seamlessly with their unique CI/CD workflows and development environments.
- Human-in-the-loop interaction is vital for legal compliance, necessitating collaborative interfaces and distinct sessions that extend beyond the capabilities of a single AI skill.
- Human ownership is crucial for preventing AI agents from becoming stale and ineffective after their initial deployment.
- Despite self-service capabilities, dedicated technical teams or "AI-pilled" individuals are still necessary to build and ensure the excellence of AI agents.
- Empowering non-technical teams to customize AI agents requires abstracting away direct code interaction to prevent infrastructure overhead and potential errors.
- Diverse agent orchestration strategies exist, including advisor models that separate execution from advice, and adversarial pairs where one agent critiques another's output.
- Agent success should be primarily measured by verifiable outcomes and budget adherence, rather than complex, domain-specific evaluations.
- Implementing self-upgrade skills can help agents automatically transition to newer models or architectures, simplifying their maintenance and longevity.
- In a year, Claude is expected to autonomously manage its own architecture and processes, simplifying development by focusing users on outcomes and budgets rather than complex prompt or harness engineering.
AI Platforms are Evolving to Offer Higher Abstractions for Better Outcomes
AI platforms have undergone significant evolution, starting from basic completion endpoints where users would send a prompt to receive a response. This initial phase, while groundbreaking at the time, is now seen as a rudimentary step in the platform's development.
The evolution progressed to include features like tool calling and chat sessions. Currently, with advancements such as Claude managed agents, platforms are offering more sophisticated capabilities, akin to providing a powerful AI agent on a computer equipped with memory and other advanced functionalities.
This shift is primarily driven by the continuous improvement of AI models and the imperative to deliver the best possible outcomes for users. As models become more autonomous and use cases become clearer, platforms are compelled to offer higher-order abstractions.
The goal is to simplify complex AI development, making it easier for users to build products and agents. This involves enriching the platform with robust components to make advanced AI capabilities as accessible as possible, catering to both cutting-edge experimenters and those seeking out-of-the-box solutions.
we find ourselves like basically needing to kind of like- Evolve the platform to be sort of like higher and higher order abstraction, but it's in the pursuit of like helping you get the best outcomes out of something.
The Primitives of Claude Managed Agents and the Developer's Dilemma
Claude managed agents leverage fundamental primitives available through the messages API, including built-in tools, sandboxed code execution, and web search capabilities. These components are integrated to enable sophisticated agent interactions and tasks.
Anthropic has engineered a specialized infrastructure that combines these potent primitives into a cohesive "harness," aiming to deliver the most effective outcomes from the Claude model. This setup streamlines the process of building complex agent behaviors.
Developers building their own agent-based products, like internal tools running Claude in a loop, often find themselves writing extensive custom code. This mirrors the functionalities that managed agent solutions aim to provide.
A significant dilemma arises for these developers: continue investing time in building custom, often lengthy, solutions or await more comprehensive managed agent platforms that could significantly reduce development effort. This tension creates a sense of "time deflation" where future time invested in adopting managed solutions could be more valuable than current custom development.
I'm sitting here feeling this sense of-- I've been thinking of it as like time deflation, like my time gets more valuable in the future.
Anthropic developed managed agents to solve internal infrastructure challenges and support scaling.
Anthropic decided to build cloud-managed agents because their internal teams repeatedly faced infrastructure challenges when developing and running autonomous agents in the cloud. They realized they were "done building this for ourselves" and wanted to offer a robust solution to others.
While smaller-scale agent projects might run on basic setups like Mac minis, integrating agents into products at a significant scale introduces complex infrastructure requirements. Anthropic's managed agents aim to alleviate these challenges for developers building production-ready applications.
Their platform design prioritizes modularity, allowing flexibility for users. However, Anthropic also maintains an opinionated approach to certain core components that are closely tied to the Claude model, such as how Claude specifically interacts with file systems to ensure optimal performance and integration.
part of why we ended up building cloud managed agents was because Anthropic ourselves had gone through enough of these iterations where we built products that were-- Agents that you could run autonomously in the cloud, and we did that, stand up the infrastructure so that it works well sort of work enough times that, we ourselves were like, 'Okay, we're done building this for ourselves.'
Developers express concern over losing flexibility with cloud-managed agents.
One significant concern for development teams considering managed agent solutions is the potential loss of flexibility. Many currently operate with highly customizable local environments, such as a Mac mini or a dedicated server, which allows them direct interaction with large language models like Claude.
This current setup gives developers full control, enabling them to easily pipe information to Claude and utilize its full range of capabilities. They also have direct access to local resources like a file system and a web browser, providing a comprehensive toolkit for their agent-based applications.
The apprehension stems from the possibility that transitioning to a cloud-managed agent service might remove this level of control. Developers fear that such a move could restrict their ability to freely adapt their tools and infrastructure, leading to vendor lock-in and reduced operational agility.
Deep Pairing of Model and Harness Creates Path Dependencies
Historically, developers often used generic harnesses that allowed them to easily swap out different large language models, treating them as interchangeable components. This approach provided flexibility and minimized vendor lock-in concerns.
However, with the emergence of a new generation of models, labs are developing models with increasingly distinct techniques and architectures. This has led to a shift away from generic harnesses towards a deep pairing, where the harness and the model are tightly integrated.
This "harness engineering" strategy focuses on optimizing the harness specifically for a given model to extract its maximum potential. Internal evaluations show that different harnesses can drastically alter a model's performance, indicating significant "alpha" in this tailored approach.
This deep pairing introduces path dependencies, meaning initial choices in harness design (e.g., handling requests, tool calls, file systems) can significantly influence a model's future trajectory and capabilities. This specialization might impact a model's generalizability, potentially locking it into specific strengths.
when you build agents... the harness and the model get very paired.
Claude Managed Agents are designed for internal company automation and platform development
The initial chat experience for Claude Managed Agents is structured to help both technical and non-technical users understand the core primitives and APIs. This quick start serves as an educational tool, making the complex setup process more approachable for a wider range of users, regardless of their technical background.
While the quick start aims for broad accessibility, the primary target audience for Claude Managed Agents is internal company teams. The goal is to enable these teams to build sophisticated automation solutions and develop powerful internal platforms tailored to their specific organizational needs.
Companies are utilizing Managed Agents to construct comprehensive, end-to-end software development platforms. Beyond large-scale platforms, they are also used for automating smaller, specific internal processes, such as streamlining operations within legal departments to enhance efficiency.
I want, you know, a full end-to-end software development platform, right? And like Managed Agents is the perfect solution for something like that.
The Infrastructure Wall in AI Agent Production
Many developers initially focus on the complexities of "harness engineering" when building AI agents, believing tasks like prompt caching and optimizing context windows are the main hurdles. Products like agent SDKs are designed to abstract away these challenges.
However, as teams attempt to move their agents into production, a different and more significant obstacle emerges: the infrastructure wall. This refers to the substantial challenges involved in operationalizing and scaling agents beyond initial development.
Overcoming this infrastructure wall requires addressing critical issues such as maintaining constantly running servers or leveraging dynamic scaling solutions. Additionally, robust systems are needed for storing agent transcript data and ensuring comprehensive security protocols are in place.
everybody hits an infrastructure wall.
Anthropic's Vision for One-Click Agent Deployment and Integration
The future vision for AI agents includes a strong emphasis on simplified deployment and seamless integration into daily workflows. The discussion highlights the desire for agents that can be set up with a single click, making them incredibly accessible for users.
A key example provided is the concept of a "one-click agent that lives in my Slack," which would remain "always on" within collaborative platforms. This approach aims to eliminate the need for users to manage complex technical infrastructure, a significant pain point in current agent deployments.
The underlying platform is intended to handle the intricacies of agent setup and customization, abstracting away the "technical infrastructure stuff." This strategic direction focuses on making agents user-friendly and readily available within existing communication tools.
a one-click agent that lives in my Slack
Internal Agents for End-to-End Development and Team Collaboration
Several companies, including Stripe with its Minions platform and Ramp, have developed internal end-to-end development platforms using agents. These systems often involve a thin layer of managed agents integrated with a company's unique development environment.
The primary advantage of these custom internal platforms over general AI tools like Claude Code is the ability to deeply customize the agent's environment and integrate it with specific internal processes, such as continuous integration and continuous deployment (CI/CD) pipelines. This allows for more tailored and comprehensive automation.
Beyond individual productivity, a significant opportunity for internal agents lies in facilitating team collaboration. While many tools enhance individual automation, complex processes often require multiple agents to interface and work together, enabling end-to-end automation for an entire team.
I'm really glad you brought that one up because I think that's actually one of the more common areas where we see a lot of the opportunity... a couple of agents that interface with each other and work with each other, and then maybe we're automating a process kind of end to end.
Anthropic's Legal Team Uses AI Agents for Marketing Copy Review
Anthropic's internal teams are leveraging AI agents to streamline various functions, including the legal review of marketing copy. Traditionally, this process involved marketers submitting a ticket for human legal review, but now an agent-powered application performs an initial assessment.
When a marketer submits copy, the agent first evaluates it against established legal guidelines. Depending on the content, the agent might instantly approve it, or it may flag the copy for further review by a human legal expert. This tiered system significantly speeds up the approval workflow.
For the agent to perform legal review effectively, it needs access to external context, such as specific legal rules and compliance requirements, integrated as "skills." The system also includes a collaborative interface, allowing different team members to interact with the agent and its outputs.
Crucially, this automated legal review always maintains a "human-in-the-loop" element. Full automation by a single AI skill is not feasible due to the necessity of human oversight, authentication, and nuanced judgment in legal matters. The process involves creating separate sessions for the agent's work and for human interaction, ensuring human experts remain involved in critical compliance decisions.
My job is to make sure that when marketing is writing something, they can get it approved really quickly by legal. And sometimes it'll approve things immediately, sometimes it sends stuff to legal, and ideally, it's like getting better all the time so it, it can do more and more.
Establishing Agent Ownership and Enabling Self-Service for Maintenance
Without clear human ownership, AI agents quickly become stale and ineffective, often leading to outdated prompts and unnecessary legal reviews. This highlights the critical need for a responsible party to maintain agent relevance and accuracy over time.
A compelling solution involves empowering end-users to self-serve modifications. For instance, teams using an agent might directly access the underlying code, such as Cloud Code, to implement desired changes themselves. This approach streamlines updates and fosters a sense of agency among users.
While self-service is beneficial, a technical review process, often by the core development team, is still crucial for system ownership. Furthermore, specialized "AI-pilled" technical personnel within a business remain vital for leveraging platforms to build truly excellent and sophisticated agents that perform optimally.
if you don't have a human who's responsible for the agent, it gets stale very quickly
Empowering Non-Technical Teams to Customize AI Agents
As AI agents become integrated into various departments, non-technical teams like legal often want to directly improve or modify them. This presents a challenge for infrastructure teams, who must enable these changes without being overwhelmed by potentially problematic pull requests or direct code manipulations. The goal is to empower domain experts while maintaining system stability.
One solution involves creating layers of abstraction. Instead of direct code access, users interact with their specific agent through a managed AI interface, such as a specialized instance of Claude. For example, a marketing team would interact directly with their marketing agent via this dedicated Claude.
This managed Claude agent acts as an intelligent intermediary. It interprets user requests and determines the correct way to implement changes, preventing users from directly manipulating core code or introducing complications. This approach effectively safeguards the system while allowing non-technical users to drive agent improvements.
By carefully tuning and prompting each layer of this managed agent, organizations can decentralize agent modification. This strategy allows subject matter experts to enhance agents directly within their domain, reducing the burden on central infrastructure teams and accelerating iteration.
Claude will oftentimes figure out what should be the right way for them to go and handle it, so that they're not kind of like, you know, hopping straight down to the absolute core bit and doing something that may result in, you know, some complication.
Exploring Multi-Agent Orchestration Strategies
Agent orchestration involves various advanced techniques to structure how AI agents work together. One interesting approach is the advisor strategy, which separates the execution of a task from the advice given to complete it. Another method involves adversarial pairs, where one agent generates content or solutions while another acts as a critic.
Further strategies include splitting tasks into many small pieces that recombine, resembling swarm intelligence, or employing best-of-N approaches where multiple solutions are generated and the best one selected. Each of these architectures or strategies is uniquely suited for specific use cases, with some being more effective for deep research and others for tasks like bug hunting.
The key insight is that by making agent primitives modular and 'Lego-like,' developers can combine them to create sophisticated architectures and strategies. This modularity allows for problem-solving at higher levels of abstraction, leading to more interesting and effective results. This capability suggests that improvements can be made across multiple layers of abstraction in AI system design.
if we can make the primitives very Lego-like, then people can put them together to solve things at a slightly higher form factor, which is more like an architecture or like a strategy, and they get much more like interesting results out of that
Measuring Agent Success by Outcome and Budget, and Managing Agent Lifecycles
Evaluating the success of AI agents should ideally simplify to measuring a verifiable outcome against a set budget. Instead of focusing on domain-specific evaluations, the goal is for agents to achieve a specific result within cost parameters. This approach moves towards a model where an agent could self-interpret and regrade its performance based on the desired end-state.
A significant challenge with AI agents is their rapid obsolescence. Agents can quickly become outdated due to reliance on older models, architectures, or a lack of human oversight in their maintenance. This necessitates a formal lifecycle management strategy, including a plan for decommissioning agents that are no longer relevant or effective.
To combat agents becoming outdated, a proactive solution involves creating dedicated 'skills' that allow agents to upgrade themselves. These skills can automate the process of moving an agent to a newer model or architecture, making it easier to keep the agent current and functional without constant manual intervention.
Claude, make me a billion dollars. Your budget is ten dollars.
Claude's Future Vision: Autonomous Agents and Massive Platform Scale
The vision for Claude in a year is a significant leap towards autonomy and simplicity. Users will primarily define outcomes and budgets, with Claude intelligently managing its own internal architecture, model selection, and sub-agent orchestration. This will substantially reduce the need for manual harness or prompt engineering, as Claude is expected to understand itself well enough to dynamically configure its processes, essentially "writing itself on the fly."
Crucially, achieving this future relies on the underlying platform's ability to scale dramatically. With agents constantly running, recreating themselves, and handling diverse, long-running requests, the system must accommodate immense and continuous demand. The aim is to ensure the platform's capacity never restricts what users can achieve with these advanced, self-optimizing AI agents.
I never want the ability of the platform itself to be able to scale to get in the way of what people would otherwise be able to accomplish with these things.
Follow the shows you care about.
Podbrew watches new episodes and turns them into concise briefs you can read in minutes.