Inside Stainless: The Developer Tools Startup Anthropic Just Bought for $300 Million

This episode features Alex Rattray, founder and CEO of Stainless, the developer tools startup recently acquired by Anthropic for $300 million. Alex and host Dan Shipper, a small investor in Stainless, unpack the complex world of making software communicate effectively, especially in the context of emerging AI. They delve into the Model Context Protocol (MCP) and its pivotal role in the AI-native internet.

Stainless builds APIs, SDKs, and MCP servers for major companies like OpenAI and Anthropic. The discussion centers on solving the critical challenge of connecting large language models (LLMs) with traditional computer systems. This connection is essential for the future of agentic AI, which aims to automate intricate, multi-step tasks across various applications without overwhelming the LLM.

Alex shares his expertise on designing robust and efficient MCP servers, detailing the principles required to get them right. They explore the security implications, the shift towards AI models executing code, and Alex's vision for a "cyborg" future where neural networks seamlessly integrate with traditional code execution. This conversation offers crucial insights into building the foundational infrastructure for the next generation of AI-powered software.

Key takeaways

APIs are analogous to dendrites in the brain, which are critical for neurons to connect and enable thought, highlighting their fundamental role in internet functionality.
LLMs represent a new class of computing system requiring a dedicated method to interact with traditional computers.
The Model Context Protocol (MCP) is an emerging standard designed to bridge LLMs with other computer systems, a key investment area for Stainless.
Agentic AI aims to enable LLMs to automate intricate, multi-step human tasks across various software applications, such as processing refunds and sending follow-up communications.
While SDKs offer simplified interfaces for human developers to interact with APIs, MCPs are envisioned to provide a similar native and easy-to-use interface for large language models.
Integrating all tools from a single large application like Stripe can overwhelm an LLM's context window and confuse the model, posing a major scaling challenge for agentic AI.
Designing effective API interfaces for LLMs is a new, complex research area, more challenging than creating SDKs for human developers because we cannot easily understand or predict an LLM's operational 'thinking.'
Tools within an MCP server should be limited in number, precisely named, have minimal input parameters, and return only the essential data needed by the model.
Stainless's MCP uses a "dynamic mode" with three generic tools (list, get, execute) to scale API exposure efficiently for large APIs, balancing context with tool count.
Dynamic mode simplifies API tool exposure but introduces latency and cost due to requiring multiple model turns for a single action.
AI tools can iteratively generate and refine complex SQL queries by incorporating detailed business context and specific filtering criteria.
Alex Rattray predicts the future of AI will be 'cyborgs,' combining neural network capabilities with traditional code execution.
AI models will transition from managing many specific tools to using a general code execution tool, allowing them to write and run code directly via an API's SDK.
Executing API calls via AI-generated code drastically reduces context window usage, even for complex, multi-step data retrieval.
This method enables rapid task execution as the code runs directly on a server, eliminating frequent round trips to the AI model.
API providers executing AI-generated code significantly improves reliability by ensuring correct library versions and handling potential AI hallucinations.
Implement AI system security at the API layer itself, as limiting exposure through a user interface is inadequate.
Use OAuth with granular permissions and scopes to ensure proper security at the API layer for AI interactions.
Code execution tools can validate API requests from LLMs, correcting invalid calls and offering valid alternatives with relevant documentation.
Prioritizing speed and openness in AI tool deployment can create market leadership and drive an entire technological wave, as seen with Stable Diffusion's impact on image generation.

04:49 - 06:00

APIs Explained: The Dendrites of the Internet

An API, or Application Programming Interface, serves as the fundamental mechanism for different computer programs to communicate with each other. It's essentially how one software application talks to another, allowing for data exchange and functional interaction.

Alex Rattray uses a biological analogy, describing APIs as the "dendrites of the internet." In the brain, dendrites are where neurons connect and exchange information, enabling thought. Similarly, on the internet, if servers weren't communicating through APIs, the entire network wouldn't function.

This constant communication between programs via APIs is essential for the internet's operation, forming the connective tissue that allows various applications and services to work together seamlessly and automate processes.

APIs are the dendrites of the internet. Dendrites are where your neurons connect and actually exchange information with each other.

06:00 - 08:00

Model Context Protocol enables LLMs to interact with computers

APIs are foundational to modern software, enabling all digital programs to connect and interact, much like dendrites in a brain. They drive automation and facilitate most business-to-business interactions, making systems more efficient across the board.

Historically, interactions involved either humans using user interfaces or computers communicating via APIs. The advent of AI introduces large language models (LLMs) as a new type of 'computer' that also needs to connect and operate with existing systems.

The Model Context Protocol (MCP) is being developed as the essential interface for connecting LLMs to computers. This protocol allows AI systems to integrate into the broader digital ecosystem, much like SDKs empower developers, and is an area Stainless is actively investing in.

a new computer has entered the chat, right? There's a new, there's a new kind of system that can talk to other systems, or at least we would like it to be able to.

08:00 - 12:00

Agentic AI's Vision Struggles with Current MCP Limitations

The grand vision for agentic AI involves enabling large language models to automate complex, multi-application tasks that typically require human interaction with various software interfaces. This includes common business processes like processing a customer refund and sending a personalized discount code with a thank you note, which often spans multiple internal systems.

Historically, Software Development Kits (SDKs) have simplified how human developers interact with APIs, such as using `pip install stripe` to easily create customer objects or charge credit cards. The intention for MCP (the Model Context Protocol) was to provide a similar native interface for LLMs, giving them an easy way to interact with APIs and applications.

However, current implementations of MCPs are falling short of this ambitious vision. While a user interface allows humans to click buttons, fill out forms, and navigate extensively, LLMs interacting through MCPs tend to be much more restricted. Only a few tools and limited functionality are typically exposed to the models.

This limited tool exposure and functionality significantly hinders the ability of LLMs to perform the broad range of actions necessary to automate the comprehensive, multi-step tasks central to the promise of agentic AI, such as a complete customer service interaction across different platforms.

but LLMs interacting with through MCP, it tends to be much more restricted. You can only do a few little things. There's usually not a ton of tools that you're going to be exposing to the models.

12:00 - 16:01

LLMs Struggle with Context Limits When Integrating Extensive API Toolsets

Current business workflows often involve navigating multiple SaaS applications and performing numerous clicks to complete a single task, such as processing a refund in Stripe, sending an email, and updating Salesforce. This common practice is inefficient and time-consuming for human operators.

Agentic AI promises to streamline these processes by allowing users to simply prompt an LLM to automate complex tasks across various applications. The AI would then navigate different interfaces and execute actions independently, eliminating manual steps.

However, realizing this potential faces a significant technical hurdle: to function effectively, the LLM needs comprehensive access to all possible actions within each application. For instance, the Stripe API alone contains hundreds of endpoints, representing a massive array of potential tools.

Supplying an LLM with the definitions for all these tools, such as the Stripe OpenAPI specification, can exhaust its context window, potentially consuming hundreds of thousands of tokens. Current LLMs not only struggle to manage that much information but are also confused by it, which limits their ability to scale for general enterprise use across numerous SaaS tools.
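
A rough back-of-the-envelope sketch of that budget problem, in Python. The endpoint count, tokens-per-tool figure, and context window size are illustrative assumptions, not numbers from the episode or from Stripe.

```python
def tool_definition_tokens(num_tools: int, avg_tokens_per_tool: int) -> int:
    """Estimate tokens consumed by loading every tool definition up front."""
    return num_tools * avg_tokens_per_tool


def remaining_budget(context_window: int, num_tools: int, avg_tokens_per_tool: int) -> int:
    """Tokens left for the actual conversation (negative means overflow)."""
    return context_window - tool_definition_tokens(num_tools, avg_tokens_per_tool)


# Assumed values, for illustration only.
ASSUMED_ENDPOINTS = 500          # stands in for "hundreds of endpoints"
ASSUMED_TOKENS_PER_TOOL = 600    # name + description + JSON schema
ASSUMED_CONTEXT_WINDOW = 200_000

used = tool_definition_tokens(ASSUMED_ENDPOINTS, ASSUMED_TOKENS_PER_TOOL)
left = remaining_budget(ASSUMED_CONTEXT_WINDOW, ASSUMED_ENDPOINTS, ASSUMED_TOKENS_PER_TOOL)
print(f"tool definitions: {used} tokens, remaining: {left} tokens")
```

Under these assumed numbers the tool definitions alone overflow the window before the user has typed anything.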

You've just burned through your entire context budget.

16:01 - 20:02

Handcrafting Specialized Tools is Essential for LLM Integration

The grand vision for AI operators involves making every business SaaS tool, with all its specific functionalities and corner cases, directly available within their AI chat interfaces. However, achieving this ideal presents significant hurdles, extending beyond just context window limitations to include serious security and permissions challenges.

Currently, successful integration of Large Language Models (LLMs) with external systems demands a 'handcrafting' approach. Instead of simply exposing raw REST API endpoints in a one-to-one mapping, developers are finding it necessary to build specialized tools, meticulously designed with the LLM's ergonomics and thought processes in mind.

This problem is compared to the long journey of creating effective Python SDKs for human developers, but for LLMs, it's a fundamentally new research area. Unlike human developers who can learn API nuances, it's not yet possible to 'think or see like an LLM' to design truly ergonomic interfaces for them.

LLMs face difficulties with sustained, multi-step chains of actions and managing large volumes of data, such as paginated API responses. Sifting through extensive datasets to locate a specific piece of information, like a single transaction, quickly exhausts their context, analogous to finding a few small needles in an overwhelming haystack.

we haven't figured out how to expose an API ergonomically to an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer.

20:02 - 24:02

Best Practices for Designing Robust MCP Servers

Building effective Model Context Protocol (MCP) servers requires significant product management and engineering effort. Developers must engage with customers to understand their needs, observe their use of current software, and identify opportunities for AI to unlock new capabilities. This includes substantial engineering to integrate AI components, establishing robust evaluation systems, and considering the diverse matrix of clients (e.g., Cursor, Claude Code) and underlying models that the server needs to support.

A critical challenge in MCP server development is establishing effective feedback mechanisms. It's often difficult to determine if a tool call response or answer provided by the server was actually useful to the user or successfully utilized by the LLM. Implementing a first-class feedback system, such as a 'send feedback' tool, can help the MCP server learn from user interactions, even if the feedback is negative.

When designing the tools for an MCP server, several principles are key. Keep the total number of tools relatively small. Ensure each tool's name and description are precise and specific to avoid ambiguity. The input schema for each tool should have a small number of concisely described parameters, making it easier for models to understand and use. Furthermore, the response data from tools should be minimal, containing only the exact information the model requires, which can sometimes be achieved using techniques like a JQ filter to process JSON outputs.
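
These principles can be sketched in Python. The tool name, description, and response fields below are hypothetical, invented for illustration. The `name` / `description` / `inputSchema` layout follows the shape MCP uses for tool listings, and `pick_fields` is a plain-Python stand-in for the kind of trimming a JQ filter would do.

```python
from typing import Any

# Hypothetical tool definition: one precise name, one short description,
# and a single concisely described input parameter.
REFUND_LOOKUP_TOOL: dict[str, Any] = {
    "name": "find_refund_by_charge",
    "description": "Look up the refund, if any, issued for a single charge ID.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "charge_id": {"type": "string", "description": "ID of the charge."},
        },
        "required": ["charge_id"],
    },
}


def pick_fields(record: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Keep only the listed top-level fields, similar in spirit to a jq filter."""
    return {key: record[key] for key in fields if key in record}


# A made-up, verbose API response.
raw_response = {
    "id": "re_123",
    "amount": 1500,
    "currency": "usd",
    "status": "succeeded",
    "metadata": {"internal_note": "not needed by the model"},
    "balance_transaction": "txn_456",
}

# Only the fields the model needs go back into its context.
trimmed = pick_fields(raw_response, ["id", "amount", "status"])
print(trimmed)
```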

You want the response data to come back with a, a very small amount of data, only, only exactly what the model will need.

24:02 - 26:02

Stainless uses a dynamic mode in its MCP servers to scale API exposure, and uses MCP internally for business intelligence

Stainless employs a "dynamic mode" in its Model Context Protocol (MCP) servers to manage API exposure at scale. Rather than exposing every endpoint as a distinct tool, this mode offers three general tools: one to list endpoints, one to get details about an endpoint, and one to execute an endpoint. The tool count stays fixed no matter how extensive the underlying API becomes.

While dynamic mode greatly aids in scaling and maintaining contextual understanding, it comes with specific trade-offs. Executing a single action requires three distinct turns of the model, which can lead to increased processing time and higher computational costs. Additionally, this method might introduce some slight lossiness compared to directly presenting each endpoint as a unique, specific tool.
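
A minimal Python sketch of the three-tool pattern, assuming a small in-memory endpoint registry. The endpoint names, handlers, and function names are hypothetical and are not Stainless's implementation.

```python
from typing import Any

# Hypothetical registry standing in for a large API surface.
ENDPOINTS: dict[str, dict[str, Any]] = {
    "customers.retrieve": {
        "description": "Fetch one customer by ID.",
        "params": {"customer_id": "string"},
        "handler": lambda args: {"id": args["customer_id"], "name": "Dan"},
    },
    "refunds.create": {
        "description": "Refund a charge.",
        "params": {"charge_id": "string"},
        "handler": lambda args: {"refunded": args["charge_id"]},
    },
}


def list_endpoints() -> list[str]:
    """Tool 1: return endpoint names only, keeping the listing small."""
    return sorted(ENDPOINTS)


def get_endpoint(name: str) -> dict[str, Any]:
    """Tool 2: return the description and parameters for one endpoint."""
    spec = ENDPOINTS[name]
    return {"description": spec["description"], "params": spec["params"]}


def execute_endpoint(name: str, args: dict[str, Any]) -> Any:
    """Tool 3: invoke the chosen endpoint with the given arguments."""
    return ENDPOINTS[name]["handler"](args)


# One action costs three model turns: list, then get, then execute.
names = list_endpoints()
schema = get_endpoint("refunds.create")
result = execute_endpoint("refunds.create", {"charge_id": "ch_1"})
print(names, schema, result)
```

Adding endpoints grows the registry but not the number of tools the model sees, which is the trade the section describes: constant context cost in exchange for extra turns.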

Internally, Alex Rattray leverages MCP servers for critical business operations, integrating them with various SaaS applications such as Notion, HubSpot, and Gong. This setup also includes a read-only connection to their PostgreSQL database, centralizing diverse data sources.

For example, Rattray uses this integrated MCP system to query for "interesting customers that signed up for Stainless last week." The system then processes and synthesizes information from their database, Notion notes, and Gong transcripts to provide a comprehensive answer, showcasing its powerful capability for cross-application business intelligence.

and so that enables this context thing to scale really well, but it means there's three turns of the model just to do one thing.

28:02 - 32:03

Leveraging AI for knowledge management and SQL generation

Alex Rattray utilizes Claude Code to collect and cache interesting customer quotes and research into a specialized Git repository. The AI is instructed to find relevant quotes and store them as markdown files with full citations, creating a readily accessible, internal knowledge base.

While this repository is initially unstructured, Claude Code effectively handles the lack of rigid organization. The system is currently used by a small team, including Alex, a business person, and a couple of customer support engineers, with plans for broader adoption and future structuring as its usage grows.

In addition to customer insights, Alex uses Claude Code to generate and refine SQL queries for business metrics. He engages in an iterative process, providing specific business context and filtering requirements to Claude Code, which then refines the query for a more accurate analysis or report.

Once a SQL query is perfected through this AI-assisted iteration, it is saved to an analytics folder within the Git repository. This practice ensures that complex queries can be easily retrieved and reused for subsequent analysis, such as preparing for board meetings.

And I kind of imbued more of this business context into that SQL query, and I iterated with, with Claude Code, to get it to be better and better for the specific kind of metric that I was looking for, the specific kind of story that I was trying to tell.

32:03 - 34:05

Stainless Explores AI for Customer Support Bug Resolution

Stainless is experimenting with using AI, specifically Claude Code, to address customer support bugs. The company faces a common challenge where highly technical support engineers, despite having the skills to fix bugs, often lack the dedicated time to do so. Constant incoming customer tickets pull them away from deep debugging, making it difficult to fully resolve issues themselves.

The approach involves having Claude Code attempt to fix bug tickets that come in. While this is still an experimental phase and not yet successful even 50% of the time, it shows considerable promise. The goal is not perfection, but rather to improve the overall efficiency of the support workflow.

By offloading initial bug resolution attempts to AI, Stainless hopes to free up their skilled human engineers. This allows the technical support team to allocate their valuable time to other urgent tasks and more complex problems, ultimately streamlining their customer service operations.

They have the technical skill, but guess what? Another customer writes in two minutes later, and they wanna jump on that. Don't wanna be knee deep in a debugger.

34:05 - 36:05

Alex Rattray envisions a 'cyborg' future where AI executes code directly

Alex Rattray proposes that the future of AI will be characterized by 'cyborgs,' a blend of neural networks like GPT and traditional CPU-based code. This means interactions with an AI agent will involve a system that is part advanced AI and part conventional software, representing a fundamental shift in how AI operates.

This 'cyborg' approach is anticipated to manifest in two key areas: handling specific, one-off operational tasks that require precise actions, and contributing to the development of production software. This duality suggests a versatile application across different scales of software needs.

Instead of AI models requiring a vast array of specialized tools, Rattray foresees a more streamlined system where models primarily use a single code execution tool. This tool would empower the AI to directly write and execute code, such as TypeScript against an API's SDK, to perform complex operations like listing Stripe transactions, simplifying the integration and action capabilities of AI.

The future of AI is cyborgs.

36:05 - 38:05

AI Code Execution Reduces Context Window Impact and Accelerates Tasks

AI models demonstrate high proficiency in writing code for interacting with APIs through SDKs, such as `stripe.charges.list` or `stripe.customers.retrieve`. When provided with a basic `readme` that includes example requests and available API calls, models effectively extrapolate patterns, especially if the SDK and API are well-structured and predictable.

This approach significantly minimizes the impact on the context window. Initial context might be as low as a thousand tokens, and subsequent complex operations, like paginated list requests, have virtually no additional context cost. The AI model executes code to perform tasks like searching for a customer named Dan and verifying a purchase, generating only a minimal output (e.g., ten lines of text) that returns to the main context.

A key advantage is the speed of execution. The code runs rapidly on a server, often co-located with the API (like Stripe's API in AWS), avoiding the latency of round trips to the AI model for each step. This allows for swift processing and completion of tasks.
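
The context saving can be sketched in Python. The data source below is fake and no real SDK or API is called; `list_charges` and `find_purchases` are hypothetical names standing in for the code a model might write and run in a sandbox.

```python
from typing import Any, Iterator

# Fake charge records standing in for a paginated API.
FAKE_CHARGES = [
    {"id": f"ch_{i}", "customer_name": "Dan" if i == 137 else f"user{i}", "amount": 100 + i}
    for i in range(250)
]


def list_charges(page_size: int = 100) -> Iterator[list[dict[str, Any]]]:
    """Yield one page of charges at a time, like a paginated list call."""
    for start in range(0, len(FAKE_CHARGES), page_size):
        yield FAKE_CHARGES[start:start + page_size]


def find_purchases(customer_name: str) -> str:
    """Scan every page inside the sandbox and return only a short summary."""
    pages = 0
    matches = []
    for page in list_charges():
        pages += 1
        matches.extend(c for c in page if c["customer_name"] == customer_name)
    lines = [f"scanned {pages} pages"]
    lines += [f"{c['id']}: {c['amount']}" for c in matches]
    return "\n".join(lines)


# Only this short string would return to the model's context.
summary = find_purchases("Dan")
print(summary)
```

The 250 records never enter the model's context; only the two-line summary does.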

the context impact of doing a whole bunch of paginated list requests? Zero.

38:05 - 40:06

Provider-Executed AI Code Ensures Reliable API Interactions

When a language model interacts with an API, it can generate code to perform specific actions. A key distinction arises in who executes this code: either the API provider or the user's own system. In the provider-executed model, the language model writes the API-specific code, which is then sent to the API provider's server for execution against their API, and the results are returned.

Executing AI-generated code directly on the user's side presents significant challenges, particularly with external libraries. Large language models (LLMs) often struggle with correctly identifying and using the right version of a library. They can also hallucinate aspects of an API or fail to gracefully recover from errors, leading to unreliable or incorrect interactions.

This unreliability can force developers to bypass convenience libraries and resort to hitting raw HTTP APIs. This approach necessitates consulting extensive OpenAPI specifications, which are often massive and cumbersome to navigate. Provider-executed code circumvents these issues by leveraging the API provider's direct knowledge of their system, libraries, and types.

The reliability of API interactions is further enhanced with typed libraries and static typing. This ensures that the generated code conforms to the expected data structures and function signatures, catching potential errors before execution and making the entire process more robust.

Model writes API code and API provider executes that code, runs it on their API and returns the results.

40:05 - 42:06

Secure AI Interactions at the API Layer with Granular Permissions

A "code execution tool" can validate AI-generated API requests, flagging invalid calls like "stripe.transactions.list" if that API does not exist. It can suggest valid alternatives such as "payment intents" or "balance transactions" and provide inline documentation, potentially using its own AI.
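
One way such validation could work, sketched in Python with plain string similarity from the standard library's `difflib`. The method list and the `validate_call` helper are hypothetical; a real tool would derive valid methods from the SDK itself and might rank suggestions differently.

```python
import difflib

# Hypothetical list of valid SDK methods, for illustration only.
VALID_METHODS = [
    "balance_transactions.list",
    "payment_intents.list",
    "charges.list",
    "customers.retrieve",
]


def validate_call(method: str) -> dict:
    """Accept a known method, or reject it with the closest valid names."""
    if method in VALID_METHODS:
        return {"ok": True, "method": method}
    suggestions = difflib.get_close_matches(method, VALID_METHODS, n=2, cutoff=0.4)
    return {
        "ok": False,
        "error": f"'{method}' does not exist",
        "suggestions": suggestions,
    }


result = validate_call("transactions.list")
print(result)
```

Returning suggestions with the error gives the model something to recover with on its next attempt instead of guessing again.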

True security for AI interactions must be implemented directly at the API layer. Attempting to secure systems by merely limiting exposure through a user interface or "MCP" is insufficient, as the underlying API remains accessible and vulnerable.

The proper method for securing AI systems involves using OAuth with granular permissions and specific scopes at the API layer. While building robust OAuth scopes can be challenging, this approach ensures security is enforced at the correct point within the system.
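
A minimal Python sketch of a scope check enforced at the API layer. The operation names and scope strings are hypothetical; the only standard detail relied on is that OAuth 2.0 represents scopes as a space-delimited string.

```python
# Hypothetical mapping from API operations to the scope each one requires.
REQUIRED_SCOPES = {
    "refunds.create": "refunds:write",
    "charges.list": "charges:read",
}


def parse_scopes(scope_string: str) -> set[str]:
    """Split an OAuth scope string (space-delimited) into a set."""
    return set(scope_string.split())


def is_authorized(operation: str, granted_scopes: set[str]) -> bool:
    """Allow the call only if the token carries the required scope."""
    required = REQUIRED_SCOPES.get(operation)
    # Unknown operations are denied by default.
    return required is not None and required in granted_scopes


# A read-only token issued to an AI agent (assumed scopes).
agent_scopes = parse_scopes("charges:read customers:read")
print(is_authorized("charges.list", agent_scopes))
print(is_authorized("refunds.create", agent_scopes))
```

Because the check lives in the API itself, it holds no matter which interface (UI, MCP server, or generated code) the request arrives through.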

What people should be doing is using OAuth with granular permissions, with, with, with, with proper scopes, and at that point, the security happens the right place, which is at the API layer.

42:06 - 44:08

Stainless is Developing a Secure Computer Use Tool for Multi-API AI Environments

Stainless is actively developing a "computer use tool" designed for AI models. This tool aims to provide an environment where AI can write and execute code to interact with various APIs.

Unlike existing solutions, such as OpenAI's, Stainless's tool will allow for custom environments and the installation of different libraries. This flexibility is crucial for models that need to call a wide range of APIs and require network access to do so.

The initial focus is on ensuring security by starting with single-API provider use cases. This approach ensures that the code execution sandbox prevents network connections to anything other than the intended API, like api.stripe.com. This strict control is deemed critical for maintaining security.
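
The allowlist idea can be sketched in Python. This is an application-level check for illustration only; a real sandbox would presumably enforce the restriction at the network layer. `ALLOWED_HOSTS` and `is_egress_allowed` are hypothetical names, and `api.stripe.com` is the host mentioned in the episode.

```python
from urllib.parse import urlsplit

# Single-provider allowlist: only the intended API host may be reached.
ALLOWED_HOSTS = frozenset({"api.stripe.com"})


def is_egress_allowed(url: str) -> bool:
    """Permit only HTTPS requests whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(is_egress_allowed("https://api.stripe.com/v1/charges"))
print(is_egress_allowed("https://api.stripe.com.evil.example/v1/charges"))
```

Matching the full hostname exactly, rather than by prefix, is what keeps lookalike hosts out.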

The long-term vision is to expand this capability to allow models to securely hit multiple integrations, such as Stripe and Salesforce, while maintaining tight security protocols.

One of the advantages of starting with just one API provider is that you ensure there's no network connections allowed out of that sandbox where we're running the code to anything other than, in this case, api.stripe.com, and that's really, really critical for security for something like this.

44:08 - 46:09

Automating Enduringly Useful AI Actions from Exploration to Production

Many AI interactions start as one-off, exploratory actions, often within a chat interface. However, what begins as a singular task can quickly reveal an enduringly useful pattern that warrants automation. This mirrors traditional software development, where repetitive manual tasks are eventually codified and automated.

The goal is to transition these valuable AI-driven actions from their exploratory phase into robust, production-ready software. For example, if an AI consistently handles customer support requests like automatically refunding for defective socks, that process should become an automated rule rather than a repeated manual prompt.
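
As a sketch of what "becoming an automated rule" might look like, here is the socks example codified in Python. The `Ticket` fields and the refund limit are assumptions made for illustration, not details from the episode.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    product: str
    reason: str
    order_total_cents: int


# Assumed safety limit: larger refunds still go to a human.
AUTO_REFUND_LIMIT_CENTS = 5000


def should_auto_refund(ticket: Ticket) -> bool:
    """Codified version of a pattern first handled by one-off prompts."""
    return (
        ticket.product == "socks"
        and ticket.reason == "defective"
        and ticket.order_total_cents <= AUTO_REFUND_LIMIT_CENTS
    )


print(should_auto_refund(Ticket("socks", "defective", 1200)))
print(should_auto_refund(Ticket("socks", "wrong size", 1200)))
```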

AI models capable of interacting with APIs in code execution sandboxes should also be able to identify and commit enduringly useful code.

chat is a really good interface for exploring, but sometimes you just want a dashboard.

46:08 - 48:09

Less cautious "YOLO" approaches accelerate AI tool adoption

The adoption of new AI tools is often accelerated by less cautious, "YOLO" (you only live once) release strategies rather than highly conservative, private methods. While large enterprise customers prioritize caution, individual developers and early adopters are often more willing to experiment with powerful tools that offer broad access and functionality.

A prime example is the contrasting release of DALL-E and Stable Diffusion. DALL-E remained private for a significant period, limiting access to a select few. Stable Diffusion, however, took an open approach, allowing anyone to use its image generation capabilities, which rapidly ignited the entire image generation wave, despite the project's later challenges.

Similarly, in code generation, Claude Code offered a "YOLO" mode: a robust sandbox with an option to dangerously skip permissions, which let individual developers hand it extensive work. In contrast, OpenAI's Codex CLI initially lagged. It was built for more cautious pair-programming workflows, which limited the tasks it could perform even in full auto mode, before it eventually caught up by offering more permissive options.

This "YOLO" approach matters because individual developers building at a smaller scale are typically the earliest and most impactful adopters of AI-first technologies. By making powerful tools readily available to this segment, even if it means slightly less initial focus on enterprise-level security, companies can drive widespread adoption and innovation that eventually influences larger organizations.

The things that get adopted are often the ones that are willing to take the risk to be YOLO very early.

48:08 - 50:09

Applying a code execution model shifts email assistant tool creation to prompt engineering

Email assistants like Cora, which leverage the Gmail API for functions such as archiving or drafting emails, could significantly enhance their flexibility with a code execution model. This approach would simplify the creation of new tools and ensure they integrate seamlessly without breaking existing functionalities.

The envisioned future suggests that with a code execution-based "super tool," the process of "building tools" would transform into primarily prompt engineering. This means the full power of an API, like the Gmail API, would be consolidated and accessible through a single, comprehensive tool.

Developers would then define specific tasks or categories of work using prompts, guiding the LLM to execute action sequences productively. While prompt engineering can be tricky, it would become the primary engineering focus for expanding the assistant's capabilities.

the only way you really quote-unquote build a tool is with Instructions with prompts
