When I first encountered LLMs, I'll admit I was underwhelmed by their potential for automation. Having a model that could intelligently process and generate text was impressive. It opened doors for summarization, proofreading, and content creation that I hadn't even realized I needed. But automation? That seemed like a stretch. How could a text generator help me automate workflows when it couldn't even make an API call?
Then everything changed with the introduction of tool usage.
The breakthrough was deceptively simple: frameworks like LangChain enabled LLMs to express their intent to use tools through specially formatted text responses. The framework would intercept these requests and execute the actual tool calls, feeding results back to the model. Suddenly, LLMs weren't just text generators—they were orchestrators capable of interacting with any system we connected them to.
Today's tool usage has evolved far beyond those early text-parsing hacks. Through reinforcement learning, models now output structured responses that clearly indicate tool usage intentions. They can chain multiple tools seamlessly—searching the web for data, running calculations, and creating tasks in project management systems, all within a single interaction. This evolution has given birth to what we now call Agentic AI: AI that doesn't just advise, but acts.
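The modern flow can be sketched as a dispatcher over structured tool-call objects. The schema, tool names, and stubbed results below are hypothetical stand-ins rather than any particular provider's API:

```python
# Hypothetical tools standing in for web search, math, and task creation.
def web_search(query: str) -> str:
    return "Q3 revenue grew 12%"        # stubbed search result

def calculate(expression: str) -> float:
    return float(eval(expression, {"__builtins__": {}}))

def create_task(title: str) -> dict:
    return {"id": 101, "title": title}  # stubbed project-management API

REGISTRY = {"web_search": web_search, "calculate": calculate, "create_task": create_task}

def dispatch(tool_calls: list[dict]) -> list:
    """Execute a chain of structured tool calls, as the framework layer would."""
    results = []
    for call in tool_calls:
        fn = REGISTRY[call["name"]]          # no regex parsing needed
        results.append(fn(**call["arguments"]))
    return results

# A model's structured output for one multi-step interaction:
chain = [
    {"name": "web_search", "arguments": {"query": "Q3 revenue"}},
    {"name": "calculate", "arguments": {"expression": "0.12 * 40_000_000"}},
    {"name": "create_task", "arguments": {"title": "Review Q3 growth numbers"}},
]
print(dispatch(chain))
```

Because the calls arrive as structured data rather than free text, the framework can validate names and arguments before anything executes.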
The Oversight Challenge: When AI Becomes Your Deputy
When I grant an LLM access to tools, I'm essentially deputizing it to act on my behalf. Each tool represents a capability I'm entrusting to the AI's judgment.
Consider the spectrum of authority we might grant:
Low stakes: A web search tool that's read-only and has no lasting effects
Medium stakes: A calendar tool that can create and modify meetings on your behalf
High stakes: A production database tool with write access to customer data
The spectrum illustrates a fundamental challenge: as we grant AI more authority, the consequences of its actions become more serious. Should it really modify production data without checking first? Can it reschedule that critical client meeting on its own?
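One way to operationalize this spectrum is a policy table that maps each tool to a risk tier and gates everything above read-only behind human approval. The tool names and the three-tier rule here are illustrative assumptions, not a prescribed standard:

```python
from enum import Enum

class Stakes(Enum):
    LOW = "low"        # read-only, no lasting effects
    MEDIUM = "medium"  # reversible writes on the user's behalf
    HIGH = "high"      # writes to production or customer data

# Hypothetical mapping from tools to risk tiers.
TOOL_STAKES = {
    "web_search": Stakes.LOW,
    "calendar_update": Stakes.MEDIUM,
    "db_write": Stakes.HIGH,
}

def requires_approval(tool_name: str) -> bool:
    """Only low-stakes tools run unattended; everything else asks a human first."""
    return TOOL_STAKES[tool_name] is not Stakes.LOW

print(requires_approval("web_search"), requires_approval("db_write"))
```

Even a coarse policy like this makes the delegation explicit: the spectrum stops being implicit trust and becomes a reviewable artifact.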
This brings us to a key consideration: how do we ensure AI stays within appropriate bounds? In the near term, human oversight remains our best answer. Chatbot applications like Claude Desktop implement this through permission requests, though they haven't quite nailed the balance yet—too many prompts create fatigue, too few create risk. I'm confident developers will refine this over time.
But even perfect permission flows only get us so far. To truly unlock automation's potential, we need approaches that don't require humans to approve every action. Multi-agent architectures offer one promising direction: imagine a researcher agent that determines what actions to take, an implementer agent that executes them, and a supervisor agent that performs risk assessments and grants approvals. This separation of concerns could provide the oversight we need without the constant human interruptions. Until these architectures mature, though, we're left managing the permission balance ourselves.
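That separation of concerns can be sketched with all three agents stubbed as plain functions. In practice each would be an LLM with its own prompt, and the supervisor's one-line rule would be a genuine risk assessment; everything below is an illustrative assumption:

```python
def researcher(goal: str) -> list[dict]:
    """Proposes a plan of actions; stubbed here as a canned two-step plan."""
    return [{"tool": "web_search", "args": {"query": goal}},
            {"tool": "db_write", "args": {"table": "customers"}}]

def supervisor(action: dict) -> bool:
    """Approves or rejects each proposed action; stubbed as a simple rule."""
    return not action["tool"].startswith("db_")

def implementer(action: dict) -> str:
    """Executes approved actions; stubbed to just record what ran."""
    return f"executed {action['tool']}"

def run(goal: str) -> list[str]:
    log = []
    for action in researcher(goal):
        if supervisor(action):
            log.append(implementer(action))
        else:
            log.append(f"blocked {action['tool']}: needs human sign-off")
    return log

print(run("summarize Q3 metrics"))
```

The point of the structure is that the agent proposing actions never executes them directly, and the agent assessing risk has no stake in the plan succeeding.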
The MCP Complication: When Your Deputy Becomes Confused
Just when you thought managing AI delegation was complex enough, enter the Model Context Protocol (MCP). This protocol promises to solve a real problem: why should everyone build their own integration to Slack, or JIRA, or Gmail? Why not use pre-built MCP servers that handle these connections for us?
It's a compelling proposition with a hidden cost: you're now trusting an intermediary between your AI and the services it needs to access.
Now, intermediaries aren't inherently bad—security professionals often deploy them intentionally for observability and policy enforcement, like firewalls or API gateways. But MCP servers are a different breed. They're typically authored by third parties and may even be hosted on someone else's infrastructure, completely outside your network. You're not just adding a technical layer; you're introducing a new party into your trust relationships.
This expanded trust becomes particularly concerning given MCP's architecture. As it’s designed today, the protocol makes MCP servers responsible for managing authentication tokens for resource servers. When you authorize an MCP server, the OAuth flow directs tokens back to that server, not your client. While the protocol doesn't dictate exactly how these tokens are stored—a server could use secrets management or other secure storage—the server still ultimately controls access to your credentials. This design decision has two major implications:
First, it turns MCP servers into high-value targets. They're not just routing requests—they're holding the keys to multiple users' accounts across various services. A single compromised MCP server could expose credentials for numerous users and systems.
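To see why, consider the naive case: a server that keeps per-user tokens for each downstream service in a single store. The structure below is a deliberately simplified illustration of the blast radius, not how any particular MCP server is implemented:

```python
# Hypothetical in-memory token store inside an MCP server.
# Compromising this one structure exposes every user's credentials
# for every downstream service the server connects to.
token_store: dict[str, dict[str, str]] = {
    "alice": {"slack": "token-a1", "gmail": "token-a2"},
    "bob":   {"jira": "token-b1"},
}

def blast_radius(store: dict[str, dict[str, str]]) -> int:
    """Count the (user, service) credentials a single breach would expose."""
    return sum(len(services) for services in store.values())

print(blast_radius(token_store))
```

Each new user and each new connected service grows that number, which is exactly what makes a centralized token holder so attractive to attackers.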
Second, it creates what security experts call a "confused deputy" problem—and this isn't theoretical. Researchers have demonstrated that MCP servers are vulnerable to an attack where a well-crafted link can trick a victim into authorizing a malicious client. The attacker can then use the victim's authorization to access the MCP server on their behalf. Here's where our deputy metaphor becomes painfully concrete: the confused deputy (our MCP server) can be manipulated into misusing its authority, granting attackers access they should never have.
We're left with an uncomfortable dilemma: we want the benefits of MCP's decoupled architecture—the convenience of pre-built integrations and isolated logic—but this same architecture introduces security risks that are difficult to mitigate. The very design that makes MCP useful also makes it risky.
Trust: The Critical Factor for Enterprise Agent Adoption
Trust emerges as the critical security consideration that will determine whether organizations can meaningfully design and adopt AI agents. This trust operates on multiple levels, each presenting its own challenges.
First, I need confidence that my agent will make appropriate choices—both in selecting which tools to use and in providing the right inputs to those tools. This requires thoughtfully incorporating human-in-the-loop feedback mechanisms that provide oversight without becoming tedious. We're still searching for that sweet spot where humans stay informed and in control without drowning in approval requests.
Second, I need to trust all the parties involved in my tool usage framework. As we've seen with MCP, architectural elegance can come at the cost of expanded trust relationships. Each new intermediary, each third-party integration, each external server introduces another potential point of failure. Organizations must carefully weigh whether the convenience of pre-built integrations justifies the additional security surface area.
I believe AI agents are ready for thoughtful adoption by large enterprises today. However, the viable use cases will directly correlate with your organization's appetite for these trust-based risks. Low-stakes automation with read-only tools? That's ready now. High-stakes operations involving production systems and sensitive data? Those require more careful consideration of the trust relationships you're creating.
The path forward isn't about waiting for perfect security—it's about understanding the trust implications of our architectural choices and making informed decisions about which risks we're willing to accept in exchange for the transformative potential of AI agents.