MCP Architecture Patterns: Building Robust AI Integrations
Deep dive into Model Context Protocol architecture. Learn dual-vault patterns, tool design best practices, and real implementation examples from production MCP servers handling 26+ tools.
Model Context Protocol (MCP) is becoming the industry standard for extending AI capabilities. After building 12 production MCP servers (including a 26-tool YouTube server and dual-vault Obsidian integration), I've learned what works and what doesn't. Here's the real-world architecture guide.
MCP is an open protocol that lets AI assistants connect to external data sources and tools. Think of it as an API standard specifically designed for AI context—not for humans calling endpoints.
The protocol handles:

- Tool discovery and invocation (the assistant learns which tools exist and calls them with typed arguments)
- Resources (files and data a server exposes as context)
- Prompt templates servers can offer to clients
- Transport (stdio for local servers, SSE/HTTP for remote ones)
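A minimal server sketch, assuming the official Python SDK's FastMCP API (the server name and tool here are illustrative, not from one of my production servers):

from mcp.server.fastmcp import FastMCP

server = FastMCP("example-notes")

@server.tool()
async def search_notes(query: str, limit: int = 10) -> list[str]:
    """Search notes by keyword (lightweight: returns titles only)."""
    # A real implementation would query a vault, database, or API here.
    return [f"Match {i} for '{query}'" for i in range(limit)]

if __name__ == "__main__":
    server.run()  # stdio transport by default; SSE is also supported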
One of my most useful patterns: connecting AI assistants to two separate Obsidian vaults simultaneously. This solves the problem of mixing personal and work contexts.
┌──────────────────────────────────────────┐
│              Claude Desktop              │
│             (or Claude Code)             │
└──────────────────────────────────────────┘
                     │
             ┌───────┴───────┐
             │               │
             ▼               ▼
        ┌─────────┐     ┌─────────┐
        │ Personal│     │  Work   │
        │   MCP   │     │   MCP   │
        │  Port   │     │  Port   │
        │  22360  │     │  22361  │
        └─────────┘     └─────────┘
             │               │
             ▼               ▼
        ┌─────────┐     ┌─────────┐
        │Personal │     │  Work   │
        │  Vault  │     │  Vault  │
        │         │     │         │
        │Business │     │Property │
        │Projects │     │  Mgmt   │
        └─────────┘     └─────────┘
The Claude Desktop configuration (claude_desktop_config.json) registers both servers:

{
  "mcpServers": {
    "obsidian-personal": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:22360/sse"],
      "env": {}
    },
    "obsidian-work": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:22361/sse"],
      "env": {}
    }
  }
}
Why This Works
Query: "What tasks do I have this week?"
The AI assistant searches both vaults, aggregates tasks, and presents them separated by context. It knows which vault is for personal vs work based on server names.
Query: "Update the work property file for Highland Terrace"
Your AI assistant knows to use obsidian-work__append_content, not a personal-vault tool. Clear separation prevents cross-contamination.
Always return lightweight context by default, with options for more detail:
@server.tool()
async def get_video_context(
    video_id: str,
    include_transcript: bool = False,        # Optional
    include_comments: bool = False,          # Optional
    include_transcript_status: bool = False  # Optional
):
    """Get video metadata (lightweight by default)."""
    # Always return: title, description, stats, chapters
    context = await fetch_metadata(video_id)  # ~2-3KB

    # Optionally add heavy data
    if include_transcript:
        context['transcript'] = await fetch_transcript(video_id)  # Could be 100KB+
    if include_comments:
        context['comments'] = await fetch_comments(video_id)  # Could be 50KB+

    return context
Why This Matters
AI context windows are large but not infinite. Tools that return 100KB+ of data by default consume the context budget quickly. Start small, let the AI request more if needed.
For large content, return indexes/summaries first with chunking options:
from typing import List, Optional

@server.tool()
async def get_video_transcript(
    video_id: str,
    chunk_size: Optional[int] = None,        # If set, return chunks
    chunk_index: Optional[int] = 0,          # Which chunk
    time_range: Optional[List[str]] = None   # Or get a specific time range
):
    """Get transcript with intelligent size handling."""
    transcript = await fetch_transcript(video_id)

    # Long video? Return index/summary by default
    if len(transcript) > 50000 and chunk_size is None:
        return {
            "type": "index",
            "summary": generate_summary(transcript),
            "duration": get_duration(transcript),
            "preview": transcript[:500],  # First 500 chars
            "recommendations": [
                "Use chunk_size=100 to get full transcript in chunks",
                "Use time_range=['00:10:00', '00:20:00'] for specific section"
            ]
        }

    # Handle chunking or time ranges
    if chunk_size:
        return get_chunk(transcript, chunk_size, chunk_index)
    elif time_range:
        return get_time_range(transcript, time_range)

    # Short video: return full transcript
    return transcript
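For reference, a minimal sketch of what the get_chunk helper might look like (hypothetical; it assumes chunk_size counts transcript lines, which may differ from the real implementation):

def get_chunk(transcript: str, chunk_size: int, chunk_index: int) -> dict:
    # Hypothetical helper: split the transcript into fixed-size line chunks
    lines = transcript.splitlines()
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    if not 0 <= chunk_index < len(chunks):
        return {"error": f"chunk_index out of range (0-{len(chunks) - 1})"}
    return {
        "type": "chunk",
        "chunk_index": chunk_index,
        "total_chunks": len(chunks),
        "content": "\n".join(chunks[chunk_index]),
    }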
For APIs with daily quotas, implement automatic account rotation:
from typing import Optional

class AccountManager:
    async def get_youtube_client(self, preferred_account: Optional[str] = None):
        """Get YouTube client with automatic quota balancing."""
        if preferred_account:
            return await self._get_client(preferred_account)

        # Round-robin through accounts
        for account in self.accounts:
            quota_used = await self._check_quota(account)
            if quota_used < 9000:  # Under daily limit of 10,000
                return await self._get_client(account)

        # All accounts exhausted
        raise QuotaExceededError("All accounts at quota limit")

    async def _get_client(self, account: str):
        """Get authenticated client with token refresh."""
        credentials = await self._load_credentials(account)
        if credentials.expired:
            credentials.refresh()
            await self._save_credentials(account, credentials)
        return youtube_client(credentials)
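The bookkeeping behind _check_quota can be simple. A minimal in-memory sketch (hypothetical; production code should persist counters and reset them at midnight Pacific, when YouTube quotas actually reset):

from collections import defaultdict
from datetime import date

class QuotaTracker:
    def __init__(self):
        self.usage = defaultdict(int)  # account -> units used today
        self.day = date.today()

    def record(self, account: str, units: int) -> None:
        self._maybe_reset()
        self.usage[account] += units

    def used(self, account: str) -> int:
        self._maybe_reset()
        return self.usage[account]

    def _maybe_reset(self) -> None:
        # Approximation: reset on local date change, not Pacific midnight
        if date.today() != self.day:
            self.usage.clear()
            self.day = date.today()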
Case study: the YouTube MCP server (26 tools).

Decision 1: Metadata-First Approach
get_video_context returns ~2-3KB by default (title, description, stats). This lets AI assistants decide whether they need the transcript or comments before requesting heavy data.
Decision 2: Search Across Accounts
The search_all_accounts tool searches multiple Google accounts in parallel, deduplicates results, and tracks which accounts found each video. Perfect for quota balancing.
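A sketch of that fan-out (search_one is an assumed per-account helper; deduplication keys on video ID):

import asyncio

async def search_all_accounts(accounts: list[str], query: str) -> list[dict]:
    # Fan out one search per account, in parallel
    results = await asyncio.gather(
        *(search_one(account, query) for account in accounts)
    )

    # Deduplicate by video ID, tracking which accounts found each video
    seen: dict[str, dict] = {}
    for account, videos in zip(accounts, results):
        for video in videos:
            entry = seen.setdefault(video["id"], {**video, "found_by": []})
            entry["found_by"].append(account)
    return list(seen.values())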
Decision 3: Code Block Detection
search_transcript can detect code blocks in transcripts (Python, JS, TypeScript, etc.) and extract them with context. Great for learning from technical tutorials.
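Since transcripts carry no markup, detection is heuristic. A simplified sketch (the production heuristics are more involved; the keyword list here is illustrative):

import re

# Lines that look like code: language keywords at the start of a line
CODE_HINTS = re.compile(
    r"^\s*(def |class |import |const |let |function |for |if |return )"
)

def extract_code_candidates(transcript: str, context_lines: int = 2) -> list[dict]:
    lines = transcript.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if CODE_HINTS.search(line):
            start = max(0, i - context_lines)
            end = min(len(lines), i + context_lines + 1)
            hits.append({"line": i, "snippet": "\n".join(lines[start:end])})
    return hits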
Query: "Find authentication examples in this tutorial"
AI workflow:
1. get_video_context → confirms the video exists, checks duration
2. check_transcript_availability → verifies captions exist
3. search_transcript(search_query="authentication", search_type="code")

Total quota used: 1 unit (metadata check) + 0 units (transcripts are free with OAuth).
Case study: the dual-vault Obsidian integration.

Decision 1: Obsidian Plugin as MCP Server
Instead of building a standalone server that watches files, the MCP server runs inside Obsidian as a plugin. This gives real-time access to Obsidian's API and metadata.
Decision 2: Port-Based Vault Separation
Each vault runs on a different port (22360, 22361). Claude Desktop connects to both via the mcp-remote bridge. Tools are automatically namespaced by server name.
Decision 3: Markdown-Aware Operations
Tools understand Obsidian conventions: frontmatter, internal links [[like this]], tags #like-this, and block references ^like-this.
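Extracting those conventions takes only a few patterns. A simplified sketch (Obsidian's real parser also handles aliases, embeds, and escaping):

import re

WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")   # [[target]] or [[target|alias]]
TAG = re.compile(r"(?<!\S)#([\w/-]+)")                      # #like-this
BLOCK_REF = re.compile(r"\^([\w-]+)\s*$", re.MULTILINE)     # ^like-this at line end

def parse_note(markdown: str) -> dict:
    return {
        "links": WIKILINK.findall(markdown),
        "tags": TAG.findall(markdown),
        "block_ids": BLOCK_REF.findall(markdown),
    }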
Query: "Create a meeting note for today's board meeting at Highland Terrace"
AI workflow:
1. obsidian-work__get_file_contents reads the Highland Terrace property file
2. obsidian-work__append_content adds the meeting note

The note is created in the correct vault with proper context and links.
Return actionable error messages with suggestions, not bare failures:

@server.tool()
async def get_video_transcript(video_id: str):
    try:
        transcript = await fetch_transcript(video_id)
        return transcript
    except TranscriptNotAvailable:
        return {
            "error": "No captions available",
            "suggestions": [
                "Check if video has auto-captions",
                "Try another video from same channel"
            ]
        }
    except QuotaExceeded:
        return {
            "error": "API quota exceeded",
            "suggestions": [
                "Wait until midnight Pacific for reset",
                "Add another Google account"
            ]
        }
AI assistants use descriptions to decide which tool to call. Be specific:

Bad: "Get video data"

Good: "Get video metadata (title, description, stats, chapters). Lightweight by default (~2-3KB). Use include_transcript for full captions."
Cache expensive operations with a TTL:

@cache(ttl=3600)  # 1 hour
async def get_video_metadata(video_id: str):
    # Expensive API call
    return await youtube_api.videos().list(id=video_id)
YouTube MCP achieves 90% quota reduction through caching.
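The stdlib functools cache has no TTL, so @cache(ttl=...) implies a custom or third-party decorator. A minimal async-aware sketch:

import functools
import time

def cache(ttl: int):
    """Minimal TTL cache for async functions; keys on positional args only."""
    def decorator(fn):
        store = {}  # args -> (timestamp, result)

        @functools.wraps(fn)
        async def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]
            result = await fn(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator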
Throttle concurrent API calls with a semaphore-based rate limiter:

from asyncio import Semaphore

class RateLimiter:
    def __init__(self, max_concurrent=5):
        self.semaphore = Semaphore(max_concurrent)

    async def __aenter__(self):
        await self.semaphore.acquire()

    async def __aexit__(self, *args):
        self.semaphore.release()

rate_limiter = RateLimiter(max_concurrent=5)

@server.tool()
async def search_videos(query: str):
    async with rate_limiter:
        return await youtube_api.search(q=query)
Problem: Tool returns 100KB transcript when AI only needed video title.
Solution: Lightweight context first, optional parameters for heavy data.
Problem: 3-hour video transcript overwhelms context window.
Solution: Return summary/index by default, offer chunking or time-range options.
Problem: "Error 403" with no context.
Solution: Return actionable error messages with suggestions.
Problem: Hit API quota limit at 10am, server unusable rest of day.
Solution: Multi-account rotation, caching, quota tracking.
Problem: Synchronous API calls block the entire server.
Solution: Use async/await throughout and pool HTTP connections (sketched below).
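For the connection-pooling half, the usual pattern is one shared async client for the server's lifetime. A sketch with httpx (one plausible choice of client library; fetch_page is illustrative):

import httpx

# One shared client reuses TCP/TLS connections across all tool calls
# instead of opening a fresh session per request.
http = httpx.AsyncClient(timeout=10.0)

@server.tool()
async def fetch_page(url: str) -> str:
    response = await http.get(url)
    response.raise_for_status()
    return response.text[:2000]  # keep returned context small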
Systems Architect with 12 production MCP servers including YouTube MCP (26 tools, 356 tests) and dual-vault Obsidian integration. Specializing in Model Context Protocol architecture and production AI systems.
I help teams design and build production-ready MCP servers with proper authentication, caching, and error handling.
Schedule a Consultation