What if a customer could complete a loan application, an onboarding flow, or a purchase just by talking to you inside a chat? No downloads. No app installs. No friction.
That's not a hypothetical future. It's what happens when you combine Webfuse's zero-code co-browsing with Chrome's WebMCP protocol and a conversational voice agent, all running in the browser.
TL;DR
- The browser is the platform. Around 90% of customer journeys happen on the web, including inside messaging-app in-app browsers.
- WebMCP replaces screenshots and DOM scraping. Websites expose structured tools to AI agents through navigator.modelContext, so the agent calls actions instead of guessing pixels.
- Webfuse makes any site agent-ready in one click. A proxy-based co-browsing layer works on any website, with no SDK, no app store review, and no native code.
- The voice layer is pluggable. Ship today with ElevenLabs, or go fully Google-native with ADK + Gemini Audio.
- The result is days, not months. What used to need SDK integrations, screenshot agents, and bespoke tool servers is now a composable web-native stack.
Let me show you.
Try It Live
Interact with the demo above, or open it in a new tab.
The Problem: Agents Are Still Pretending to Be Humans
If you've watched an AI agent "use" a website, you know the absurdity:
- Taking screenshots of pages and burning through thousands of tokens to guess which button is which
- Scraping raw HTML, hoping nothing changed since the last crawl
- Clicking around until something works
This is billion-parameter models pretending to be humans, pixel by pixel. It's fragile, expensive, and slow. A single form fill that takes a human ten seconds might require dozens of sequential agent interactions, each one an inference call that adds latency and cost.
Web UI is designed for humans. AI agents need structure.
The Solution: WebMCP Gives Agents Structured Access to the Web
Enter WebMCP (Web Model Context Protocol), a proposed browser-native standard backed by Google that lets websites expose structured, callable tools directly to AI agents through navigator.modelContext.
Instead of an agent guessing which blue rectangle is the "Submit" button, the website publishes a contract:
// Instead of: "find submit button and click it"
// With WebMCP, the agent calls:
submit_form({ data })
The agent knows exactly what actions are available, what parameters they accept, and what results they return. No screenshots. No DOM scraping. No guessing.
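To make the contract concrete, here is a minimal sketch of how a page might publish the `submit_form` tool. It assumes the `registerTool` method and the descriptor fields (`name`, `description`, `inputSchema`, `execute`) as they appear in the public WebMCP proposal, which is still evolving. The handler body and the `publishTools` helper are hypothetical, not part of any official API.

```javascript
// Sketch: publishing a `submit_form` tool through WebMCP.
// Assumption: `registerTool` and the descriptor fields below follow the
// public WebMCP proposal and may change before the API stabilizes.
const submitFormTool = {
  name: "submit_form",
  description: "Submit the application form with the given field values.",
  inputSchema: {
    type: "object",
    properties: {
      data: {
        type: "object",
        description: "Form field names mapped to their values.",
      },
    },
    required: ["data"],
  },
  // Hypothetical handler: a real page would write into its own form state.
  async execute({ data }) {
    const fields = Object.keys(data);
    return {
      content: [
        { type: "text", text: `Submitted ${fields.length} field(s): ${fields.join(", ")}` },
      ],
    };
  },
};

// Hypothetical helper: register tools when WebMCP is available, and report
// `false` otherwise so the page can keep working with its normal UI.
function publishTools(tools, modelContext = globalThis.navigator?.modelContext) {
  if (!modelContext || typeof modelContext.registerTool !== "function") {
    return false;
  }
  for (const tool of tools) {
    modelContext.registerTool(tool);
  }
  return true;
}

publishTools([submitFormTool]);
```

The schema is what removes the guesswork: the agent reads the tool name, description, and required parameters up front instead of inferring them from pixels.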
The difference is stark: WebMCP turns any website that adopts it into a structured tool surface for AI agents, and turns the browser from a passive display into an interactive platform.
Make Any Website Agent-Ready
Webfuse turns any website into a shared, agent-controllable surface with zero code. Pair it with WebMCP and a voice agent to ship agentic customer journeys in days, not months, on the platform every customer already has: the browser.
How It Works in Practice
Layer 1: Webfuse - Instant Co-Browsing, Zero Code
Webfuse is a proxy-based co-browsing platform. It works on any website, including third-party sites, without requiring SDK installation, code changes, or app wrapper integration.
When a customer taps a button inside their messaging app, the session opens in the in-app browser. Both the agent and the user see the same page simultaneously. The agent can navigate, highlight, and guide. The user can interact freely.
No join codes. No downloads. No QR codes. One tap.
Layer 2: WebMCP - Structured Agent-to-Web Interaction
While the user and agent share the viewport, WebMCP gives the agent structured access to page actions. The agent doesn't need to "see" the page visually to know what to do; it calls the tools the website exposes.
This is the critical shift: the agent acts on the page while the user watches and guides via voice.
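From the agent's side, acting on the page reduces to dispatching a named call with JSON arguments. The sketch below illustrates that idea with an in-memory registry. Every name in it (`toolRegistry`, `callTool`, `fill_field`, `formState`) is hypothetical and stands in for whatever the agent runtime and the page actually provide.

```javascript
// Sketch: how an agent runtime might dispatch a structured tool call.
// All names here are hypothetical stand-ins, not a real WebMCP or Webfuse API.
const toolRegistry = new Map();

function addTool(tool) {
  toolRegistry.set(tool.name, tool);
}

// Check required arguments against the tool's declared schema, then run it.
function callTool(name, args = {}) {
  const tool = toolRegistry.get(name);
  if (!tool) {
    return { ok: false, error: `Unknown tool: ${name}` };
  }
  const required = tool.inputSchema.required ?? [];
  const missing = required.filter((key) => !(key in args));
  if (missing.length > 0) {
    return { ok: false, error: `Missing arguments: ${missing.join(", ")}` };
  }
  return { ok: true, result: tool.execute(args) };
}

// In-memory stand-in for the page's form state.
const formState = {};

addTool({
  name: "fill_field",
  description: "Set one form field to a value.",
  inputSchema: {
    type: "object",
    properties: { field: { type: "string" }, value: { type: "string" } },
    required: ["field", "value"],
  },
  execute({ field, value }) {
    formState[field] = value;
    return `Set ${field}`;
  },
});

// The model emits a tool name plus arguments, not a click coordinate.
console.log(callTool("fill_field", { field: "email", value: "ada@example.com" }));
// A malformed call fails fast with a readable error instead of a stray click.
console.log(callTool("fill_field", { field: "email" }));
```

Because a bad call is rejected before anything touches the page, the user watching the shared session never sees the agent fumble through the UI.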
Layer 3: Voice - Natural, Conversational Control
The voice interface makes the interaction feel natural. The customer speaks. The agent responds. And while they're having this conversation, the agent is actively navigating the page, filling forms, clicking buttons, guiding the user through the journey.
And here's the key: the voice layer is pluggable.
You can use ElevenLabs for production-ready voice today. Or you can swap in Google ADK + Gemini Audio for a fully Google-native stack:
- Google ADK provides bidirectional voice streaming with Gemini-powered reasoning
- WebMCP tools integrate directly into ADK agents, no bridging layer needed
- Gemini multimodal audio handles both speech synthesis and understanding in one model
The narrative becomes clear: Webfuse on the web layer, ADK + WebMCP on the agent layer, and Gemini for voice, with everything above the co-browsing layer running on Google infrastructure.
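One way to keep the voice layer pluggable is to hide each provider behind the same small interface. The sketch below is illustrative only: the factories return placeholder strings and do not call a real ElevenLabs or Google ADK SDK, and every name (`createVoiceProvider`, `selectVoiceProvider`, `narrateStep`) is hypothetical.

```javascript
// Sketch: a voice layer behind one small interface, so providers can be swapped.
// Hypothetical throughout: neither factory calls a real ElevenLabs or ADK SDK.
function createVoiceProvider(label, synthesize) {
  return {
    label,
    // Turn agent text into whatever the provider plays back to the user.
    speak: (text) => synthesize(text),
  };
}

const voiceProviders = new Map([
  ["elevenlabs", () => createVoiceProvider("elevenlabs", (text) => `[elevenlabs audio] ${text}`)],
  ["adk-gemini", () => createVoiceProvider("adk-gemini", (text) => `[gemini audio] ${text}`)],
]);

function selectVoiceProvider(name) {
  const factory = voiceProviders.get(name);
  if (!factory) {
    throw new Error(`Unsupported voice provider: ${name}`);
  }
  return factory();
}

// The agent loop depends only on `speak`, never on a specific vendor.
function narrateStep(voice, step) {
  return voice.speak(`I'm now ${step}.`);
}

const voice = selectVoiceProvider("elevenlabs");
console.log(narrateStep(voice, "filling in your address"));
```

Swapping providers then becomes a one-line configuration change, since the code that drives the page never touches vendor-specific calls.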
Why Web-First Wins
You might be wondering: why not use native app co-browsing?
Because around 90% of customer journeys happen on the web. Even inside messaging apps, the in-app browser is your canvas. WebMCP makes that canvas agent-ready. You don't need a native SDK to guide a customer through a form, a purchase, or an onboarding flow.
The browser is the universal runtime. And with WebMCP, it's also the most agent-ready runtime.
For the handful of cases where you absolutely need native UI access, SDK-based tools fill that gap. But for the vast majority of customer interactions (forms, transactions, onboarding, guided sales), the web is not merely "good enough." It's the right platform.
Speed-to-Market Is the Whole Point
The demo above took days to put together, not weeks. Here's why:
| Layer | Traditional Approach | WebMCP + Webfuse Approach |
|---|---|---|
| Co-Browsing | SDK integration, app store review, native code | Zero-code proxy, works instantly on any website |
| Agent Interaction | Screenshot-based parsing, DOM scraping, custom tool servers | WebMCP, structured tool registration in the browser |
| Voice Interface | Third-party API integration | Pluggable: ElevenLabs or Google ADK + Gemini |
| Total Time | Weeks to months | Days |
The integration point is the agent, and with WebMCP, that integration is structured tool registration, not DOM parsing and screenshot analysis.
The Stack
| Layer | Technology | Role |
|---|---|---|
| Co-Browsing | Webfuse / Surfly | Proxy-based co-browsing, zero SDK, works on any website |
| Agent Protocol | Chrome WebMCP | Structured tool exposure for AI agents via navigator.modelContext |
| Voice (Option A) | ElevenLabs | Production-ready conversational voice |
| Voice (Option B) | Google ADK + Gemini Audio | Fully Google-native stack with bidirectional streaming |
| Messaging | Google Messages / RCS | Entry point - in-app browser renders the web session |
| Session Control | Webfuse API | Start/stop sessions, one-click mobile launch |
This Isn't Hypothetical
WebMCP is already in Chrome Canary and stabilizing. Google ADK is open-source and actively developed. Webfuse is production-ready today.
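Because availability still varies by browser build, a page or agent may want to check for WebMCP before relying on it. The sketch below assumes support is signalled by a `modelContext` object on `navigator` that exposes `registerTool`, as in the current proposal. The mode names returned by `chooseAgentMode` are hypothetical labels.

```javascript
// Sketch: detect WebMCP before relying on it.
// Assumption: support is signalled by `navigator.modelContext.registerTool`,
// as in the current proposal; the detection check may need updating later.
function detectWebMCP(nav = globalThis.navigator) {
  const modelContext = nav?.modelContext;
  return Boolean(modelContext) && typeof modelContext.registerTool === "function";
}

// Hypothetical mode labels: use structured tools when present, otherwise
// fall back to guiding the user through the shared co-browsing view.
function chooseAgentMode(nav = globalThis.navigator) {
  return detectWebMCP(nav) ? "structured-tools" : "guided-co-browsing";
}

console.log(chooseAgentMode());
```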
The infrastructure for web-first agentic customer experiences exists now. It's composable. It's pluggable. And it runs on the platform everyone already has: the browser.
The question isn't whether the web can be the platform for AI agents. The question is: how fast can you build on it?
Want to see this in action for your use case? Book a demo or reach out to our team.
