The Web Is the Platform: How WebMCP Makes Agentic Customer Journeys Instant

By Nicholas Piël | May 18, 2026 | 7 min read

What if a customer could complete a loan application, an onboarding flow, or a purchase just by talking to you inside a chat? No downloads. No app installs. No friction.

That's not a hypothetical future. It's what happens when you combine Webfuse's zero-code co-browsing with Chrome's WebMCP and a conversational voice agent, all running in the browser.

TL;DR

  • The browser is the platform. Around 90% of customer journeys happen on the web, including inside messaging-app in-app browsers.
  • WebMCP replaces screenshots and DOM scraping. Websites expose structured tools to AI agents through navigator.modelContext, so the agent calls actions instead of guessing pixels.
  • Webfuse makes any site agent-ready in one click. A proxy-based co-browsing layer works on any website, with no SDK, no app store review, and no native code.
  • The voice layer is pluggable. Ship today with ElevenLabs, or go fully Google-native with ADK + Gemini Audio.
  • The result is days, not months. What used to need SDK integrations, screenshot agents, and bespoke tool servers is now a composable web-native stack.

Let me show you.

Try It Live

[Interactive demo]

The Problem: Agents Are Still Pretending to Be Humans

If you've watched an AI agent "use" a website, you know the absurdity:

  • Taking screenshots of pages and burning through thousands of tokens to guess which button is which
  • Scraping raw HTML, hoping nothing changed since the last crawl
  • Clicking around until something works

This is billion-parameter models pretending to be humans, pixel by pixel. It's fragile, expensive, and slow. A single form fill that takes a human ten seconds might require dozens of sequential agent interactions, each one an inference call that adds latency and cost.
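
To make the cost argument concrete, here is a back-of-the-envelope sketch. The function name and every number in it are illustrative assumptions, not measurements; the only point is that sequential inference calls add up linearly.

```javascript
// Rough model of a screenshot-driven agent run.
// All inputs are illustrative assumptions, not measured values.
function estimateAgentRun({ steps, secondsPerCall, tokensPerCall }) {
  return {
    // Steps run one after another, so latency accumulates linearly.
    totalSeconds: steps * secondsPerCall,
    totalTokens: steps * tokensPerCall,
  };
}

// Hypothetical form fill: 24 sequential steps at 2 s and 3,000 tokens each.
const screenshotRun = estimateAgentRun({
  steps: 24,
  secondsPerCall: 2,
  tokensPerCall: 3000,
});

console.log(screenshotRun.totalSeconds); // 48
console.log(screenshotRun.totalTokens); // 72000
```

Under these assumed inputs, a task a human finishes in ten seconds takes the agent most of a minute.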

Web UI is designed for humans. AI agents need structure.

The Solution: WebMCP Gives Agents Structured Access to the Web

Enter WebMCP (Web Model Context Protocol), a proposed browser-native standard backed by Google that lets websites expose structured, callable tools directly to AI agents through navigator.modelContext.

Instead of an agent guessing which blue rectangle is the "Submit" button, the website publishes a contract:

// Instead of: "find submit button and click it"
// With WebMCP, the agent calls:
submit_form({ data })

The agent knows exactly what actions are available, what parameters they accept, and what results they return. No screenshots. No DOM scraping. No guessing.
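
On the site side, publishing such a contract might look roughly like the sketch below. It assumes the registerTool shape from the WebMCP proposal (a name, a description, a JSON Schema inputSchema, and an execute callback), which is still evolving and may change. The submit_form fields, the handleSubmit helper, the result shape, and the in-memory fallback used where navigator.modelContext is missing are all hypothetical.

```javascript
// Hypothetical in-memory stand-in so this sketch also runs where
// navigator.modelContext does not exist (for example in Node).
const registeredTools = new Map();
const fallbackModelContext = {
  registerTool(tool) {
    registeredTools.set(tool.name, tool);
  },
};

// Prefer the browser-provided object; otherwise use the stand-in.
const modelContext =
  (typeof navigator !== "undefined" && navigator.modelContext) ||
  fallbackModelContext;

// Hypothetical page logic behind the tool: validate, then "submit".
function handleSubmit(data) {
  if (!data || typeof data.email !== "string" || !data.email.includes("@")) {
    return { ok: false, error: "A valid email is required." };
  }
  return { ok: true, message: `Application received for ${data.email}` };
}

// The contract the site publishes: what the action is called, what
// parameters it accepts, and a callback that returns a result.
const submitFormTool = {
  name: "submit_form",
  description: "Submit the application form with the given applicant data.",
  inputSchema: {
    type: "object",
    properties: {
      data: {
        type: "object",
        properties: {
          email: { type: "string", description: "Applicant email address" },
        },
        required: ["email"],
      },
    },
    required: ["data"],
  },
  // Assumed result shape: MCP-style text content.
  async execute({ data }) {
    const result = handleSubmit(data);
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  },
};

modelContext.registerTool(submitFormTool);
```

The agent never touches the form's markup. It sees only the name, the schema, and the returned result, so the page layout can change without breaking the agent.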

The difference is stark: any website that registers WebMCP tools becomes a structured interface for AI agents, turning the browser from a passive display into an interactive platform.

Make Any Website Agent-Ready

Webfuse turns any website into a shared, agent-controllable surface with zero code. Pair it with WebMCP and a voice agent to ship agentic customer journeys in days, not months, on the platform every customer already has: the browser.

How It Works in Practice

Layer 1: Webfuse - Instant Co-Browsing, Zero Code

Webfuse is a proxy-based co-browsing platform. It works on any website, including third-party sites, without requiring SDK installation, code changes, or app wrapper integration.

When a customer taps a button inside their messaging app, the session opens in the in-app browser. Both the agent and the user see the same page simultaneously. The agent can navigate, highlight, and guide. The user can interact freely.

No join codes. No downloads. No QR codes. One tap.

Layer 2: WebMCP - Structured Agent-to-Web Interaction

While the user and agent share the viewport, WebMCP gives the agent structured access to page actions. The agent doesn't need to "see" the page visually to know what to do; it calls the tools the website exposes.

This is the critical shift: the agent acts on the page while the user watches and guides via voice.
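
From the agent's side, the interaction reduces to looking up a tool by name, checking the arguments against its schema, and invoking it. The sketch below illustrates that dispatch step only. The pageTools list, the fill_field tool, and the callTool and missingParams helpers are hypothetical, and the callback is synchronous for brevity, whereas a real tool callback would likely be asynchronous.

```javascript
// Hypothetical tool list, shaped like the descriptors a page might expose.
const pageTools = [
  {
    name: "fill_field",
    description: "Set the value of a named form field.",
    inputSchema: {
      type: "object",
      properties: { field: { type: "string" }, value: { type: "string" } },
      required: ["field", "value"],
    },
    execute: ({ field, value }) => `Set ${field} to ${value}`,
  },
];

// Return the names of required parameters that are missing from args.
function missingParams(tool, args) {
  const required = tool.inputSchema.required || [];
  return required.filter((key) => !(key in args));
}

// Look up a tool by name, check its arguments, then invoke it.
function callTool(tools, name, args) {
  const tool = tools.find((t) => t.name === name);
  if (!tool) return { ok: false, error: `Unknown tool: ${name}` };
  const missing = missingParams(tool, args);
  if (missing.length > 0) {
    return { ok: false, error: `Missing parameters: ${missing.join(", ")}` };
  }
  return { ok: true, result: tool.execute(args) };
}

const filled = callTool(pageTools, "fill_field", { field: "income", value: "52000" });
console.log(filled.result); // Set income to 52000

const rejected = callTool(pageTools, "fill_field", { field: "income" });
console.log(rejected.error); // Missing parameters: value
```

Because the schema travels with the tool, a malformed call fails immediately with a readable error instead of a misplaced click.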

Layer 3: Voice - Natural, Conversational Control

The voice interface makes the interaction feel natural. The customer speaks. The agent responds. And while they're having this conversation, the agent is actively navigating the page, filling forms, clicking buttons, and guiding the user through the journey.

And here's the key: the voice layer is pluggable.

You can use ElevenLabs for production-ready voice today. Or you can swap in Google ADK + Gemini Audio for a fully Google-native stack:

  • Google ADK provides bidirectional voice streaming with Gemini-powered reasoning
  • WebMCP tools integrate directly into ADK agents, no bridging layer needed
  • Gemini multimodal audio handles both speech synthesis and understanding in one model

The stack then reads clearly: Webfuse on the web layer, ADK + WebMCP on the agent layer, and Gemini for voice, with the agent and voice layers running entirely on Google infrastructure.
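
One way to picture a pluggable voice layer is as a small contract that the agent depends on, with vendors swapped in behind it. The sketch below is a simplified illustration under that assumption. The provider contract, the helper names, and the stub providers are hypothetical, and none of it calls the real ElevenLabs or Google ADK APIs.

```javascript
// Hypothetical provider contract: anything with a name and a speak() method.
// These stubs return strings; a real provider would stream audio instead.
function createStubVoiceProvider(name) {
  return {
    name,
    speak(text) {
      return `[${name}] ${text}`;
    },
  };
}

// The agent depends only on the contract, not on a specific vendor.
function createVoiceAgent(voiceProvider) {
  return {
    provider: voiceProvider.name,
    reply(text) {
      return voiceProvider.speak(text);
    },
  };
}

const optionA = createVoiceAgent(createStubVoiceProvider("elevenlabs"));
const optionB = createVoiceAgent(createStubVoiceProvider("adk-gemini"));

console.log(optionA.reply("Your form is ready to submit."));
// [elevenlabs] Your form is ready to submit.
console.log(optionB.reply("Your form is ready to submit."));
// [adk-gemini] Your form is ready to submit.
```

Swapping providers changes one constructor argument. The agent logic and the WebMCP tool calls stay the same.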

Why Web-First Wins

You might be wondering: why not use native app co-browsing?

Because around 90% of customer journeys happen on the web. Even inside messaging apps, the in-app browser is your canvas. WebMCP makes that canvas agent-ready. You don't need a native SDK to guide a customer through a form, a purchase, or an onboarding flow.

The browser is the universal runtime. And with WebMCP, it's also the most agent-ready runtime.

For the handful of cases where you absolutely need native UI access, SDK-based tools fill that gap. But for the vast majority of customer interactions (forms, transactions, onboarding, guided sales), the web is not merely "good enough." It's the right platform.

Speed-to-Market Is the Whole Point

The demo above took days to put together, not weeks. Here's why:

| Layer | Traditional Approach | WebMCP + Webfuse Approach |
| --- | --- | --- |
| Co-Browsing | SDK integration, app store review, native code | Zero-code proxy, works instantly on any website |
| Agent Interaction | Screenshot-based parsing, DOM scraping, custom tool servers | WebMCP, structured tool registration in the browser |
| Voice Interface | Third-party API integration | Pluggable: ElevenLabs or Google ADK + Gemini |
| Total Time | Weeks to months | Days |

The integration point is the agent, and with WebMCP, that integration is structured tool registration, not DOM parsing and screenshot analysis.

The Stack

| Layer | Technology | Role |
| --- | --- | --- |
| Co-Browsing | Webfuse / Surfly | Proxy-based co-browsing, zero SDK, works on any website |
| Agent Protocol | Chrome WebMCP | Structured tool exposure for AI agents via navigator.modelContext |
| Voice (Option A) | ElevenLabs | Production-ready conversational voice |
| Voice (Option B) | Google ADK + Gemini Audio | Fully Google-native stack with bidirectional streaming |
| Messaging | Google Messages / RCS | Entry point: the in-app browser renders the web session |
| Session Control | Webfuse API | Start/stop sessions, one-click mobile launch |

This Isn't Hypothetical

WebMCP is already in Chrome Canary and stabilizing. Google ADK is open-source and actively developed. Webfuse is production-ready today.

The infrastructure for web-first agentic customer experiences exists now. It's composable. It's pluggable. And it runs on the platform everyone already has: the browser.

The question isn't whether the web can be the platform for AI agents. The question is: how fast can you build on it?

Want to see this in action for your use case? Book a demo or reach out to our team.
