[{"data":1,"prerenderedAt":3121},["ShallowReactive",2],{"/blog/top-5-voice-ai-agents-for-website-integration-in-2026":3,"related-/blog/top-5-voice-ai-agents-for-website-integration-in-2026":590},{"id":4,"title":5,"authorId":6,"body":7,"category":571,"created":572,"description":573,"extension":574,"faqs":575,"featurePriority":575,"head":575,"landingPath":576,"meta":577,"navigation":578,"ogImage":575,"path":579,"robots":575,"schemaOrg":575,"seo":580,"sitemap":581,"stem":582,"tags":583,"__hash__":589},"blog/blog/1018.top-5-voice-ai-agents-for-website-integration-in-2026.md","Top 5 Voice AI Agents for Website Integration in 2026","salome-koshadze",{"type":8,"value":9,"toc":542},"minimark",[10,63,66,82,88,91,94,101,110,117,124,127,130,150,156,159,162,168,171,177,182,191,197,200,203,229,235,238,241,244,250,256,264,270,273,291,294,300,303,323,329,332,335,341,346,355,361,364,378,384,387,407,413,420,423,426,432,437,444,450,453,473,479,486,489,492,495,501,504,507,539],[11,12,13,26,60],"tldr-box",{},[14,15,16,17,21,22,25],"p",{},"Voice AI is evolving from basic chatbots to agentic systems that can execute tasks directly on websites. The AI agent market is projected to reach ",[18,19,20],"strong",{},"$50.31 billion by 2030",", with ",[18,23,24],{},"40% of enterprise applications"," expected to use task-specific agents by 2026. 
This guide compares the top 5 platforms for 2026:",[27,28,29,36,42,48,54],"ul",{},[30,31,32,35],"li",{},[18,33,34],{},"ElevenLabs"," - Best for realistic, emotionally expressive voices with 400+ integrations",[30,37,38,41],{},[18,39,40],{},"Deepgram"," - Optimized for speed with \u003C250ms latency and unified API",[30,43,44,47],{},[18,45,46],{},"Vapi"," - Maximum flexibility for developers to mix and match AI models",[30,49,50,53],{},[18,51,52],{},"Google Dialogflow"," - Enterprise-grade solution integrated with Google Cloud",[30,55,56,59],{},[18,57,58],{},"Voiceflow"," - Visual, collaborative platform for team-based agent design",[14,61,62],{},"Each platform offers unique strengths depending on your priorities: voice quality, speed, flexibility, enterprise scale, or team collaboration.",[14,64,65],{},"For years, websites have been silent partners in our digital tasks. We click, we type, and they respond in a predictable, structured manner. That one-way interaction is undergoing a major redesign, shifting towards a collaborative, conversational model. By 2026, the use of voice AI agents that you can talk to and direct on a website is projected to become a widespread feature for businesses aiming to offer more intuitive and efficient user experiences.",[14,67,68,69,73,74,81],{},"This new wave of technology moves past simple chatbots. We are now looking at the integration of ",[70,71,72],"em",{},"agentic"," voice AI. This means the AI can perform tasks and execute actions on the user's behalf directly on the webpage. Imagine telling a website, \"Book me a flight to New York for next Tuesday, and find a hotel near Central Park,\" and watch it happen without needing to navigate menus or fill out forms. This capability is rapidly becoming a reality. 
The global market for AI agents was valued at USD 5.40 billion in 2024 and is ",[75,76,80],"a",{"href":77,"rel":78},"https://www.grandviewresearch.com/industry-analysis/ai-agents-market-report",[79],"nofollow","projected to reach USD 50.31 billion"," by 2030.",[83,84],"article-signup-cta",{":demoAction":85,"heading":86,"subtitle":87},"{\"text\":\"Explore Voice AI Agents\",\"href\":\"/use-case/voice-agents\"}","Develop Voice AI Agents For Any Web Application","Create intelligent voice-powered agents that can listen, understand, and interact with any web application. Deploy conversational AI that enhances user experience through natural speech interfaces.",[14,89,90],{},"This shift results in benefits such as increased speed and convenience for users, along with higher engagement and new avenues for customer support for businesses. The technology making this possible has seen considerable advancements. Lower latency in speech recognition and text-to-speech, paired with highly capable Large Language Models (LLMs), allows for real-time, human-like conversations. By 2026, it is anticipated that 40% of enterprise applications will use task-specific AI agents.",[14,92,93],{},"So, how do we get past basic chatbots and build these highly capable voice agents for website integration? Several platforms provide the tools to make this happen. 
Let's look at some of these major players expected to lead the way in 2026:",[95,96,98],{"id":97},"elevenlabs-the-intersection-of-lifelike-voice-and-agentic-action",[18,99,100],{},"ElevenLabs: The Intersection of Lifelike Voice and Agentic Action",[102,103],"nuxt-picture",{":height":104,":width":105,"alt":106,"format":107,"loading":108,"src":109},"1037","1999","ElevenLabs screenshot","webp","lazy","/blog/top-5-voice-ai-agents-for-website-integration-in-2026/1.png",[14,111,112,116],{},[75,113,34],{"href":114,"rel":115},"https://elevenlabs.io/",[79]," has built a major reputation on one thing: generating some of the most realistic and emotionally nuanced AI voices available. But the platform is expanding beyond high-quality audio. It now provides a complete conversational AI platform designed to create voice agents that can be integrated directly into websites. This positions it as a key player for 2026, offering a unique combination of highly expressive voice synthesis and the backend intelligence to perform tasks.",[118,119,121],{"id":120},"from-voice-synthesis-to-on-site-action",[18,122,123],{},"From Voice Synthesis to On-Site Action",[14,125,126],{},"The core strength of ElevenLabs lies in its ability to produce audio that is nearly indistinguishable from human speech. Users have a high degree of control over voice attributes, including pitch, speed, and emotional expression, across a library of over 1,200 voices in more than 29 languages.",[14,128,129],{},"What sets the platform apart are its agentic capabilities, which make it suitable for modern website integration. Let's look at some of these:",[27,131,132,138,144],{},[30,133,134,137],{},[18,135,136],{},"Real-Time, Low-Latency API:"," For a voice conversation to feel natural, the response must be immediate. ElevenLabs has optimized its system for low latency, with its streaming APIs capable of delivering audio in under 100 milliseconds. 
This is fast enough to support real-time, interactive conversations without awkward delays.",[30,139,140,143],{},[18,141,142],{},"A Massive Integration Library:"," An agent is only as useful as the actions it can perform. ElevenLabs provides over 400 pre-configured integrations with a wide range of external systems. This allows an agent on your website to connect directly to CRMs like Salesforce, scheduling tools like Calendly, and communication platforms like Slack to execute tasks mid-conversation. For example, a user could ask the agent to book a meeting, and the agent could access calendar availability and confirm the appointment without the user ever leaving the page.",[30,145,146,149],{},[18,147,148],{},"Custom Knowledge and Intelligence:"," You can ground the agent in your specific data by uploading documents or connecting it to your website's content. Using Retrieval-Augmented Generation (RAG), the agent can pull from these sources to provide accurate, up-to-date answers, acting as an expert on your products or services. You can also connect your own Large Language Model (LLM), such as models from Google or OpenAI, to tailor its reasoning capabilities.",[118,151,153],{"id":152},"simple-deployment-on-your-website",[18,154,155],{},"Simple Deployment on Your Website",[14,157,158],{},"Getting an ElevenLabs agent onto a website is a direct process. The platform provides a code snippet that can be embedded into your site's HTML. For popular platforms like WordPress or Webflow, this is as simple as adding a custom HTML block or an embed element. This accessibility means that a fully functional, voice-driven agent can be deployed in minutes rather than months.",[14,160,161],{},"The agent appears as a widget on the page, which users can interact with through voice or text. 
From a single dashboard, you can configure the agent's personality, set its first message, and monitor conversation transcripts to see how it's performing.",[118,163,165],{"id":164},"a-usage-based-model",[18,166,167],{},"A Usage-Based Model",[14,169,170],{},"ElevenLabs operates on a usage-based pricing model, typically measured in characters or credits. This structure includes a free tier, allowing for experimentation and small-scale projects. Paid plans scale up based on the volume of characters generated and offer access to more advanced features like professional voice cloning and higher-quality audio outputs. This approach allows businesses to start small and scale their usage as the value of the voice agent is proven.",[95,172,174],{"id":173},"deepgram-engineered-for-conversational-speed",[18,175,176],{},"Deepgram: Engineered for Conversational Speed",[102,178],{":height":179,":width":105,"alt":180,"format":107,"loading":108,"src":181},"1049","Deepgram screenshot","/blog/top-5-voice-ai-agents-for-website-integration-in-2026/2.png",[14,183,184,185,190],{},"While ElevenLabs puts the quality of the voice at the forefront, ",[75,186,189],{"href":187,"rel":188},"https://deepgram.com/",[79],"Deepgram's"," major strength is its foundation in speed and accuracy. Originally known for its highly performant speech-to-text (STT) services, Deepgram has expanded its offerings to provide an end-to-end platform for building real-time voice AI agents. For website integration where responsiveness is a major factor, Deepgram presents a highly compelling option.",[118,192,194],{"id":193},"the-need-for-speed-in-voice-ai",[18,195,196],{},"The Need for Speed in Voice AI",[14,198,199],{},"For a voice agent on a website to feel interactive and not clunky, the time between a user speaking and the agent responding must be minimal. Any noticeable delay breaks the illusion of a natural conversation. This is where Deepgram directs its focus. 
The company has engineered its entire system to minimize latency, reporting response times of under 250 milliseconds. This speed creates a conversational flow that feels immediate and human-like.",[14,201,202],{},"This is achieved by building a complete, in-house technology stack. Instead of relying on a chain of different services for transcription, language processing, and voice synthesis, Deepgram handles it all. Let's break down what makes it a strong contender for agentic website integration.",[27,204,205,211,217,223],{},[30,206,207,210],{},[18,208,209],{},"A Unified API for Voice:"," Deepgram provides a single, unified API that manages the entire conversational loop. This includes industry-leading speech-to-text, access to language models for intelligence, and their own text-to-speech engine, Aura. This simplifies the development process, as developers do not need to piece together multiple services.",[30,212,213,216],{},[18,214,215],{},"Highly Accurate Transcription:"," The accuracy of the agent's understanding begins with the transcription. Deepgram's models are known for their high accuracy across a wide range of accents and dialects. The platform also includes features like smart formatting and punctuation to make the transcribed text more reliable for the language model to interpret.",[30,218,219,222],{},[18,220,221],{},"Developer-Centric Tools:"," Deepgram is built with developers in mind. It offers Software Development Kits (SDKs) for popular programming languages like Python and Node.js. This makes it easier to integrate Deepgram's voice capabilities into a custom front-end application on a website, giving developers full control over the user interface and experience.",[30,224,225,228],{},[18,226,227],{},"Conversational Intelligence Features:"," Beyond just transcription, Deepgram can provide deeper insights into the conversation. It can detect sentiment, identify topics being discussed, and even summarize conversations. 
For a website agent, this information can be used to route a user to the correct department or to understand customer satisfaction in real time.",[118,230,232],{"id":231},"building-a-custom-experience",[18,233,234],{},"Building a Custom Experience",[14,236,237],{},"Unlike platforms that offer a pre-built widget, integrating Deepgram into a website typically involves a more custom development approach. Developers use Deepgram's APIs and SDKs to build a unique voice interface tailored to their specific needs. This offers a high degree of flexibility in how the agent looks, feels, and operates.",[14,239,240],{},"For example, a developer could build an interactive product guide where a user can ask questions about different features shown on the screen. The website's front end would capture the user's audio, send it to Deepgram's API, and then receive both the transcribed text and the synthesized audio response to play back to the user. Because the API can also connect to other tools, the agent could then take actions like adding a product to the cart or scheduling a demo.",[14,242,243],{},"Deepgram's pricing is consumption-based, billed per second of audio processed. This model allows for scalability, with costs directly tied to the amount of usage the voice agent receives.",[95,245,247],{"id":246},"vapi-the-developers-toolkit-for-composable-voice-ai",[18,248,249],{},"Vapi: The Developer's Toolkit for Composable Voice AI",[102,251],{":height":252,":width":253,"alt":254,"format":107,"loading":108,"src":255},"1938","3840","Vapi screenshot","/blog/top-5-voice-ai-agents-for-website-integration-in-2026/3.png",[14,257,258,259,263],{},"Where other platforms provide an all-in-one system, ",[75,260,46],{"href":261,"rel":262},"https://vapi.ai/",[79]," positions itself differently. 
It is a highly configurable platform built specifically for developers who want to construct their own voice AI agents by combining best-in-class technologies. Instead of offering its own custom models for every step, Vapi acts as a coordination layer, handling the complex infrastructure required to make different services for speech-to-text, language processing, and text-to-speech work together in real time.",[118,265,267],{"id":266},"a-focus-on-coordination-not-creation",[18,268,269],{},"A Focus on Coordination, Not Creation",[14,271,272],{},"The core philosophy behind Vapi is flexibility. Developers are not locked into a single ecosystem. They can choose the components that best fit their needs. For instance, a developer could build a voice agent that uses:",[27,274,275,280,286],{},[30,276,277,279],{},[18,278,40],{}," for its fast and accurate speech-to-text.",[30,281,282,285],{},[18,283,284],{},"OpenAI's GPT-4o or Anthropic's Claude 3"," for advanced reasoning and intelligence.",[30,287,288,290],{},[18,289,34],{}," for its highly realistic and expressive voice output.",[14,292,293],{},"Vapi's platform manages the complex flow of data between these services, ensuring the conversation happens with very low latency. This \"bring your own models\" approach is ideal for teams that want to fine-tune every aspect of their agent's performance and personality.",[118,295,297],{"id":296},"key-features-for-building-capable-agents",[18,298,299],{},"Key Features for Building Capable Agents",[14,301,302],{},"Vapi's developer-first focus is evident in its feature set, which is geared towards creating highly functional and intelligent voice agents for websites.",[27,304,305,311,317],{},[30,306,307,310],{},[18,308,309],{},"Powerful Tool Calling:"," This is a major feature for creating true agentic behavior. Tool calling allows the AI assistant to connect to and use external APIs during a conversation. 
For example, a voice agent on an e-commerce site could use a tool to check inventory levels, process a payment through Stripe, or create a shipping label by calling the shipping provider's API, all based on a user's spoken request.",[30,312,313,316],{},[18,314,315],{},"Simplified Real-Time Infrastructure:"," Handling real-time voice communication over the web can be complex. Vapi abstracts this away by managing WebSocket connections and the streaming of audio data. This frees up developers to focus on the agent's logic and capabilities rather than the underlying plumbing.",[30,318,319,322],{},[18,320,321],{},"Web Integration via SDK and Widget:"," Vapi offers multiple ways to get an agent onto a website. For maximum control, developers can use the JavaScript SDK to build a completely custom voice interface. For quicker deployment, Vapi also provides an embeddable web widget that can be added to a site with a single line of code, offering a floating chat interface that supports both voice and text.",[118,324,326],{"id":325},"built-for-custom-workflows",[18,327,328],{},"Built for Custom Workflows",[14,330,331],{},"Integrating Vapi into a website is a process designed for technical teams. The platform is API-native, meaning every feature is exposed through an API for extensive configuration. Developers can define their assistant's parameters, set up custom prompts, and connect to their own back-end systems to pull data or trigger actions. This makes it possible to create highly bespoke voice experiences that are deeply integrated with a website's existing functionality.",[14,333,334],{},"Vapi's pricing is usage-based, typically charging a small fee per minute for coordinating the conversation, in addition to the costs of the third-party STT, LLM, and TTS models you choose to use. 
This model offers transparency and allows businesses to scale their costs directly with their usage.",[95,336,338],{"id":337},"google-cloud-dialogflow-the-enterprise-grade-conversational-engine",[18,339,340],{},"Google Cloud Dialogflow: The Enterprise-Grade Conversational Engine",[102,342],{":height":343,":width":253,"alt":344,"format":107,"loading":108,"src":345},"1932","Google Dialogflow screenshot","/blog/top-5-voice-ai-agents-for-website-integration-in-2026/4.png",[14,347,348,349,354],{},"When we talk about building highly scalable and complex voice AI agents, Google Cloud's ",[75,350,353],{"href":351,"rel":352},"https://cloud.google.com/dialogflow",[79],"Dialogflow"," is a major part of the conversation. As Google's native platform for natural language understanding, it's designed to build conversational interfaces for everything from mobile apps to large-scale contact centers. For website integration, its primary advantage is its deep connection to the wider Google Cloud Platform (GCP), offering access to some of the most powerful AI and data tools available.",[118,356,358],{"id":357},"two-flavors-es-for-simplicity-cx-for-complexity",[18,359,360],{},"Two Flavors: ES for Simplicity, CX for Complexity",[14,362,363],{},"Dialogflow comes in two main versions: ES (Essentials) and CX (Customer Experience).",[27,365,366,372],{},[30,367,368,371],{},[18,369,370],{},"Dialogflow ES"," is the original version, suitable for smaller or less complex agents. It uses a flat structure of \"intents\" to understand user requests, which is effective for straightforward conversations but can become difficult to manage in larger agents.",[30,373,374,377],{},[18,375,376],{},"Dialogflow CX"," is the newer, advanced offering designed for large and very complex agents. 
It uses a state machine approach, organizing conversations into \"flows\" and \"pages.\" This gives developers clear control over the conversational path, making it much easier to design, visualize, and maintain intricate, multi-turn dialogues. For building true agentic experiences on a website, Dialogflow CX is generally the more suitable choice.",[118,379,381],{"id":380},"the-advantage-of-the-google-ecosystem",[18,382,383],{},"The Advantage of the Google Ecosystem",[14,385,386],{},"The real strength of using Dialogflow is that it doesn't exist in a vacuum. It seamlessly integrates with other Google Cloud services, allowing developers to build highly intelligent and capable agents.",[27,388,389,395,401],{},[30,390,391,394],{},[18,392,393],{},"Vertex AI Integration:"," You can connect your Dialogflow agent to Google's Vertex AI platform. This opens up the ability to use state-of-the-art generative AI models for more dynamic, intelligent responses and to ground the agent in your company's own data.",[30,396,397,400],{},[18,398,399],{},"Google Cloud Functions:"," For executing actions, Dialogflow agents can trigger Cloud Functions. This allows the agent to run serverless code in response to a user's request, enabling it to interact with databases, call third-party APIs, or perform almost any back-end task.",[30,402,403,406],{},[18,404,405],{},"Contact Center AI (CCAI):"," Dialogflow is a core component of Google's CCAI platform. This means an agent built for a website can be part of a much larger customer service strategy, with the ability to hand off conversations to human agents with full context.",[118,408,410],{"id":409},"website-integration-through-messenger",[18,411,412],{},"Website Integration Through Messenger",[14,414,415,416,419],{},"Google provides a direct way to embed a Dialogflow agent onto a website using ",[18,417,418],{},"Dialogflow Messenger",". 
This integration provides a simple, customizable chat widget that can be added to any webpage by embedding a small snippet of HTML code.",[14,421,422],{},"Through the Dialogflow console, you can configure the look and feel of the widget and enable it. For more advanced use cases, developers can use Dialogflow's REST APIs to build a completely custom user interface, giving them full control over the conversational experience on their site.",[14,424,425],{},"Dialogflow is priced on a pay-as-you-go basis, with costs determined by the version (CX or ES) and the number of requests. Voice sessions, which include both audio input and output, are billed per second of use. New customers often receive trial credits to help get started with the platform.",[95,427,429],{"id":428},"voiceflow-the-collaborative-canvas-for-ai-agent-design",[18,430,431],{},"Voiceflow: The Collaborative Canvas for AI Agent Design",[102,433],{":height":434,":width":105,"alt":435,"format":107,"loading":108,"src":436},"1089","Voiceflow screenshot","/blog/top-5-voice-ai-agents-for-website-integration-in-2026/5.png",[14,438,439,443],{},[75,440,58],{"href":441,"rel":442},"https://www.voiceflow.com/",[79]," enters the landscape with a different approach, focusing on the collaborative design and development of AI agents. It provides a visual, low-code platform where entire teams including designers, writers, and developers can work together to build complex conversational experiences. For website integration, this means that the logic and flow of the agent can be mapped out and prototyped in a highly intuitive, drag-and-drop environment before being deployed.",[118,445,447],{"id":446},"visualizing-the-conversation",[18,448,449],{},"Visualizing the Conversation",[14,451,452],{},"The main feature of Voiceflow is its visual canvas. Instead of writing code to define conversational logic, you build it using blocks and connectors. 
This makes it much easier to visualize the user's journey, account for different conversational paths, and identify potential dead ends. This visual-first method brings several major benefits to building a website agent.",[27,454,455,461,467],{},[30,456,457,460],{},[18,458,459],{},"Rapid Prototyping and Iteration:"," You can design, test, and refine a complete conversational flow directly within the Voiceflow canvas. The built-in prototyper lets you interact with your agent as you build it, making it fast to spot issues and make improvements without writing any deployment code.",[30,462,463,466],{},[18,464,465],{},"A Central Hub for Team Collaboration:"," Voiceflow's canvas acts as a single source of truth for the AI agent. Product managers can map out the high-level logic, UX writers can craft the dialogue, and developers can jump in to configure the technical integrations, all within the same shared workspace.",[30,468,469,472],{},[18,470,471],{},"Turning Design into Action:"," The visual design is not just a blueprint; it is the agent's executable logic. To make the agent truly agentic, developers can add API blocks or custom code snippets directly into the canvas. This allows the agent to fetch data from external sources, connect to services like a CRM or booking system, and perform actions on behalf of the user.",[118,474,476],{"id":475},"from-canvas-to-website",[18,477,478],{},"From Canvas to Website",[14,480,481,482,485],{},"Voiceflow provides a straightforward path to get the agent you've designed onto your website. The primary method is through the ",[18,483,484],{},"Voiceflow Web Chat",", an embeddable widget that can be installed on any site with a single block of code.",[14,487,488],{},"This widget is highly customizable, allowing you to change its appearance to match your brand. It supports both voice and text input, giving users the flexibility to interact in the way they prefer. 
Once the widget is live, any changes you make to the agent's design on the Voiceflow canvas are updated in real-time, allowing for continuous improvement without needing to redeploy the code.",[14,490,491],{},"For teams wanting a more deeply integrated or custom front-end experience, Voiceflow also offers a Dialog Manager API. This allows developers to use Voiceflow as the conversational backend while building a completely bespoke user interface for their website.",[14,493,494],{},"Voiceflow's pricing is structured in tiers, with a free plan for individuals and small projects, a pro plan for growing teams, and an enterprise plan for large organizations that require advanced features like dedicated support and security reviews. This makes the platform accessible for a wide range of use cases, from simple informational bots to highly capable, task-performing agents.",[95,496,498],{"id":497},"choosing-the-right-voice-ai-agent-for-your-website",[18,499,500],{},"Choosing the Right Voice AI Agent for Your Website",[14,502,503],{},"The transition from static, clickable websites to dynamic, conversational partners represents a major evolution in user experience. The five platforms we've examined each provide a different set of tools to build these agentic voice experiences. The suitable choice for your project will depend on your team's technical skills, your project's complexity, and your primary goals.",[14,505,506],{},"Here is a breakdown to help you identify which platform aligns best with your needs:",[27,508,509,515,521,527,533],{},[30,510,511,514],{},[18,512,513],{},"Go with ElevenLabs if..."," your top priority is delivering the most realistic and emotionally expressive voice possible. 
It is a strong choice when the quality of the audio experience is a key part of your brand, and you want to get started quickly using a large library of pre-built integrations.",[30,516,517,520],{},[18,518,519],{},"Go with Deepgram if..."," you are building a custom voice application where conversational speed is the most important factor. Its unified, high-performance system is engineered for the lowest possible latency, making it ideal for developers who need to ensure conversations feel natural and immediate.",[30,522,523,526],{},[18,524,525],{},"Go with Vapi if..."," you are a developer-focused team that requires maximum flexibility. Vapi's integration platform lets you mix and match your preferred models for speech-to-text, language processing, and text-to-speech, giving you full control to build a \"best-of-breed\" agent.",[30,528,529,532],{},[18,530,531],{},"Go with Google Cloud Dialogflow if..."," you are an enterprise, particularly one already integrated into the Google Cloud ecosystem. Its CX version is built to handle highly complex, large-scale conversational agents that need to connect with other enterprise systems and data platforms.",[30,534,535,538],{},[18,536,537],{},"Go with Voiceflow if..."," your project is a collaborative effort between designers, writers, and developers. Its visual, low-code canvas makes it the ideal environment for teams to design, prototype, and manage the logic of an AI agent together before deploying it to a website.",[14,540,541],{},"Looking ahead, the capabilities of these voice agents are poised for even greater advancement. We can anticipate deeper website integration, where agents can not only converse but also actively guide users by highlighting elements, navigating pages, and filling out forms. 
As the underlying language models continue to become more intelligent, the web will move closer to becoming a collection of truly interactive and helpful conversational partners.",{"title":543,"searchDepth":544,"depth":544,"links":545},"",2,[546,552,556,561,566,570],{"id":97,"depth":544,"text":100,"children":547},[548,550,551],{"id":120,"depth":549,"text":123},3,{"id":152,"depth":549,"text":155},{"id":164,"depth":549,"text":167},{"id":173,"depth":544,"text":176,"children":553},[554,555],{"id":193,"depth":549,"text":196},{"id":231,"depth":549,"text":234},{"id":246,"depth":544,"text":249,"children":557},[558,559,560],{"id":266,"depth":549,"text":269},{"id":296,"depth":549,"text":299},{"id":325,"depth":549,"text":328},{"id":337,"depth":544,"text":340,"children":562},[563,564,565],{"id":357,"depth":549,"text":360},{"id":380,"depth":549,"text":383},{"id":409,"depth":549,"text":412},{"id":428,"depth":544,"text":431,"children":567},[568,569],{"id":446,"depth":549,"text":449},{"id":475,"depth":549,"text":478},{"id":497,"depth":544,"text":500},"voice-ai","2025-10-10","Discover the top 5 voice AI agent platforms for website integration in 2026. 
Compare ElevenLabs, Deepgram, Vapi, Google Dialogflow, and Voiceflow to build conversational experiences that listen, understand, and take action on your website.","md",null,"/use-case/voice-agents",{},true,"/blog/top-5-voice-ai-agents-for-website-integration-in-2026",{"title":5,"description":573},{"loc":579},"blog/1018.top-5-voice-ai-agents-for-website-integration-in-2026",[584,585,586,587,588],"ai-agents","browser-agents","voice-agents","web-agents","web-automation","yISJ6x-R3Akdweoe25nCIh1WsUi-lESsYVz4tJyUz68",[591,2376],{"id":592,"title":593,"authorId":594,"body":595,"category":584,"created":2354,"description":2355,"extension":574,"faqs":575,"featurePriority":575,"head":575,"landingPath":575,"meta":2356,"navigation":578,"ogImage":575,"path":2368,"robots":575,"schemaOrg":575,"seo":2369,"sitemap":2370,"stem":2371,"tags":2372,"__hash__":2375},"blog/blog/1012.dom-downsampling-for-llm-based-web-agents.md","DOM Downsampling for LLM-Based Web Agents","thassilo-schiepanski",{"type":8,"value":596,"toc":2339},[597,602,625,629,636,640,655,659,665,669,687,714,717,721,724,735,741,772,776,796,808,813,829,843,846,850,870,874,882,894,898,901,1293,1299,1306,1470,1477,1568,1575,1647,1656,1662,1671,1675,1681,1691,1703,1937,1955,2032,2038,2153,2157,2169,2178,2183,2188,2191,2195,2201,2206,2244,2248,2254,2258,2268,2272,2275,2335],[102,598],{":width":599,"alt":600,"format":107,"loading":108,"src":601},"900","Downsampling visualised for digital images and HTML","/blog/dom-downsampling-for-web-agents/1.png",[14,603,604,609,610,609,615,620,621,624],{},[75,605,608],{"href":606,"rel":607},"https://operator.chatgpt.com",[79],"Operator (OpenAI)",", ",[75,611,614],{"href":612,"rel":613},"https://www.director.ai",[79],"Director (Browserbase)",[75,616,619],{"href":617,"rel":618},"https://browser-use.com",[79],"Browser Use"," – we are currently witnessing the rise of ",[18,622,623],{},"web AI agents",". 
The first iteration of serviceable web agents was enabled by frontier LLMs, which act as instantaneous domain model backends. The domain, hereby, corresponds to the landscape of web application UIs.",[95,626,628],{"id":627},"what-is-a-snapshot","What is a Snapshot?",[14,630,631,632,635],{},"Web agents provide an LLM with a task and the serialised runtime state of a currently browsed web application (e.g., a screenshot). The LLM is expected to suggest relevant actions to perform in the web application. Serialisation of such runtime state is referred to as a ",[18,633,634],{},"snapshot",". The snapshot technique largely determines the quality of LLM interaction suggestions.",[118,637,639],{"id":638},"gui-snapshots","GUI Snapshots",[14,641,642,643,646,647,650,651,654],{},"Screenshots – for consistency reasons referred to as ",[18,644,645],{},"GUI snapshots"," – resemble how humans visually perceive web application UIs. LLM APIs subsidise the use of image input through upstream compression. Compression, however, irreversibly affects image dimensions, which takes away pixel precision: there is no way to suggest interactions like ",[70,648,649],{},"“click at 100, 735”",". As a workaround, early web agents used ",[70,652,653],{},"grounded"," GUI snapshots. Grounding describes adding visual cues to the GUI, such as bounding boxes with numerical identifiers. Grounding lets the LLM refer to specific parts of the page by identifier, so the agent can trace back interaction targets.",[102,656],{":width":599,"alt":657,"format":107,"loading":108,"src":658},"Grounded GUI snapshot as implemented by Browser Use","/blog/dom-downsampling-for-web-agents/2.png",[14,660,661],{},[662,663,664],"small",{},"Grounded GUI snapshot as implemented by Browser Use.",[118,666,668],{"id":667},"dom-snapshots","DOM Snapshots",[14,670,671,672,682,683,686],{},"LLMs arguably are much better at understanding code than images. 
Research supports that they excel at describing and classifying HTML, and also at navigating an inherent UI",[673,674,675],"sup",{},[75,676,681],{"href":677,"ariaDescribedBy":678,"dataFootnoteRef":543,"id":680},"#user-content-fn-1",[679],"footnote-label","user-content-fnref-1","1",". The DOM (document object model) – a web browser's runtime state model of a web application – translates back to HTML. For this reason, ",[18,684,685],{},"DOM snapshots"," offer a compelling alternative to GUI snapshots. DOM snapshots offer a handful of key advantages:",[688,689,690,693,696,699,702],"ol",{},[30,691,692],{},"DOM snapshots connect with LLM code (HTML) interpretation abilities.",[30,694,695],{},"DOM snapshots can be compiled from deep clones, hidden from supervision (unlike GUI grounding).",[30,697,698],{},"DOM snapshots yield text input that on average consumes less bandwidth than screenshots.",[30,700,701],{},"DOM snapshots allow for exact programmatic targeting of elements (e.g., via CSS selectors).",[30,703,704,705,709,710,713],{},"DOM snapshots are available with the ",[706,707,708],"code",{},"DOMContentLoaded"," event (whereas the GUI completes initial rendering with ",[706,711,712],{},"load",").",[14,715,716],{},"Yet, DOM snapshots have a major problem: they can exhaust the model context. Whereas GUI snapshots commonly cost four figures of tokens, a raw DOM snapshot can cost hundreds of thousands of tokens. To connect with LLM code interpretation abilities, however, developers have used element extraction techniques – picking only (likely) important elements from the DOM. Element extraction flattens the DOM tree, which disregards hierarchy as a potential UI feature (how do elements relate to each other?).",[95,718,720],{"id":719},"dom-downsampling-a-novel-approach","DOM Downsampling: A Novel Approach",[14,722,723],{},"Enabling DOM snapshots for use with web agents requires client-side pre-processing – similar to how LLM vision APIs process image input. 
Downsampling is a fundamental signal processing technique that reduces data that exceeds time or space constraints under the assumption that the majority of relevant features is retained. Picture JPEG compression as an example: put simply, a JPEG image stores only an average colour for patches of pixels. The bigger the patches, the smaller the file. Although some detail is lost, key image features – colours, edges, objects – remain recognisable – up to a large patch size.",[14,725,726,727,730,731,734],{},"We transfer the concept of ",[18,728,729],{},"downsampling"," to ",[18,732,733],{},"DOMs",", particularly since such an approach retains HTML characteristics that might be valuable for an LLM backend. We define UI features as concepts that, to a substantial degree, facilitate LLM suggestions on how to act in the UI in order to solve related web-based tasks.",[95,736,738],{"id":737},"d2snap",[70,739,740],{},"D2Snap",[14,742,743,744,752,760,768,769,771],{},"We recently proposed ",[75,745,748],{"href":746,"rel":747},"https://arxiv.org/abs/2508.04412",[79],[18,749,750],{},[70,751,740],{},[673,753,754],{},[75,755,759],{"href":756,"ariaDescribedBy":757,"dataFootnoteRef":543,"id":758},"#user-content-fn-2",[679],"user-content-fnref-2","2",[673,761,762],{},[75,763,767],{"href":764,"ariaDescribedBy":765,"dataFootnoteRef":543,"id":766},"#user-content-fn-3",[679],"user-content-fnref-3","3"," – a first-of-its-kind downsampling algorithm for DOMs. Herein, we'll briefly explain how the ",[70,770,740],{}," algorithm works, and how it can be utilised to build efficient and performant web agents.",[118,773,775],{"id":774},"how-it-works","How it works",[14,777,778,779,781,782,609,785,788,789,792,793,713],{},"There are basically three redundant types of DOM nodes, and HTML concepts: elements, text, and attributes. We defined and empirically adjusted three node-specific procedures. 
",[70,780,740],{}," downsamples at a variable ratio, configured through procedure-specific parameters ",[706,783,784],{},"k",[706,786,787],{},"l",", and ",[706,790,791],{},"m"," (",[706,794,795],{},"∈ [0, 1]",[797,798,799],"blockquote",{},[14,800,801,802,807],{},"We used ",[75,803,806],{"href":804,"rel":805},"https://openai.com/index/hello-gpt-4o/",[79],"GPT-4o"," to create a downsampling ground truth dataset by having it classify HTML elements and score semantics regarding relevance for understanding the inherent UI – a UI feature degree.",[809,810,812],"h4",{"id":811},"procedure-elements","Procedure: Elements",[14,814,815,817,818,821,822,825,826,828],{},[70,816,740],{}," downsamples (simplifies) elements by merging container elements like ",[706,819,820],{},"section"," and ",[706,823,824],{},"div"," together. A parameter ",[706,827,784],{}," controls the merge ratio depending on the total DOM tree height. For competing concepts, such as element name, the ground truth determines which element's characteristics to keep – comparing UI feature scores.",[14,830,831,832,609,834,836,837,842],{},"Elements in content elements (",[706,833,14],{},[706,835,797],{},", ...) are translated to a more comprehensive ",[75,838,841],{"href":839,"rel":840},"https://www.markdownguide.org/basic-syntax/",[79],"Markdown"," representation.",[14,844,845],{},"Interactive elements, definite interaction target candidates, are kept as is.",[809,847,849],{"id":848},"procedure-text","Procedure: Text",[14,851,852,854,855,858,866,867,869],{},[70,853,740],{}," downsamples text by dropping a fraction. Natural units of text are space-separated words, or punctuation-separated sentences. We reuse the ",[70,856,857],{},"TextRank",[673,859,860],{},[75,861,865],{"href":862,"ariaDescribedBy":863,"dataFootnoteRef":543,"id":864},"#user-content-fn-4",[679],"user-content-fnref-4","4"," algorithm to rank sentences in text nodes. 
The lowest-ranking fraction of sentences, denoted by parameter ",[706,868,787],{},", is dropped.",[809,871,873],{"id":872},"procedure-attributes","Procedure: Attributes",[14,875,876,878,879,881],{},[70,877,740],{}," downsamples attributes by dropping those with a name that, according to ground truth, holds a UI feature degree below a threshold. Parameter ",[706,880,791],{}," denotes this threshold.",[797,883,884],{},[14,885,886,887,893],{},"Check out the ",[75,888,890,892],{"href":746,"rel":889},[79],[70,891,740],{}," paper"," to learn about the algorithm in-depth.",[118,895,897],{"id":896},"example-of-a-downsampled-dom","Example of a Downsampled DOM",[14,899,900],{},"Consider a partial DOM state, serialised as HTML:",[902,903,907],"pre",{"className":904,"code":905,"language":906,"meta":543,"style":543},"language-html shiki shiki-themes catppuccin-latte night-owl","\u003Csection class=\"container\" tabindex=\"3\" required=\"true\" type=\"example\">\n  \u003Cdiv class=\"mx-auto\" data-topic=\"products\" required=\"false\">\n    \u003Ch1>Our Pizza\u003C/h1>\n    \u003Cdiv>\n      \u003Cdiv class=\"shadow-lg\">\n        \u003Ch2>Margherita\u003C/h2>\n        \u003Cp>\n          A simple classic: mozzarela, tomatoes and basil.\n          An everyday choice!\n        \u003C/p>\n        \u003Cbutton type=\"button\">Add\u003C/button>\n      \u003C/div>\n      \u003Cdiv class=\"shadow-lg\">\n        \u003Ch2>Capricciosa\u003C/h2>\n        \u003Cp>\n          A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n          A true favourite!\n          \u003C/p>\n        \u003Cbutton type=\"button\">Add\u003C/button>\n      \u003C/div>\n    \u003C/div>\n  
\u003C/div>\n\u003C/section>\n","html",[706,908,909,976,1019,1041,1050,1071,1090,1099,1105,1111,1121,1150,1160,1179,1197,1206,1212,1218,1228,1255,1264,1274,1284],{"__ignoreMap":543},[910,911,914,918,921,925,928,932,936,938,941,943,945,947,949,952,954,956,959,961,964,966,968,971,973],"span",{"class":912,"line":913},"line",1,[910,915,917],{"class":916},"s9rnR","\u003C",[910,919,820],{"class":920},"sY2RG",[910,922,924],{"class":923},"swkLt"," class",[910,926,927],{"class":916},"=",[910,929,931],{"class":930},"sbuKk","\"",[910,933,935],{"class":934},"sfrMT","container",[910,937,931],{"class":930},[910,939,940],{"class":923}," tabindex",[910,942,927],{"class":916},[910,944,931],{"class":930},[910,946,767],{"class":934},[910,948,931],{"class":930},[910,950,951],{"class":923}," required",[910,953,927],{"class":916},[910,955,931],{"class":930},[910,957,958],{"class":934},"true",[910,960,931],{"class":930},[910,962,963],{"class":923}," type",[910,965,927],{"class":916},[910,967,931],{"class":930},[910,969,970],{"class":934},"example",[910,972,931],{"class":930},[910,974,975],{"class":916},">\n",[910,977,978,981,983,985,987,989,992,994,997,999,1001,1004,1006,1008,1010,1012,1015,1017],{"class":912,"line":544},[910,979,980],{"class":916},"  \u003C",[910,982,824],{"class":920},[910,984,924],{"class":923},[910,986,927],{"class":916},[910,988,931],{"class":930},[910,990,991],{"class":934},"mx-auto",[910,993,931],{"class":930},[910,995,996],{"class":923}," data-topic",[910,998,927],{"class":916},[910,1000,931],{"class":930},[910,1002,1003],{"class":934},"products",[910,1005,931],{"class":930},[910,1007,951],{"class":923},[910,1009,927],{"class":916},[910,1011,931],{"class":930},[910,1013,1014],{"class":934},"false",[910,1016,931],{"class":930},[910,1018,975],{"class":916},[910,1020,1021,1024,1027,1030,1034,1037,1039],{"class":912,"line":549},[910,1022,1023],{"class":916},"    
\u003C",[910,1025,1026],{"class":920},"h1",[910,1028,1029],{"class":916},">",[910,1031,1033],{"class":1032},"s2kId","Our Pizza",[910,1035,1036],{"class":916},"\u003C/",[910,1038,1026],{"class":920},[910,1040,975],{"class":916},[910,1042,1044,1046,1048],{"class":912,"line":1043},4,[910,1045,1023],{"class":916},[910,1047,824],{"class":920},[910,1049,975],{"class":916},[910,1051,1053,1056,1058,1060,1062,1064,1067,1069],{"class":912,"line":1052},5,[910,1054,1055],{"class":916},"      \u003C",[910,1057,824],{"class":920},[910,1059,924],{"class":923},[910,1061,927],{"class":916},[910,1063,931],{"class":930},[910,1065,1066],{"class":934},"shadow-lg",[910,1068,931],{"class":930},[910,1070,975],{"class":916},[910,1072,1074,1077,1079,1081,1084,1086,1088],{"class":912,"line":1073},6,[910,1075,1076],{"class":916},"        \u003C",[910,1078,95],{"class":920},[910,1080,1029],{"class":916},[910,1082,1083],{"class":1032},"Margherita",[910,1085,1036],{"class":916},[910,1087,95],{"class":920},[910,1089,975],{"class":916},[910,1091,1093,1095,1097],{"class":912,"line":1092},7,[910,1094,1076],{"class":916},[910,1096,14],{"class":920},[910,1098,975],{"class":916},[910,1100,1102],{"class":912,"line":1101},8,[910,1103,1104],{"class":1032},"          A simple classic: mozzarela, tomatoes and basil.\n",[910,1106,1108],{"class":912,"line":1107},9,[910,1109,1110],{"class":1032},"          An everyday choice!\n",[910,1112,1114,1117,1119],{"class":912,"line":1113},10,[910,1115,1116],{"class":916},"        
\u003C/",[910,1118,14],{"class":920},[910,1120,975],{"class":916},[910,1122,1124,1126,1129,1131,1133,1135,1137,1139,1141,1144,1146,1148],{"class":912,"line":1123},11,[910,1125,1076],{"class":916},[910,1127,1128],{"class":920},"button",[910,1130,963],{"class":923},[910,1132,927],{"class":916},[910,1134,931],{"class":930},[910,1136,1128],{"class":934},[910,1138,931],{"class":930},[910,1140,1029],{"class":916},[910,1142,1143],{"class":1032},"Add",[910,1145,1036],{"class":916},[910,1147,1128],{"class":920},[910,1149,975],{"class":916},[910,1151,1153,1156,1158],{"class":912,"line":1152},12,[910,1154,1155],{"class":916},"      \u003C/",[910,1157,824],{"class":920},[910,1159,975],{"class":916},[910,1161,1163,1165,1167,1169,1171,1173,1175,1177],{"class":912,"line":1162},13,[910,1164,1055],{"class":916},[910,1166,824],{"class":920},[910,1168,924],{"class":923},[910,1170,927],{"class":916},[910,1172,931],{"class":930},[910,1174,1066],{"class":934},[910,1176,931],{"class":930},[910,1178,975],{"class":916},[910,1180,1182,1184,1186,1188,1191,1193,1195],{"class":912,"line":1181},14,[910,1183,1076],{"class":916},[910,1185,95],{"class":920},[910,1187,1029],{"class":916},[910,1189,1190],{"class":1032},"Capricciosa",[910,1192,1036],{"class":916},[910,1194,95],{"class":920},[910,1196,975],{"class":916},[910,1198,1200,1202,1204],{"class":912,"line":1199},15,[910,1201,1076],{"class":916},[910,1203,14],{"class":920},[910,1205,975],{"class":916},[910,1207,1209],{"class":912,"line":1208},16,[910,1210,1211],{"class":1032},"          A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n",[910,1213,1215],{"class":912,"line":1214},17,[910,1216,1217],{"class":1032},"          A true favourite!\n",[910,1219,1221,1224,1226],{"class":912,"line":1220},18,[910,1222,1223],{"class":916},"          
\u003C/",[910,1225,14],{"class":920},[910,1227,975],{"class":916},[910,1229,1231,1233,1235,1237,1239,1241,1243,1245,1247,1249,1251,1253],{"class":912,"line":1230},19,[910,1232,1076],{"class":916},[910,1234,1128],{"class":920},[910,1236,963],{"class":923},[910,1238,927],{"class":916},[910,1240,931],{"class":930},[910,1242,1128],{"class":934},[910,1244,931],{"class":930},[910,1246,1029],{"class":916},[910,1248,1143],{"class":1032},[910,1250,1036],{"class":916},[910,1252,1128],{"class":920},[910,1254,975],{"class":916},[910,1256,1258,1260,1262],{"class":912,"line":1257},20,[910,1259,1155],{"class":916},[910,1261,824],{"class":920},[910,1263,975],{"class":916},[910,1265,1267,1270,1272],{"class":912,"line":1266},21,[910,1268,1269],{"class":916},"    \u003C/",[910,1271,824],{"class":920},[910,1273,975],{"class":916},[910,1275,1277,1280,1282],{"class":912,"line":1276},22,[910,1278,1279],{"class":916},"  \u003C/",[910,1281,824],{"class":920},[910,1283,975],{"class":916},[910,1285,1287,1289,1291],{"class":912,"line":1286},23,[910,1288,1036],{"class":916},[910,1290,820],{"class":920},[910,1292,975],{"class":916},[14,1294,1295,1296,1298],{},"Here are some ",[70,1297,740],{}," downsampling results, which are based on different parametric configurations. 
A percentage denotes the reduced size.",[809,1300,1302,1305],{"id":1301},"k3-l3-m3-55",[706,1303,1304],{},"k=.3, l=.3, m=.3"," (55%)",[902,1307,1309],{"className":904,"code":1308,"language":906,"meta":543,"style":543},"\u003Csection tabindex=\"3\" type=\"example\" class=\"container\" required=\"true\">\n  # Our Pizza\n  \u003Cdiv class=\"shadow-lg\">\n    ## Margherita\n    A simple classic: mozzarela, tomatoes, and basil.\n    \u003Cbutton type=\"button\">Add\u003C/button>\n    ## Capricciosa\n    A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n    \u003Cbutton type=\"button\">Add\u003C/button>\n  \u003C/div>\n\u003C/section>\n",[706,1310,1311,1359,1364,1382,1387,1392,1418,1423,1428,1454,1462],{"__ignoreMap":543},[910,1312,1313,1315,1317,1319,1321,1323,1325,1327,1329,1331,1333,1335,1337,1339,1341,1343,1345,1347,1349,1351,1353,1355,1357],{"class":912,"line":913},[910,1314,917],{"class":916},[910,1316,820],{"class":920},[910,1318,940],{"class":923},[910,1320,927],{"class":916},[910,1322,931],{"class":930},[910,1324,767],{"class":934},[910,1326,931],{"class":930},[910,1328,963],{"class":923},[910,1330,927],{"class":916},[910,1332,931],{"class":930},[910,1334,970],{"class":934},[910,1336,931],{"class":930},[910,1338,924],{"class":923},[910,1340,927],{"class":916},[910,1342,931],{"class":930},[910,1344,935],{"class":934},[910,1346,931],{"class":930},[910,1348,951],{"class":923},[910,1350,927],{"class":916},[910,1352,931],{"class":930},[910,1354,958],{"class":934},[910,1356,931],{"class":930},[910,1358,975],{"class":916},[910,1360,1361],{"class":912,"line":544},[910,1362,1363],{"class":1032},"  # Our 
Pizza\n",[910,1365,1366,1368,1370,1372,1374,1376,1378,1380],{"class":912,"line":549},[910,1367,980],{"class":916},[910,1369,824],{"class":920},[910,1371,924],{"class":923},[910,1373,927],{"class":916},[910,1375,931],{"class":930},[910,1377,1066],{"class":934},[910,1379,931],{"class":930},[910,1381,975],{"class":916},[910,1383,1384],{"class":912,"line":1043},[910,1385,1386],{"class":1032},"    ## Margherita\n",[910,1388,1389],{"class":912,"line":1052},[910,1390,1391],{"class":1032},"    A simple classic: mozzarela, tomatoes, and basil.\n",[910,1393,1394,1396,1398,1400,1402,1404,1406,1408,1410,1412,1414,1416],{"class":912,"line":1073},[910,1395,1023],{"class":916},[910,1397,1128],{"class":920},[910,1399,963],{"class":923},[910,1401,927],{"class":916},[910,1403,931],{"class":930},[910,1405,1128],{"class":934},[910,1407,931],{"class":930},[910,1409,1029],{"class":916},[910,1411,1143],{"class":1032},[910,1413,1036],{"class":916},[910,1415,1128],{"class":920},[910,1417,975],{"class":916},[910,1419,1420],{"class":912,"line":1092},[910,1421,1422],{"class":1032},"    ## Capricciosa\n",[910,1424,1425],{"class":912,"line":1101},[910,1426,1427],{"class":1032},"    A rich taste: mozzarella, ham, mushrooms, artichokes, and 
olives.\n",[910,1429,1430,1432,1434,1436,1438,1440,1442,1444,1446,1448,1450,1452],{"class":912,"line":1107},[910,1431,1023],{"class":916},[910,1433,1128],{"class":920},[910,1435,963],{"class":923},[910,1437,927],{"class":916},[910,1439,931],{"class":930},[910,1441,1128],{"class":934},[910,1443,931],{"class":930},[910,1445,1029],{"class":916},[910,1447,1143],{"class":1032},[910,1449,1036],{"class":916},[910,1451,1128],{"class":920},[910,1453,975],{"class":916},[910,1455,1456,1458,1460],{"class":912,"line":1113},[910,1457,1279],{"class":916},[910,1459,824],{"class":920},[910,1461,975],{"class":916},[910,1463,1464,1466,1468],{"class":912,"line":1123},[910,1465,1036],{"class":916},[910,1467,820],{"class":920},[910,1469,975],{"class":916},[809,1471,1473,1476],{"id":1472},"k4-l6-m8-27",[706,1474,1475],{},"k=.4, l=.6, m=.8"," (27%)",[902,1478,1480],{"className":904,"code":1479,"language":906,"meta":543,"style":543},"\u003Csection>\n  # Our Pizza\n  \u003Cdiv>\n    ## Margherita\n    A simple classic:\n    \u003Cbutton>Add\u003C/button>\n    ## Capricciosa\n    A rich taste:\n    \u003Cbutton>Add\u003C/button>\n  \u003C/div>\n\u003C/section>\n",[706,1481,1482,1490,1494,1502,1506,1511,1527,1531,1536,1552,1560],{"__ignoreMap":543},[910,1483,1484,1486,1488],{"class":912,"line":913},[910,1485,917],{"class":916},[910,1487,820],{"class":920},[910,1489,975],{"class":916},[910,1491,1492],{"class":912,"line":544},[910,1493,1363],{"class":1032},[910,1495,1496,1498,1500],{"class":912,"line":549},[910,1497,980],{"class":916},[910,1499,824],{"class":920},[910,1501,975],{"class":916},[910,1503,1504],{"class":912,"line":1043},[910,1505,1386],{"class":1032},[910,1507,1508],{"class":912,"line":1052},[910,1509,1510],{"class":1032},"    A simple 
classic:\n",[910,1512,1513,1515,1517,1519,1521,1523,1525],{"class":912,"line":1073},[910,1514,1023],{"class":916},[910,1516,1128],{"class":920},[910,1518,1029],{"class":916},[910,1520,1143],{"class":1032},[910,1522,1036],{"class":916},[910,1524,1128],{"class":920},[910,1526,975],{"class":916},[910,1528,1529],{"class":912,"line":1092},[910,1530,1422],{"class":1032},[910,1532,1533],{"class":912,"line":1101},[910,1534,1535],{"class":1032},"    A rich taste:\n",[910,1537,1538,1540,1542,1544,1546,1548,1550],{"class":912,"line":1107},[910,1539,1023],{"class":916},[910,1541,1128],{"class":920},[910,1543,1029],{"class":916},[910,1545,1143],{"class":1032},[910,1547,1036],{"class":916},[910,1549,1128],{"class":920},[910,1551,975],{"class":916},[910,1553,1554,1556,1558],{"class":912,"line":1113},[910,1555,1279],{"class":916},[910,1557,824],{"class":920},[910,1559,975],{"class":916},[910,1561,1562,1564,1566],{"class":912,"line":1123},[910,1563,1036],{"class":916},[910,1565,820],{"class":920},[910,1567,975],{"class":916},[809,1569,1571,1574],{"id":1570},"k-l0-m-35",[706,1572,1573],{},"k→∞, l=0, ∀m"," (35%)",[902,1576,1578],{"className":904,"code":1577,"language":906,"meta":543,"style":543},"# Our Pizza\n## Margherita\nA simple classic: mozzarela, tomatoes, and basil.\nAn everyday choice!\n\u003Cbutton>Add\u003C/button>\n## Capricciosa\nA rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\nA true favourite!\n\u003Cbutton>Add\u003C/button>\n",[706,1579,1580,1585,1590,1595,1600,1616,1621,1626,1631],{"__ignoreMap":543},[910,1581,1582],{"class":912,"line":913},[910,1583,1584],{"class":1032},"# Our Pizza\n",[910,1586,1587],{"class":912,"line":544},[910,1588,1589],{"class":1032},"## Margherita\n",[910,1591,1592],{"class":912,"line":549},[910,1593,1594],{"class":1032},"A simple classic: mozzarela, tomatoes, and basil.\n",[910,1596,1597],{"class":912,"line":1043},[910,1598,1599],{"class":1032},"An everyday 
choice!\n",[910,1601,1602,1604,1606,1608,1610,1612,1614],{"class":912,"line":1052},[910,1603,917],{"class":916},[910,1605,1128],{"class":920},[910,1607,1029],{"class":916},[910,1609,1143],{"class":1032},[910,1611,1036],{"class":916},[910,1613,1128],{"class":920},[910,1615,975],{"class":916},[910,1617,1618],{"class":912,"line":1073},[910,1619,1620],{"class":1032},"## Capricciosa\n",[910,1622,1623],{"class":912,"line":1092},[910,1624,1625],{"class":1032},"A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n",[910,1627,1628],{"class":912,"line":1101},[910,1629,1630],{"class":1032},"A true favourite!\n",[910,1632,1633,1635,1637,1639,1641,1643,1645],{"class":912,"line":1107},[910,1634,917],{"class":916},[910,1636,1128],{"class":920},[910,1638,1029],{"class":916},[910,1640,1143],{"class":1032},[910,1642,1036],{"class":916},[910,1644,1128],{"class":920},[910,1646,975],{"class":916},[14,1648,1649,1650,1652,1653,1655],{},"Asymptotic ",[706,1651,784],{}," (kind of 'infinite' ",[706,1654,784],{},") completely flattens the DOM, that is, leads to a full content linearisation similar to reader views as present in most browsers. Notably, it preserves all interactive elements like buttons – which are essential for a web agent.",[118,1657,1659],{"id":1658},"adaptived2snap",[70,1660,1661],{},"AdaptiveD2Snap",[14,1663,1664,1665,1667,1668,1670],{},"Fixed parameters might not be ideal for arbitrary DOMs – sourced from a landscape of web applications. We created ",[70,1666,1661],{}," – a wrapper for ",[70,1669,740],{}," that infers suitable parameters from a given DOM in order to hit a certain token budget.",[118,1672,1674],{"id":1673},"implementation-integration","Implementation & Integration",[14,1676,1677,1678,1680],{},"Picture an LLM-based web agent that is built on DOM snapshots. Implementing ",[70,1679,740],{}," is simple: Deep clone the DOM, and feed it to the algorithm. Now, take the snapshot; that is, serialise the resulting DOM. 
Done.",[797,1682,1683],{},[14,1684,1685,1686,1690],{},"Read our ",[75,1687,1689],{"href":1688},"/blog/a-gentle-introduction-to-ai-agents-for-the-web","gentle introduction to AI agents for the web"," to get started with high-level web agent concepts.",[14,1692,1693,1694,1696,1697,1702],{},"The open source ",[70,1695,740],{}," API, provided as a ",[75,1698,1701],{"href":1699,"rel":1700},"https://github.com/webfuse-com/D2Snap",[79],"package on GitHub"," provides the following signature:",[902,1704,1708],{"className":1705,"code":1706,"language":1707,"meta":543,"style":543},"language-ts shiki shiki-themes catppuccin-latte night-owl","type DOM = Document | Element | string;\ntype Options = {\n  assignUniqueIDs?: boolean; // false\n  debug?: boolean;           // true\n};\n\nD2Snap.d2Snap(\n  dom: DOM,\n  k: number, l: number, m: number,\n  options?: Options\n): Promise\u003Cstring>\n\nD2Snap.adaptiveD2Snap(\n  dom: DOM,\n  maxTokens: number = 4096,\n  maxIterations: number = 5,\n  options?: Options\n): Promise\u003Cstring>\n\n","ts",[706,1709,1710,1743,1755,1774,1788,1793,1798,1813,1825,1843,1853,1869,1873,1884,1892,1905,1917,1925],{"__ignoreMap":543},[910,1711,1712,1716,1720,1723,1727,1730,1733,1735,1739],{"class":912,"line":913},[910,1713,1715],{"class":1714},"s76yb","type",[910,1717,1719],{"class":1718},"sXbZB"," DOM ",[910,1721,927],{"class":1722},"s-_ek",[910,1724,1726],{"class":1725},"s-DR7"," Document",[910,1728,1729],{"class":916}," |",[910,1731,1732],{"class":1725}," Element",[910,1734,1729],{"class":916},[910,1736,1738],{"class":1737},"scrte"," string",[910,1740,1742],{"class":1741},"scGhl",";\n",[910,1744,1745,1747,1750,1752],{"class":912,"line":544},[910,1746,1715],{"class":1714},[910,1748,1749],{"class":1718}," Options ",[910,1751,927],{"class":1722},[910,1753,1754],{"class":1741}," {\n",[910,1756,1757,1761,1764,1767,1770],{"class":912,"line":549},[910,1758,1760],{"class":1759},"swl0y","  
assignUniqueIDs",[910,1762,1763],{"class":916},"?:",[910,1765,1766],{"class":1737}," boolean",[910,1768,1769],{"class":1741},";",[910,1771,1773],{"class":1772},"sDmS1"," // false\n",[910,1775,1776,1779,1781,1783,1785],{"class":912,"line":1043},[910,1777,1778],{"class":1759},"  debug",[910,1780,1763],{"class":916},[910,1782,1766],{"class":1737},[910,1784,1769],{"class":1741},[910,1786,1787],{"class":1772},"           // true\n",[910,1789,1790],{"class":912,"line":1052},[910,1791,1792],{"class":1741},"};\n",[910,1794,1795],{"class":912,"line":1073},[910,1796,1797],{"emptyLinePlaceholder":578},"\n",[910,1799,1800,1802,1806,1810],{"class":912,"line":1092},[910,1801,740],{"class":1032},[910,1803,1805],{"class":1804},"s5FwJ",".",[910,1807,1809],{"class":1808},"sNstc","d2Snap",[910,1811,1812],{"class":1032},"(\n",[910,1814,1815,1818,1822],{"class":912,"line":1101},[910,1816,1817],{"class":1032},"  dom: ",[910,1819,1821],{"class":1820},"sqxXB","DOM",[910,1823,1824],{"class":1741},",\n",[910,1826,1827,1830,1833,1836,1838,1841],{"class":912,"line":1107},[910,1828,1829],{"class":1032},"  k: number",[910,1831,1832],{"class":1741},",",[910,1834,1835],{"class":1032}," l: number",[910,1837,1832],{"class":1741},[910,1839,1840],{"class":1032}," m: number",[910,1842,1824],{"class":1741},[910,1844,1845,1848,1850],{"class":912,"line":1113},[910,1846,1847],{"class":1032},"  options",[910,1849,1763],{"class":1722},[910,1851,1852],{"class":1032}," Options\n",[910,1854,1855,1858,1862,1864,1867],{"class":912,"line":1123},[910,1856,1857],{"class":1032},"): 
",[910,1859,1861],{"class":1860},"s8Irk","Promise",[910,1863,917],{"class":1722},[910,1865,1866],{"class":1032},"string",[910,1868,975],{"class":1722},[910,1870,1871],{"class":912,"line":1152},[910,1872,1797],{"emptyLinePlaceholder":578},[910,1874,1875,1877,1879,1882],{"class":912,"line":1162},[910,1876,740],{"class":1032},[910,1878,1805],{"class":1804},[910,1880,1881],{"class":1808},"adaptiveD2Snap",[910,1883,1812],{"class":1032},[910,1885,1886,1888,1890],{"class":912,"line":1181},[910,1887,1817],{"class":1032},[910,1889,1821],{"class":1820},[910,1891,1824],{"class":1741},[910,1893,1894,1897,1899,1903],{"class":912,"line":1199},[910,1895,1896],{"class":1032},"  maxTokens: number ",[910,1898,927],{"class":1722},[910,1900,1902],{"class":1901},"sZ_Zo"," 4096",[910,1904,1824],{"class":1741},[910,1906,1907,1910,1912,1915],{"class":912,"line":1208},[910,1908,1909],{"class":1032},"  maxIterations: number ",[910,1911,927],{"class":1722},[910,1913,1914],{"class":1901}," 5",[910,1916,1824],{"class":1741},[910,1918,1919,1921,1923],{"class":912,"line":1214},[910,1920,1847],{"class":1032},[910,1922,1763],{"class":1722},[910,1924,1852],{"class":1032},[910,1926,1927,1929,1931,1933,1935],{"class":912,"line":1220},[910,1928,1857],{"class":1032},[910,1930,1861],{"class":1860},[910,1932,917],{"class":1722},[910,1934,1866],{"class":1032},[910,1936,975],{"class":1722},[14,1938,1939,1940,1942,1943,1948,1949,1954],{},"Moreover, ",[70,1941,740],{}," is available on the ",[75,1944,1947],{"href":1945,"rel":1946},"https://dev.webfuse.com/automation-api",[79],"Webfuse Automation API",". 
",[75,1950,1953],{"href":1951,"rel":1952},"https://www.webfuse.com",[79],"Webfuse"," essentially is a proxy to seamlessly serve any existing web application with custom augmentations, such as a web agent widget.",[902,1956,1960],{"className":1957,"code":1958,"language":1959,"meta":543,"style":543},"language-js shiki shiki-themes catppuccin-latte night-owl","const domSnapshot = await browser.webfuseSession\n    .automation\n    .take_dom_snapshot({ modifier: 'downsample' })\n","js",[706,1961,1962,1988,1997],{"__ignoreMap":543},[910,1963,1964,1967,1971,1974,1978,1982,1984],{"class":912,"line":913},[910,1965,1966],{"class":1714},"const",[910,1968,1970],{"class":1969},"scsc5"," domSnapshot",[910,1972,1973],{"class":1722}," =",[910,1975,1977],{"class":1976},"srhcd"," await",[910,1979,1981],{"class":1980},"sP4PM"," browser",[910,1983,1805],{"class":1804},[910,1985,1987],{"class":1986},"s8apv","webfuseSession\n",[910,1989,1990,1993],{"class":912,"line":544},[910,1991,1992],{"class":1804},"    .",[910,1994,1996],{"class":1995},"sL4Ga","automation\n",[910,1998,1999,2001,2004,2007,2010,2013,2017,2020,2023,2026,2029],{"class":912,"line":549},[910,2000,1992],{"class":1804},[910,2002,2003],{"class":1808},"take_dom_snapshot",[910,2005,2006],{"class":1032},"(",[910,2008,2009],{"class":1741},"{",[910,2011,2012],{"class":1032}," modifier",[910,2014,2016],{"class":2015},"sVS64",":",[910,2018,2019],{"class":930}," '",[910,2021,2022],{"class":934},"downsample",[910,2024,2025],{"class":930},"'",[910,2027,2028],{"class":1741}," }",[910,2030,2031],{"class":1032},")\n",[14,2033,2034,2035,2037],{},"Need precise control over the underlying ",[70,2036,740],{}," invocation? 
Configure it exactly how you want:",[902,2039,2041],{"className":1957,"code":2040,"language":1959,"meta":543,"style":543},"const domSnapshot = await browser.webfuseSession\n    .automation\n    .take_dom_snapshot({\n        modifier: {\n            name: 'D2Snap',\n            params: { hierarchyRatio: 0.6, textRatio: 0.2, attributeRatio: 0.8 }\n        }\n    })\n",[706,2042,2043,2059,2065,2076,2085,2100,2141,2146],{"__ignoreMap":543},[910,2044,2045,2047,2049,2051,2053,2055,2057],{"class":912,"line":913},[910,2046,1966],{"class":1714},[910,2048,1970],{"class":1969},[910,2050,1973],{"class":1722},[910,2052,1977],{"class":1976},[910,2054,1981],{"class":1980},[910,2056,1805],{"class":1804},[910,2058,1987],{"class":1986},[910,2060,2061,2063],{"class":912,"line":544},[910,2062,1992],{"class":1804},[910,2064,1996],{"class":1995},[910,2066,2067,2069,2071,2073],{"class":912,"line":549},[910,2068,1992],{"class":1804},[910,2070,2003],{"class":1808},[910,2072,2006],{"class":1032},[910,2074,2075],{"class":1741},"{\n",[910,2077,2078,2081,2083],{"class":912,"line":1043},[910,2079,2080],{"class":1032},"        modifier",[910,2082,2016],{"class":2015},[910,2084,1754],{"class":1741},[910,2086,2087,2090,2092,2094,2096,2098],{"class":912,"line":1052},[910,2088,2089],{"class":1032},"            name",[910,2091,2016],{"class":2015},[910,2093,2019],{"class":930},[910,2095,740],{"class":934},[910,2097,2025],{"class":930},[910,2099,1824],{"class":1741},[910,2101,2102,2105,2107,2110,2113,2115,2118,2120,2123,2125,2128,2130,2133,2135,2138],{"class":912,"line":1073},[910,2103,2104],{"class":1032},"            params",[910,2106,2016],{"class":2015},[910,2108,2109],{"class":1741}," {",[910,2111,2112],{"class":1032}," hierarchyRatio",[910,2114,2016],{"class":2015},[910,2116,2117],{"class":1901}," 0.6",[910,2119,1832],{"class":1741},[910,2121,2122],{"class":1032}," textRatio",[910,2124,2016],{"class":2015},[910,2126,2127],{"class":1901}," 
0.2",[910,2129,1832],{"class":1741},[910,2131,2132],{"class":1032}," attributeRatio",[910,2134,2016],{"class":2015},[910,2136,2137],{"class":1901}," 0.8",[910,2139,2140],{"class":1741}," }\n",[910,2142,2143],{"class":912,"line":1092},[910,2144,2145],{"class":1741},"        }\n",[910,2147,2148,2151],{"class":912,"line":1101},[910,2149,2150],{"class":1741},"    }",[910,2152,2031],{"class":1032},[118,2154,2156],{"id":2155},"performance-evaluation","Performance Evaluation",[14,2158,2159,2160,2162,2163,2165,2166,2168],{},"Now for the moment of truth: How does ",[70,2161,740],{}," stack up against the industry standard? We evaluated ",[70,2164,740],{}," in comparison to a grounded GUI snapshot baseline close to those used by ",[70,2167,619],{}," – coloured bounding boxes around visible interactive elements.",[14,2170,2171,2172,2177],{},"To evaluate snapshots isolated from specific agent logic, we crafted a dataset that spans all UI states that occur while solving a related task. We sampled our dataset from the existing ",[75,2173,2176],{"href":2174,"rel":2175},"https://github.com/OSU-NLP-Group/Online-Mind2Web",[79],"Online-Mind2Web"," dataset.",[102,2179],{":width":2180,"alt":2181,"format":107,"loading":108,"src":2182},"800","Exemplary solution UI state trajectory of a defined web-based task","/blog/dom-downsampling-for-web-agents/3.png",[14,2184,2185],{},[662,2186,2187],{},"Exemplary solution UI state trajectory for the task: “View the pricing plan for 'Business'. Specifically, we have 100 users. We need a 1PB storage quota and a 50 TB transfer quota.”",[14,2189,2190],{},"These are our key findings...",[809,2192,2194],{"id":2193},"substantial-success-rates","Substantial Success Rates",[14,2196,2197,2198,2200],{},"The results exceeded our expectations. Not only did ",[70,2199,740],{}," meet the baseline's performance – our best configuration outperformed it by a significant margin. 
Full linearisation matches both the performance and the estimated model input token size order of the baseline.",[102,2202],{":width":2203,"alt":2204,"format":107,"loading":108,"src":2205},"550","Success rate per web agent snapshot subject evaluated across the dataset","/blog/dom-downsampling-for-web-agents/4.png",[662,2207,2208,2209,2216,2217,2219,2220,2223,2224,2227,2228,2231,2232,2235,2236,2239,2240,2243],{},"\n  Success rate per web agent snapshot subject evaluated across the dataset.\n  Labels: ",[706,2210,2211,2212],{},"GUI",[2213,2214,2215],"sub",{}," gr.",": Baseline, ",[706,2218,1821],{},": Raw DOM (cut-off at ~8K tokens), ",[706,2221,2222],{},"k( l m)",": Parameter values (e.g., ",[706,2225,2226],{},".9 .3 .6",", or ",[706,2229,2230],{},".4"," if equal). ",[706,2233,2234],{},"∞",": Linearisation, ",[706,2237,2238],{},"8192 / 32768",": via token-limited (resp.) ",[2241,2242,1661],"i",{},".\n",[809,2245,2247],{"id":2246},"containable-token-and-byte-size","Containable Token and Byte Size",[14,2249,2250,2251,2253],{},"Even light downsampling delivers dramatic size reductions. Most ",[70,2252,740],{}," configurations average just one token order above the baseline – a massive improvement over raw DOM snapshots. Better yet, most DOMs from the dataset could actually be downsampled to the baseline order. 
And while image data balloons in file size, our text-based approach stays lean and efficient.",[102,2255],{":width":2180,"alt":2256,"format":107,"loading":108,"src":2257},"Comparison of mean input size across and per subject","/blog/dom-downsampling-for-web-agents/5.png",[662,2259,2260,2261,2264,2265,2267],{},"\n  Left: Comparison of mean input size (tokens vs bytes) across and per subject.",[2262,2263],"br",{},"\n  Right: Estimated input token size across the dataset created by a single ",[2241,2266,740],{}," evaluation subject.\n",[809,2269,2271],{"id":2270},"hierarchy-actually-matters","Hierarchy Actually Matters",[14,2273,2274],{},"Which UI feature matters most for LLM web agent backend performance? We alternated parameter configurations to find out. Interestingly, hierarchy reveals itself as the strongest of the three assessed features. Element extraction throws away hierarchy, which suggests that downsampling is a superior technique.",[820,2276,2279,2284],{"className":2277,"dataFootnotes":543},[2278],"footnotes",[95,2280,2283],{"className":2281,"id":679},[2282],"sr-only","Footnotes",[688,2285,2286,2301,2312,2323],{},[30,2287,2289,2293,2294],{"id":2288},"user-content-fn-1",[75,2290,2291],{"href":2291,"rel":2292},"https://arxiv.org/abs/2210.03945",[79]," ",[75,2295,2300],{"href":2296,"ariaLabel":2297,"className":2298,"dataFootnoteBackref":543},"#user-content-fnref-1","Back to reference 1",[2299],"data-footnote-backref","↩",[30,2302,2304,2293,2307],{"id":2303},"user-content-fn-2",[75,2305,746],{"href":746,"rel":2306},[79],[75,2308,2300],{"href":2309,"ariaLabel":2310,"className":2311,"dataFootnoteBackref":543},"#user-content-fnref-2","Back to reference 2",[2299],[30,2313,2315,2293,2318],{"id":2314},"user-content-fn-3",[75,2316,1699],{"href":1699,"rel":2317},[79],[75,2319,2300],{"href":2320,"ariaLabel":2321,"className":2322,"dataFootnoteBackref":543},"#user-content-fnref-3","Back to reference 
3",[2299],[30,2324,2326,2293,2330],{"id":2325},"user-content-fn-4",[75,2327,2328],{"href":2328,"rel":2329},"https://aclanthology.org/W04-3252",[79],[75,2331,2300],{"href":2332,"ariaLabel":2333,"className":2334,"dataFootnoteBackref":543},"#user-content-fnref-4","Back to reference 4",[2299],[2336,2337,2338],"style",{},"html pre.shiki code .s9rnR, html code.shiki .s9rnR{--shiki-default:#179299;--shiki-dark:#7FDBCA}html pre.shiki code .sY2RG, html code.shiki .sY2RG{--shiki-default:#1E66F5;--shiki-dark:#CAECE6}html pre.shiki code .swkLt, html code.shiki .swkLt{--shiki-default:#DF8E1D;--shiki-default-font-style:inherit;--shiki-dark:#C5E478;--shiki-dark-font-style:italic}html pre.shiki code .sbuKk, html code.shiki .sbuKk{--shiki-default:#40A02B;--shiki-dark:#D9F5DD}html pre.shiki code .sfrMT, html code.shiki .sfrMT{--shiki-default:#40A02B;--shiki-dark:#ECC48D}html pre.shiki code .s2kId, html code.shiki .s2kId{--shiki-default:#4C4F69;--shiki-dark:#D6DEEB}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s76yb, html code.shiki .s76yb{--shiki-default:#8839EF;--shiki-dark:#C792EA}html pre.shiki code .sXbZB, html code.shiki 
.sXbZB{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .s-_ek, html code.shiki .s-_ek{--shiki-default:#179299;--shiki-dark:#C792EA}html pre.shiki code .s-DR7, html code.shiki .s-DR7{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#FFCB8B;--shiki-dark-font-style:inherit}html pre.shiki code .scrte, html code.shiki .scrte{--shiki-default:#8839EF;--shiki-dark:#C5E478}html pre.shiki code .scGhl, html code.shiki .scGhl{--shiki-default:#7C7F93;--shiki-dark:#D6DEEB}html pre.shiki code .swl0y, html code.shiki .swl0y{--shiki-default:#4C4F69;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .sDmS1, html code.shiki .sDmS1{--shiki-default:#7C7F93;--shiki-default-font-style:italic;--shiki-dark:#637777;--shiki-dark-font-style:italic}html pre.shiki code .s5FwJ, html code.shiki .s5FwJ{--shiki-default:#179299;--shiki-default-font-style:inherit;--shiki-dark:#C792EA;--shiki-dark-font-style:italic}html pre.shiki code .sNstc, html code.shiki .sNstc{--shiki-default:#1E66F5;--shiki-default-font-style:italic;--shiki-dark:#82AAFF;--shiki-dark-font-style:italic}html pre.shiki code .sqxXB, html code.shiki .sqxXB{--shiki-default:#4C4F69;--shiki-dark:#82AAFF}html pre.shiki code .s8Irk, html code.shiki .s8Irk{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#C5E478;--shiki-dark-font-style:inherit}html pre.shiki code .sZ_Zo, html code.shiki .sZ_Zo{--shiki-default:#FE640B;--shiki-dark:#F78C6C}html pre.shiki code .scsc5, html code.shiki .scsc5{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#82AAFF;--shiki-dark-font-style:italic}html pre.shiki code .srhcd, html code.shiki .srhcd{--shiki-default:#8839EF;--shiki-default-font-style:inherit;--shiki-dark:#C792EA;--shiki-dark-font-style:italic}html pre.shiki code .sP4PM, html code.shiki 
.sP4PM{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#7FDBCA;--shiki-dark-font-style:italic}html pre.shiki code .s8apv, html code.shiki .s8apv{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#BAEBE2;--shiki-dark-font-style:italic}html pre.shiki code .sL4Ga, html code.shiki .sL4Ga{--shiki-default:#4C4F69;--shiki-dark:#BAEBE2}html pre.shiki code .sVS64, html code.shiki .sVS64{--shiki-default:#179299;--shiki-dark:#D6DEEB}",{"title":543,"searchDepth":544,"depth":544,"links":2340},[2341,2345,2346,2353],{"id":627,"depth":544,"text":628,"children":2342},[2343,2344],{"id":638,"depth":549,"text":639},{"id":667,"depth":549,"text":668},{"id":719,"depth":544,"text":720},{"id":737,"depth":544,"text":740,"children":2347},[2348,2349,2350,2351,2352],{"id":774,"depth":549,"text":775},{"id":896,"depth":549,"text":897},{"id":1658,"depth":549,"text":1661},{"id":1673,"depth":549,"text":1674},{"id":2155,"depth":549,"text":2156},{"id":679,"depth":544,"text":2283},"2025-08-18","We propose D2Snap – a first-of-its-kind downsampling algorithm for DOMs. 
D2Snap can be used as a pre-processing technique for DOM snapshots to optimise web agency context quality and token costs.",{"homepage":578,"relatedLinks":2357},[2358,2362,2365],{"text":2359,"href":2360,"description":2361},"What is a Website Snapshot?","/blog/snapshots-provide-llms-with-website-state","Learn what a website snapshot is and how to utilise it for web agents",{"text":2363,"href":1688,"description":2364},"What is a Web Agent?","Learn the basics of web agents",{"text":1947,"href":2366,"external":578,"description":2367},"https://dev.webfuse.com/automation-api#take_dom_snapshot","Check out the Webfuse Automation API","/blog/dom-downsampling-for-llm-based-web-agents",{"title":593,"description":2355},{"loc":2368},"blog/1012.dom-downsampling-for-llm-based-web-agents",[584,585,2373,2374,587,588],"llms","llm-context","lDh50lEtos4T_tIdGCLKDox16i6ixbPnRxPJoFpKjnE",{"id":2377,"title":2378,"authorId":594,"body":2379,"category":584,"created":3105,"description":3106,"extension":574,"faqs":575,"featurePriority":544,"head":575,"landingPath":575,"meta":3107,"navigation":578,"ogImage":575,"path":1688,"robots":575,"schemaOrg":575,"seo":3116,"sitemap":3117,"stem":3118,"tags":3119,"__hash__":3120},"blog/blog/1011.a-gentle-introduction-to-ai-agents-for-the-web.md","A Gentle Introduction to AI Agents for the Web",{"type":8,"value":2380,"toc":3086},[2381,2395,2398,2405,2411,2415,2418,2433,2437,2447,2451,2455,2468,2472,2476,2479,2484,2488,2497,2501,2512,2517,2521,2539,2543,2549,2651,2654,2887,2903,2907,2910,2915,2919,2922,2926,2943,2968,2975,2979,3017,3020,3031,3035,3038,3066,3070,3078,3083],[14,2382,2383,2384,609,2388,788,2391,2394],{},"In no time, AI became a natural part of modern web interfaces. AI agents for the web enjoy a recent hype, sparked by the means of ",[75,2385,608],{"href":2386,"rel":2387},"https://openai.com/index/introducing-operator/",[79],[75,2389,614],{"href":612,"rel":2390},[79],[75,2392,619],{"href":617,"rel":2393},[79],". 
By now, it is within reach to automate arbitrary web-based tasks, such as booking the cheapest flight from Berlin to Amsterdam.",[95,2396,2363],{"id":2397},"what-is-a-web-agent",[14,2399,2400,2401,2404],{},"For starters, let us break down the term ",[18,2402,2403],{},"web AI agent",": An agent is an entity that autonomously acts on behalf of another entity. An artificially intelligent agent is an application that acts on behalf of a human. In contrast to non-AI computer agents, it solves complex tasks with at least human-grade effectiveness and efficiency. For a human-centric web, web agents have deliberately been designed to browse the web in a human fashion – through UIs rather than APIs.",[102,2406],{":width":2407,"alt":2408,"format":2409,"loading":108,"src":2410},"610","High-level agent description comparing human and computer agents","svg","/blog/a-gentle-introduction-to-ai-agents-for-the-web/1.svg",[118,2412,2414],{"id":2413},"the-role-of-frontier-llms","The Role of Frontier LLMs",[14,2416,2417],{},"Web agents have been a vague desire for a long time. AI agents used to rely on complete models of a problem domain in order to allow (heuristic) search through problem states. Such models would comprise the problem world (e.g., a chessboard), actors (pawns, rooks, etc.), possible actions per actor (rook moves straight), and constraints (i.a., max one piece per field). A heterogeneous space of web application UIs describes the problem domain of a web agent: how to understand a web page, and how to interact with it to solve the declared task?",[14,2419,2420,2421,2428,2429,2432],{},"Frontier LLMs disrupted the AI agent world: explicit problem domain models beyond feasibility can now be replaced by an LLM. The LLM thereby acts as an instantaneous domain model backend that can be consulted with twofold context: serialised problem state, such as a chess position code (",[70,2422,2423,2424,2427],{},"“",[910,2425,2426],{},"..."," e4 e5 2. 
Nc3 f5”","), and the respective task (",[70,2430,2431],{},"“What is the best move for white?”","). For web agents, problem state corresponds to the currently browsed web application's runtime state, for instance, a screenshot.",[118,2434,2436],{"id":2435},"generalist-web-agents","Generalist Web Agents",[14,2438,2439,2440,788,2443,2446],{},"Generalist web agents are supposed to solve arbitrary tasks through a web browser. Web-based tasks can be as diverse as ",[70,2441,2442],{},"“Find a picture of a cat.”",[70,2444,2445],{},"“Book the cheapest flight from Berlin to Amsterdam tomorrow afternoon (business class, window seat).”"," In reality, generalist agents still fail uncommon or too precise tasks. While they have been critically acclaimed, they mainly act as early proofs-of-concept. Tasks that are indeed solvable with a generalist agent promise great results with an according specialist agent.",[102,2448],{":width":599,"alt":2449,"format":107,"loading":108,"src":2450},"Screenshot of a generalist web agent UI (Director)","/blog/a-gentle-introduction-to-ai-agents-for-the-web/2.png",[118,2452,2454],{"id":2453},"specialist-web-agents","Specialist Web Agents",[14,2456,2457,2458,2461,2462,2467],{},"Other than generalist agents, specialist web agents are constrained to a certain task and application domain. Specialist agents bear the major share of commercial value. Most prominently, modal chat agents that provide users with on-page help. Picture a little floating widget that can be chatted to via text or voice input. In most cases, in fact, the term ",[70,2459,2460],{},"web (AI) agent"," refers to chat agents. Chat agents – text or voice – can be implemented on top of virtually any existing website. Frontier LLMs provide a lot of commonsense out-of-the-box. 
A ",[75,2463,2466],{"href":2464,"rel":2465},"https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts",[79],"system prompt"," can, moreover, be leveraged to drive specialist agent quality for the respective problem domain.",[102,2469],{":width":599,"alt":2470,"format":107,"loading":108,"src":2471},"Screenshots of two modal specialist web agent UIs augmenting an underlying website's UI","/blog/a-gentle-introduction-to-ai-agents-for-the-web/3.png",[95,2473,2475],{"id":2474},"how-does-a-web-agent-work","How Does a Web Agent Work?",[14,2477,2478],{},"LLM-based web agents are premised on a more or less uniform architecture. The agent application embodies a mediator between a web browser (environment), and the LLM backend (model).",[102,2480],{":width":2481,"alt":2482,"format":2409,"loading":108,"src":2483},"480","High-level web agent architecture component view","/blog/a-gentle-introduction-to-ai-agents-for-the-web/4.svg",[118,2485,2487],{"id":2486},"the-agent-lifecycle","The Agent Lifecycle",[14,2489,2490,2491,2496],{},"To reduce a user's cognitive load, solving a web-based task is usually chunked into a sequence of UI states. Consider looking for rental apartments on ",[75,2492,2495],{"href":2493,"rel":2494},"https://www.redfin.com",[79],"redfin.com",": In the first step, you specify a location. Only subsequently are you provided with a grid of available apartments for that location.",[102,2498],{":width":599,"alt":2499,"format":107,"loading":108,"src":2500},"Example of separated UI states in a rental home search application","/blog/a-gentle-introduction-to-ai-agents-for-the-web/5.png",[14,2502,2503,2504,2511],{},"Web agent logic is iterative; not least for a sequential web interaction model, but also for a conversational agent interaction model. Browsing the web, human and computer agents represent users alike. 
That said, Norman's well-known ",[75,2505,2508],{"href":2506,"rel":2507},"https://mitpress.mit.edu/9780262640374/the-design-of-everyday-things/",[79],[70,2509,2510],{},"Seven Stages of Action",", which hierarchically model the human cognition cycle, transfer to the web agent lifecycle. For each UI state in a web browser (environment) and web-based task (action intention), decide where to click, type, etc. (action planning), and perform those clicks, etc. (action execution). Afterwards, perceive, interpret, and evaluate the results of those actions in the web browser (state). As long as there is a mismatch between the evaluated state and the declared goal state, repeat that cycle. If necessary, prompt the user for further required information.",[102,2513],{":width":2514,"alt":2515,"format":2409,"loading":108,"src":2516},"580","Donald Norman's 'Seven Stages of Action' model of the human cognition cycle that transfers to non-human agents","/blog/a-gentle-introduction-to-ai-agents-for-the-web/6.svg",[118,2518,2520],{"id":2519},"web-context-for-llms","Web Context for LLMs",[14,2522,2523,2524,2526,2527,2530,2531,2534,2535,2538],{},"The gap from an agent towards the environment, according to ",[70,2525,2510],{},", is known as the ",[70,2528,2529],{},"gulf of execution",". In real-world scenarios, how to act in the environment with respect to a planned sequence of actions might be difficult (e.g., how to actually open the trunk of a new car?). Arguably, web agents face a novel ",[70,2532,2533],{},"gulf of intention"," towards the action planning stage: how to serialise a currently browsed web page's runtime state for LLMs? ",[70,2536,2537],{},"Snapshot"," is a more comprehensive term to describe the serialisation of a web page's current runtime state. Screenshots, for instance, represent a type of snapshot that closely resembles how humans perceive a web page at a given point in time. 
But are they as accessible to LLMs?",[118,2540,2542],{"id":2541},"agentic-ui-interaction","Agentic UI Interaction",[14,2544,2545,2546,2548],{},"With a qualified set of well-defined actuation methods, web agents are able to close the ",[70,2547,2529],{}," quite well. HTML element types strongly afford a certain action (e.g., click a button, type into a field). Below is what an actuation schema presented to the LLM backend could look like:",[902,2550,2552],{"className":1705,"code":2551,"language":1707,"meta":543,"style":543},"type ActuationSchema = {\n    thought: string;\n    action: \"click\"\n        | \"scroll\"\n        | \"type\";\n    cssSelector: string;\n    data?: string;\n}[];\n",[706,2553,2554,2567,2578,2595,2607,2619,2630,2641],{"__ignoreMap":543},[910,2555,2556,2559,2562,2565],{"class":912,"line":913},[910,2557,2558],{"class":1714},"type",[910,2560,2561],{"class":1718}," ActuationSchema",[910,2563,2564],{"class":1032}," = ",[910,2566,2075],{"class":1741},[910,2568,2569,2572,2574,2576],{"class":912,"line":544},[910,2570,2571],{"class":1032},"    thought",[910,2573,2016],{"class":916},[910,2575,1738],{"class":1737},[910,2577,1742],{"class":1741},[910,2579,2580,2583,2585,2588,2592],{"class":912,"line":549},[910,2581,2582],{"class":1032},"    action",[910,2584,2016],{"class":916},[910,2586,2587],{"class":930}," \"",[910,2589,2591],{"class":2590},"click",[910,2593,2594],{"class":930},"\"\n",[910,2596,2597,2600,2602,2605],{"class":912,"line":1043},[910,2598,2599],{"class":916},"        |",[910,2601,2587],{"class":930},[910,2603,2604],{"class":2590},"scroll",[910,2606,2594],{"class":930},[910,2608,2609,2611,2613,2615,2617],{"class":912,"line":1052},[910,2610,2599],{"class":916},[910,2612,2587],{"class":930},[910,2614,1715],{"class":2590},[910,2616,931],{"class":930},[910,2618,1742],{"class":1741},[910,2620,2621,2624,2626,2628],{"class":912,"line":1073},[910,2622,2623],{"class":1032},"    
cssSelector",[910,2625,2016],{"class":916},[910,2627,1738],{"class":1737},[910,2629,1742],{"class":1741},[910,2631,2632,2635,2637,2639],{"class":912,"line":1092},[910,2633,2634],{"class":1032},"    data",[910,2636,1763],{"class":916},[910,2638,1738],{"class":1737},[910,2640,1742],{"class":1741},[910,2642,2643,2646,2649],{"class":912,"line":1101},[910,2644,2645],{"class":1741},"}",[910,2647,2648],{"class":1032},"[]",[910,2650,1742],{"class":1741},[14,2652,2653],{},"And a suggested actions response could, in turn, look as follows:",[902,2655,2659],{"className":2656,"code":2657,"language":2658,"meta":543,"style":543},"language-json shiki shiki-themes catppuccin-latte night-owl","[\n    {\n        \"thought\": \"Scroll newsletter cta into view\",\n        \"action\": \"scroll\",\n        \"cssSelector\": \"section#newsletter\"\n    },\n    {\n        \"thought\": \"Type email address to newsletter cta\",\n        \"action\": \"type\",\n        \"cssSelector\": \"section#newsletter > input\",\n        \"data\": \"user@example.org\"\n    },\n    {\n        \"thought\": \"Submit newsletter sign up\",\n        \"action\": \"click\",\n        \"cssSelector\": \"section#newsletter > button\"\n    }\n]\n","json",[706,2660,2661,2666,2671,2695,2714,2732,2737,2741,2760,2778,2797,2815,2819,2823,2842,2860,2877,2882],{"__ignoreMap":543},[910,2662,2663],{"class":912,"line":913},[910,2664,2665],{"class":1741},"[\n",[910,2667,2668],{"class":912,"line":544},[910,2669,2670],{"class":1741},"    {\n",[910,2672,2673,2677,2681,2683,2685,2687,2691,2693],{"class":912,"line":549},[910,2674,2676],{"class":2675},"srFR9","        \"",[910,2678,2680],{"class":2679},"s30W1","thought",[910,2682,931],{"class":2675},[910,2684,2016],{"class":1741},[910,2686,2587],{"class":930},[910,2688,2690],{"class":2689},"sCC8C","Scroll newsletter cta into 
view",[910,2692,931],{"class":930},[910,2694,1824],{"class":1741},[910,2696,2697,2699,2702,2704,2706,2708,2710,2712],{"class":912,"line":1043},[910,2698,2676],{"class":2675},[910,2700,2701],{"class":2679},"action",[910,2703,931],{"class":2675},[910,2705,2016],{"class":1741},[910,2707,2587],{"class":930},[910,2709,2604],{"class":2689},[910,2711,931],{"class":930},[910,2713,1824],{"class":1741},[910,2715,2716,2718,2721,2723,2725,2727,2730],{"class":912,"line":1052},[910,2717,2676],{"class":2675},[910,2719,2720],{"class":2679},"cssSelector",[910,2722,931],{"class":2675},[910,2724,2016],{"class":1741},[910,2726,2587],{"class":930},[910,2728,2729],{"class":2689},"section#newsletter",[910,2731,2594],{"class":930},[910,2733,2734],{"class":912,"line":1073},[910,2735,2736],{"class":1741},"    },\n",[910,2738,2739],{"class":912,"line":1092},[910,2740,2670],{"class":1741},[910,2742,2743,2745,2747,2749,2751,2753,2756,2758],{"class":912,"line":1101},[910,2744,2676],{"class":2675},[910,2746,2680],{"class":2679},[910,2748,931],{"class":2675},[910,2750,2016],{"class":1741},[910,2752,2587],{"class":930},[910,2754,2755],{"class":2689},"Type email address to newsletter cta",[910,2757,931],{"class":930},[910,2759,1824],{"class":1741},[910,2761,2762,2764,2766,2768,2770,2772,2774,2776],{"class":912,"line":1107},[910,2763,2676],{"class":2675},[910,2765,2701],{"class":2679},[910,2767,931],{"class":2675},[910,2769,2016],{"class":1741},[910,2771,2587],{"class":930},[910,2773,1715],{"class":2689},[910,2775,931],{"class":930},[910,2777,1824],{"class":1741},[910,2779,2780,2782,2784,2786,2788,2790,2793,2795],{"class":912,"line":1113},[910,2781,2676],{"class":2675},[910,2783,2720],{"class":2679},[910,2785,931],{"class":2675},[910,2787,2016],{"class":1741},[910,2789,2587],{"class":930},[910,2791,2792],{"class":2689},"section#newsletter > 
input",[910,2794,931],{"class":930},[910,2796,1824],{"class":1741},[910,2798,2799,2801,2804,2806,2808,2810,2813],{"class":912,"line":1123},[910,2800,2676],{"class":2675},[910,2802,2803],{"class":2679},"data",[910,2805,931],{"class":2675},[910,2807,2016],{"class":1741},[910,2809,2587],{"class":930},[910,2811,2812],{"class":2689},"user@example.org",[910,2814,2594],{"class":930},[910,2816,2817],{"class":912,"line":1152},[910,2818,2736],{"class":1741},[910,2820,2821],{"class":912,"line":1162},[910,2822,2670],{"class":1741},[910,2824,2825,2827,2829,2831,2833,2835,2838,2840],{"class":912,"line":1181},[910,2826,2676],{"class":2675},[910,2828,2680],{"class":2679},[910,2830,931],{"class":2675},[910,2832,2016],{"class":1741},[910,2834,2587],{"class":930},[910,2836,2837],{"class":2689},"Submit newsletter sign up",[910,2839,931],{"class":930},[910,2841,1824],{"class":1741},[910,2843,2844,2846,2848,2850,2852,2854,2856,2858],{"class":912,"line":1199},[910,2845,2676],{"class":2675},[910,2847,2701],{"class":2679},[910,2849,931],{"class":2675},[910,2851,2016],{"class":1741},[910,2853,2587],{"class":930},[910,2855,2591],{"class":2689},[910,2857,931],{"class":930},[910,2859,1824],{"class":1741},[910,2861,2862,2864,2866,2868,2870,2872,2875],{"class":912,"line":1208},[910,2863,2676],{"class":2675},[910,2865,2720],{"class":2679},[910,2867,931],{"class":2675},[910,2869,2016],{"class":1741},[910,2871,2587],{"class":930},[910,2873,2874],{"class":2689},"section#newsletter > button",[910,2876,2594],{"class":930},[910,2878,2879],{"class":912,"line":1214},[910,2880,2881],{"class":1741},"    }\n",[910,2883,2884],{"class":912,"line":1220},[910,2885,2886],{"class":1741},"]\n",[797,2888,2889],{},[14,2890,2891,2896,2897,2902],{},[75,2892,2895],{"href":2893,"rel":2894},"https://platform.openai.com/docs/guides/function-calling",[79],"Function Calling"," and the ",[75,2898,2901],{"href":2899,"rel":2900},"https://modelcontextprotocol.io",[79],"Model Context Protocol"," represent two ends to outsource 
an explicit actuation model – server- and client-side, respectively.",[118,2904,2906],{"id":2905},"agentic-ui-augmentation","Agentic UI Augmentation",[14,2908,2909],{},"An agent represents yet another feature to integrate with an application and its UI. Discoverability and availability, however, are among the most fundamental requirements of a web agent. Evidently, when a user experiences UI/UX friction, at least the agent should be interactive. That said, a scrolling modal web agent UI has been the go-to approach, that is, a little floating widget on top of the underlying application's UI. It comes with a major advantage: the agent application can be decoupled from the underlying, self-contained application.",[102,2911],{":width":2912,"alt":2913,"format":2409,"loading":108,"src":2914},"360","Depiction of a web agent application augmenting an underlying application in an isolated layer","/blog/a-gentle-introduction-to-ai-agents-for-the-web/7.svg",[95,2916,2918],{"id":2917},"how-to-build-a-web-agent","How to Build a Web Agent?",[14,2920,2921],{},"Believe it or not: enhancing an existing web application with a purposeful agent is a lower-hanging fruit. The evolving agent ecosystem provides you with a spectrum of solutions: instantly use a pre-compiled agent, tweak a templated agent, or develop an agent from scratch. Either way, LLMs and web browsers exist for reuse, boiling down agent development to LLM context engineering, and UI augmentation.",[118,2923,2925],{"id":2924},"develop-a-web-agent","Develop a Web Agent",[14,2927,2928,2929,2932,2933,788,2937,2942],{},"Opting for a ",[18,2930,2931],{},"pre-compiled agent"," does not necessarily involve any actual development step. Instead, pre-compiled agents allow for high-level configuration through an agent-as-a-service provider's interface. 
Popular agent-as-a-service providers are, i.a., ",[75,2934,34],{"href":2935,"rel":2936},"https://elevenlabs.io/conversational-ai",[79],[75,2938,2941],{"href":2939,"rel":2940},"https://www.intercom.com/drlp/ai-agent",[79],"Intercom",". Serviced agents hide LLM communication and potentially interaction with a web browser behind the configuration interface.",[14,2944,2945,2946,2949,2950,2955,2956,2961,2962,2967],{},"Using a ",[18,2947,2948],{},"templated agent"," resembles the agent-as-a-service approach on a lower level. Openly sourced from a ",[75,2951,2954],{"href":2952,"rel":2953},"https://github.com/webfuse-com/agent-extension-blueprint",[79],"code repository",", templated agents allow for any kind of development tweaks. Favourably, agent templates shortcut integration with ",[75,2957,2960],{"href":2958,"rel":2959},"https://openai.com/api/",[79],"LLM APIs"," and web ",[75,2963,2966],{"href":2964,"rel":2965},"https://developer.mozilla.org/en-US/docs/Web/API",[79],"browser APIs",". Using a templated agent usually represents the preferable, best-of-both-worlds approach; common- and best-practice code snippets are available from the beginning, but everything can be customised as desired.",[14,2969,2970,2971,2974],{},"Of course, developing an ",[18,2972,2973],{},"agent from scratch"," is always an option. It is preferable whenever agent requirements deviate to a large extent from what exists in the service or template landscape.",[118,2976,2978],{"id":2977},"deploy-a-web-agent","Deploy a Web Agent",[14,2980,2981,2982,821,2987,2992,2993,2998,2999,3004,3005,3010,3011,3016],{},"When web agent code lives side-by-side with the augmented application's code, agent deployment is covered by a generic pipeline. 
Something like: ",[75,2983,2986],{"href":2984,"rel":2985},"https://eslint.org",[79],"linting",[75,2988,2991],{"href":2989,"rel":2990},"https://prettier.io",[79],"formatting"," agent code, ",[75,2994,2997],{"href":2995,"rel":2996},"https://esbuild.github.io",[79],"transpiling and bundling"," agent modules, ",[75,3000,3003],{"href":3001,"rel":3002},"https://www.cypress.io",[79],"testing"," agent, ",[75,3006,3009],{"href":3007,"rel":3008},"https://pages.cloudflare.com",[79],"hosting"," agent bundle, and ",[75,3012,3015],{"href":3013,"rel":3014},"https://docs.github.com/en/actions/get-started/continuous-integration",[79],"triggering"," post-deployment events. In that case, an agent represents a modular feature component in the application, no different than, for instance, a sign-up component.",[14,3018,3019],{},"Web agent source code right inside the application codebase comes at a cost:",[27,3021,3022,3025,3028],{},[30,3023,3024],{},"Agent developers can manipulate the source code of the underlying application.",[30,3026,3027],{},"Agent functionality could introduce side effects on the underlying application.",[30,3029,3030],{},"Agent changes require deployment of the entire application.",[118,3032,3034],{"id":3033},"best-practices-of-agentic-ux","Best Practices of Agentic UX",[14,3036,3037],{},"When designing user experiences for agent-enhanced applications, there are a few things to consider:",[27,3039,3040,3041,3040,3050,3040,3058],{},"\n    ",[30,3042,3043,3044,3043,3047,3049],{},"\n        ",[18,3045,3046],{},"Stream input and output to reduce latency",[2262,3048],{},"\n        LLMs (re-)introduce noticeable communication round-trip time. 
To reduce wait time for the human user, stream chunks of data whenever they are available.\n    ",[30,3051,3043,3052,3043,3055,3057],{},[18,3053,3054],{},"Provide fine-grained feedback to bridge high latency",[2262,3056],{},"\n        Human attention is sensitive to several seconds of [system response time](https://www.nngroup.com/articles/response-times-3-important-limits/). Periodically provide agent _thoughts_ as feedback to perceptibly break down round-trip time.\n    ",[30,3059,3043,3060,3043,3063,3065],{},[18,3061,3062],{},"Always prompt the human user for consent to perform critical actions",[2262,3064],{},"\n        Some actions in a web application lead to irreversible or significant changes of state. Never have the agent perform such actions on behalf of the user without explicitly asking for permission.\n    ",[118,3067,3069],{"id":3068},"non-invasive-web-agents-with-webfuse","Non-Invasive Web Agents with Webfuse",[14,3071,3072,3077],{},[75,3073,3075],{"href":1951,"rel":3074},[79],[18,3076,1953],{}," is a configurable web proxy that lets you augment any web application. As pictured, web agents represent highly self-contained applications. Moreover, web agents and underlying applications communicate at runtime in the client. This does, in fact, create opportunities to overcome the above-mentioned drawbacks with Webfuse: develop web agents with a sandbox extension methodology, and deploy them through the low-latency proxy layer. On demand, seamlessly serve users with your agent-enhanced website. 
Benefit from information hiding, safe code, and fewer deployments.",[83,3079],{":demoAction":3080,"heading":3081,"subtitle":3082},"{\"text\":\"Read more\",\"showIcon\":false,\"href\":\"https://www.webfuse.com/blog/category/ai-agents\"}","Deploy Web Agents with Webfuse","Develop or deploy web agents in minutes; serve agent-enhanced websites through an isolated application layer.",[2336,3084,3085],{},"html pre.shiki code .scGhl, html code.shiki .scGhl{--shiki-default:#7C7F93;--shiki-dark:#D6DEEB}html pre.shiki code .srFR9, html code.shiki .srFR9{--shiki-default:#7C7F93;--shiki-dark:#7FDBCA}html pre.shiki code .s30W1, html code.shiki .s30W1{--shiki-default:#1E66F5;--shiki-dark:#7FDBCA}html pre.shiki code .sbuKk, html code.shiki .sbuKk{--shiki-default:#40A02B;--shiki-dark:#D9F5DD}html pre.shiki code .sCC8C, html code.shiki .sCC8C{--shiki-default:#40A02B;--shiki-dark:#C789D6}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s76yb, html code.shiki .s76yb{--shiki-default:#8839EF;--shiki-dark:#C792EA}html pre.shiki code .sXbZB, html code.shiki 
.sXbZB{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .s2kId, html code.shiki .s2kId{--shiki-default:#4C4F69;--shiki-dark:#D6DEEB}html pre.shiki code .s9rnR, html code.shiki .s9rnR{--shiki-default:#179299;--shiki-dark:#7FDBCA}html pre.shiki code .scrte, html code.shiki .scrte{--shiki-default:#8839EF;--shiki-dark:#C5E478}html pre.shiki code .sgAC-, html code.shiki .sgAC-{--shiki-default:#40A02B;--shiki-default-font-style:italic;--shiki-dark:#ECC48D;--shiki-dark-font-style:inherit}",{"title":543,"searchDepth":544,"depth":544,"links":3087},[3088,3093,3099],{"id":2397,"depth":544,"text":2363,"children":3089},[3090,3091,3092],{"id":2413,"depth":549,"text":2414},{"id":2435,"depth":549,"text":2436},{"id":2453,"depth":549,"text":2454},{"id":2474,"depth":544,"text":2475,"children":3094},[3095,3096,3097,3098],{"id":2486,"depth":549,"text":2487},{"id":2519,"depth":549,"text":2520},{"id":2541,"depth":549,"text":2542},{"id":2905,"depth":549,"text":2906},{"id":2917,"depth":544,"text":2918,"children":3100},[3101,3102,3103,3104],{"id":2924,"depth":549,"text":2925},{"id":2977,"depth":549,"text":2978},{"id":3033,"depth":549,"text":3034},{"id":3068,"depth":549,"text":3069},"2025-06-15","LLMs only recently enabled serviceable web agents: autonomous systems that browse the web on behalf of a human. 
Get started with fundamental methodology, key design challenges, and technological opportunities.",{"homepage":578,"relatedLinks":3108},[3109,3110,3114],{"text":2359,"href":2360,"description":2361},{"text":3111,"href":3112,"description":3113},"Develop an AI Agent for Any Website with Webfuse","/blog/develop-an-ai-agent-for-any-website-with-webfuse","Learn how to develop and deploy a web agent for any website with Webfuse",{"text":1947,"href":3115,"external":578,"description":2367},"https://dev.webfuse.com/automation-api/",{"title":2378,"description":3106},{"loc":1688},"blog/1011.a-gentle-introduction-to-ai-agents-for-the-web",[584,585,2373,587,588],"NE1cc8w1586RjefKyr028dgV7yBmf460jhZy91LninA",1776351442183]