[{"data":1,"prerenderedAt":3350},["ShallowReactive",2],{"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites":3,"related-/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites":820},{"id":4,"title":5,"authorId":6,"body":7,"category":793,"created":794,"description":795,"extension":796,"faqs":797,"featurePriority":797,"head":797,"landingPath":757,"meta":798,"navigation":799,"ogImage":797,"path":809,"robots":797,"schemaOrg":797,"seo":810,"sitemap":811,"stem":812,"tags":813,"__hash__":819},"blog/blog/1026.challenges-of-building-reliable-voice-ai-agents-on-live-websites.md","Challenges of Building Reliable Voice AI Agents on Live Websites","salome-koshadze",{"type":8,"value":9,"toc":759},"minimark",[10,18,22,25,28,47,50,55,58,70,73,78,81,118,121,125,128,131,135,142,146,153,157,160,163,166,170,173,205,208,212,219,222,246,249,253,256,259,262,266,269,280,283,287,294,320,323,326,330,333,336,339,343,346,369,372,386,390,393,430,433,437,440,443,447,450,476,480,483,497,501,504,524,527,531,534,537,543,547,550,553,575,579,582,614,617,621,624,631,634,723,726,730,733,736],[11,12],"nuxt-picture",{":height":13,":width":14,"format":15,"loading":16,"src":17},"768","1344","webp","lazy","/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/1.png",[19,20,21],"p",{},"Voice agents are becoming a more common part of how users interact with websites. They provide a conversational interface that can guide users, answer questions, and perform tasks, moving interactions beyond traditional clicks and typed commands. The goal is an AI that can talk with a user and assist them directly on the page they are viewing.",[19,23,24],{},"However, transforming a conversational bot into a capable agent that can reliably act upon a live, dynamic web application introduces a series of technical challenges. 
When a voice agent is implemented by directly injecting its script into a website, its effectiveness is often limited by the very nature of the modern web. The agent’s ability to perform actions, maintain context, and operate securely is frequently compromised.",[19,26,27],{},"This article breaks down the specific technical problems that developers face with this direct injection method. We will examine the issues that prevent a standard voice agent from delivering a consistent and genuinely helpful user experience, including:",[29,30,31,35,38,41,44],"ul",{},[32,33,34],"li",{},"The unreliability of actions based on direct DOM manipulation.",[32,36,37],{},"The constant loss of conversational state during page navigations.",[32,39,40],{},"Browser security policies that block cross-domain control and iframe access.",[32,42,43],{},"The agent's difficulty in accurately perceiving complex page content.",[32,45,46],{},"Gaps in deployment, security, and the ability to audit sessions.",[19,48,49],{},"By analyzing these challenges, we can establish the requirements for a more suitable architecture, one that allows a voice agent to operate effectively on any website.",[51,52,54],"h2",{"id":53},"why-direct-dom-manipulation-creates-unreliable-agents","Why Direct DOM Manipulation Creates Unreliable Agents",[11,56],{":height":13,":width":14,"format":15,"loading":16,"src":57},"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/2.png",[19,59,60,61,65,66,69],{},"When a voice agent is injected directly into a website, its primary method for taking action is through direct manipulation of the Document Object Model (DOM). The process is straightforward in theory: the agent’s script uses JavaScript to find specific HTML elements on the page and then programmatically triggers events like clicks or keyboard inputs. 
To locate these elements, the agent relies on CSS selectors that match attributes such as ",[62,63,64],"code",{},"id=\"submit-button\""," or ",[62,67,68],{},"class=\"username-field\"",".",[19,71,72],{},"This approach, however, is the source of major reliability issues. The agent's ability to act is tied directly to a static map of the website's structure, but modern websites are anything but static.",[74,75,77],"h3",{"id":76},"the-fragility-of-css-selectors","The Fragility of CSS Selectors",[19,79,80],{},"A website's user interface is in a constant state of flux. Developers continuously update layouts, redesign components, run A/B tests, or refactor code. These changes, no matter how small, often alter the underlying HTML structure.",[29,82,83,98,112],{},[32,84,85,89,90,93,94,97],{},[86,87,88],"strong",{},"Structural Changes:"," A button might be moved inside a new ",[62,91,92],{},"\u003Cdiv>",", or a form field's ",[62,95,96],{},"id"," might be updated for better clarity.",[32,99,100,103,104,107,108,111],{},[86,101,102],{},"Styling Updates:"," A class name used for styling, like ",[62,105,106],{},".btn-primary",", could be changed to ",[62,109,110],{},".btn-submit",", breaking any selector that relied on the old name.",[32,113,114,117],{},[86,115,116],{},"Framework-Generated IDs:"," Some web frameworks automatically generate dynamic, non-human-readable IDs and class names that can change with every new build of the application.",[19,119,120],{},"When these changes occur, the CSS selectors hardcoded into the agent's logic become invalid. The agent attempts to find an element that no longer exists at the expected location, causing the action to fail. 
This brittleness means the agent requires constant maintenance to keep pace with website updates, and any unmonitored change can disable its functionality without warning.",[74,122,124],{"id":123},"the-problem-of-dynamic-content","The Problem of Dynamic Content",[19,126,127],{},"Modern web applications rarely load all their content at once. To improve performance, content is often loaded asynchronously with JavaScript after the initial page has been rendered. This includes product listings on an e-commerce site, user data in a dashboard, or search results that appear after a query is submitted.",[19,129,130],{},"This creates a timing problem, or a \"race condition,\" for a directly injected agent. The agent’s script might try to interact with an element before it has been created and added to the DOM. For example, it might try to click a \"Buy Now\" button that has not yet loaded. The result is another failure. While developers can attempt to build workarounds using timers or mutation observers to wait for elements to appear, this adds another layer of complexity and creates a solution that is just as likely to break when the application's loading behavior changes.",[74,132,134],{"id":133},"the-invisibility-of-encapsulated-components","The Invisibility of Encapsulated Components",[19,136,137,138,141],{},"Many modern web frameworks use the ",[86,139,140],{},"Shadow DOM"," to encapsulate components, keeping their internal structure separate from the main page's DOM. This is a common practice for building complex UI elements like date pickers, custom form controls, or product configurators. For a standard script injected into the page, however, elements inside a shadow root are not matched by document-level queries such as document.querySelector, and a closed shadow root cannot be reached from outside at all. 
The agent cannot \"see into\" these components to find buttons or input fields, making it difficult or impossible to interact with major parts of the user interface.",[74,143,145],{"id":144},"the-challenge-of-synthetic-events","The Challenge of Synthetic Events",[19,147,148,149,152],{},"Even when an agent successfully locates an element, the action itself can fail. Programmatically triggered events (like a script calling ",[62,150,151],{},"element.click()",") are not identical to events generated by a real user. Many web applications have logic that can detect this difference, for example by checking the event's isTrusted property, which is false for script-generated events. This can lead to compatibility issues where the website's own scripts do not respond to the synthetic event, or it can trigger security measures designed to block automated bots. The agent’s action is either ignored or actively blocked, creating another point of failure.",[51,154,156],{"id":155},"maintaining-conversational-state-across-page-navigations","Maintaining Conversational State Across Page Navigations",[11,158],{":height":13,":width":14,"format":15,"loading":16,"src":159},"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/3.png",[19,161,162],{},"A major challenge for a voice agent implemented via direct script injection is its inability to maintain a continuous conversation as a user navigates a website. The standard behavior of a web browser is to treat each page load as a separate, isolated event. This model is fundamentally at odds with the requirements of a persistent, stateful AI assistant.",[19,164,165],{},"When a user follows a link to a new page or submits a form that causes a page refresh, the browser discards the current page in its entirety. This includes all the HTML, CSS, and any JavaScript that was running. 
The voice agent’s script, being part of that page, is terminated along with everything else.",[74,167,169],{"id":168},"the-impact-of-a-stateless-environment","The Impact of a Stateless Environment",[19,171,172],{},"This page-by-page lifecycle leads to a complete breakdown of the user experience for any task that spans more than a single page.",[29,174,175,181,187],{},[32,176,177,180],{},[86,178,179],{},"Abrupt End to Conversations:"," The moment a user navigates away, the agent's memory is wiped. The ongoing conversation is cut off, and any context that had been established is lost.",[32,182,183,186],{},[86,184,185],{},"Forced Repetition:"," On the new page, the agent's script must reload from scratch. The user is forced to re-initiate the conversation and repeat their request. The agent has no recollection of the previous interaction.",[32,188,189,192,193],{},[86,190,191],{},"Inability to Handle Multi-Step Tasks:"," This limitation makes it impossible for the agent to assist with complex workflows. Consider a user booking a flight:",[194,195,196,199,202],"ol",{},[32,197,198],{},"On the homepage, the user says, \"Find me a flight to New York.\" The agent helps fill out the form.",[32,200,201],{},"The user is taken to a search results page.",[32,203,204],{},"The user then says, \"Now, sort the results by price.\"",[19,206,207],{},"A directly injected agent on the results page would have no memory of the original destination request. It cannot connect the new command to the previous context.",[74,209,211],{"id":210},"why-custom-workarounds-fall-short","Why Custom Workarounds Fall Short",[19,213,214,215,218],{},"To solve this, developers might attempt to build a custom system to preserve state across navigations. This typically requires creating a separate ",[86,216,217],{},"orchestration service"," that uses session cookies to identify the user and a backend to store the conversation history. 
The idea is to save the state before the page unloads and restore it when the new page loads.",[19,220,221],{},"However, this approach is more of a patch than a solution and introduces its own problems:",[29,223,224,230,240],{},[32,225,226,229],{},[86,227,228],{},"Engineering Overhead:"," Building and maintaining a reliable state-tracking service is a complex project in itself, adding another moving part that can fail.",[32,231,232,235,236,239],{},[86,233,234],{},"Unreliable State Saving:"," Capturing the full state of the agent in the moments before a page is destroyed is difficult. Browser events like ",[62,237,238],{},"beforeunload"," do not provide a guaranteed window for these operations to complete successfully.",[32,241,242,245],{},[86,243,244],{},"Noticeable Disruption:"," The process of loading a new page, running the agent script, fetching the saved state from a service, and re-rendering the conversation UI is not instantaneous. This results in a disjointed user experience with visible delays and content \"flashing\" as the state is pieced back together.",[19,247,248],{},"The core issue remains: the agent's existence is tied to the lifecycle of an individual page. A truly interactive assistant requires a persistent operational layer that exists independently of page loads. Direct script injection cannot provide this, making it unsuitable for building agents that can guide users through a complete journey on a website.",[51,250,252],{"id":251},"navigating-cross-domain-restrictions-and-iframes","Navigating Cross-Domain Restrictions and Iframes",[11,254],{":height":13,":width":14,"format":15,"loading":16,"src":255},"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/4.png",[19,257,258],{},"A core security principle of the web, the Same-Origin Policy, prevents a script from one website from accessing or manipulating content on another. 
This policy is a major safeguard, stopping malicious sites from stealing data from other tabs you may have open. However, for a voice agent implemented via direct script injection, this security measure creates an impassable barrier, severely limiting its operational range.",[19,260,261],{},"The agent's script is bound to the domain where it was initially loaded. This confinement results in two major operational failures when a user's task involves more than one domain.",[74,263,265],{"id":264},"the-wall-between-websites","The Wall Between Websites",[19,267,268],{},"Modern web workflows are rarely self-contained. A user journey frequently involves moving between different, yet related, websites. Consider these common scenarios:",[29,270,271,274,277],{},[32,272,273],{},"An e-commerce transaction that redirects the user from the merchant's site to a third-party payment processor like PayPal.",[32,275,276],{},"A travel booking process that sends the user to a partner airline's website to select seats.",[32,278,279],{},"A single sign-on (SSO) flow that redirects to an identity provider like Okta or Google for authentication.",[19,281,282],{},"In each of these cases, a directly injected voice agent's journey comes to an abrupt halt. The moment the user is redirected to the new domain, the agent, whose code lives on the original site, cannot follow. It is left behind. The user loses their assistant at what is often the most complex part of the process, completely breaking the continuity of the guided experience.",[74,284,286],{"id":285},"black-holes-on-the-page-the-iframe-problem","Black Holes on the Page: The Iframe Problem",[19,288,289,290,293],{},"The same security restrictions that block access to third-party sites also apply to content embedded within ",[62,291,292],{},"\u003Ciframe>"," elements. Iframes are a common way to display content from another domain directly on a page without forcing a full redirect. 
They are used for many essential web components:",[29,295,296,302,308,314],{},[32,297,298,301],{},[86,299,300],{},"Payment Forms:"," Secure credit card input fields from services like Stripe or Braintree.",[32,303,304,307],{},[86,305,306],{},"Customer Support Portals:"," Embedded chat widgets or help centers from platforms like Intercom or Zendesk.",[32,309,310,313],{},[86,311,312],{},"Media Players:"," Videos from YouTube or Vimeo.",[32,315,316,319],{},[86,317,318],{},"Embedded Widgets:"," Cookie consent banners, A/B testing tools, or analytics solutions often loaded through Google Tag Manager.",[19,321,322],{},"From the perspective of the voice agent running on the main page, the content inside a cross-domain iframe is an opaque box. The Same-Origin Policy denies the agent's script any ability to \"see\" inside the iframe. It cannot read the text, identify the buttons, or access the form fields contained within it. This creates a functional blind spot on the page. The agent is powerless to help a user fill out their payment details, interact with a support agent, or even accept a cookie policy to clear the screen.",[19,324,325],{},"These cross-domain limitations mean that a directly injected agent is effectively trapped within a single, isolated garden. It cannot navigate the interconnected ecosystem of the modern web, rendering it incapable of assisting with the integrated, multi-service workflows that users engage in every day.",[51,327,329],{"id":328},"enabling-agents-to-accurately-perceive-web-content","Enabling Agents to Accurately Perceive Web Content",[11,331],{":height":13,":width":14,"format":15,"loading":16,"src":332},"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/5.png",[19,334,335],{},"For a voice agent to provide relevant assistance, it must first understand the page the user is viewing. A human user perceives a website visually: they see headlines, buttons, and images arranged in a clear layout. 
An AI agent, however, perceives the site by processing its underlying code, the Document Object Model (DOM).",[19,337,338],{},"When an agent is implemented with a standard script, it has direct access to this DOM. However, the raw DOM of a modern web application is a poor source of information. It is often a complex and noisy environment that makes it very difficult for an AI to extract meaningful context.",[74,340,342],{"id":341},"the-challenge-of-dom-complexity-and-noise","The Challenge of DOM Complexity and Noise",[19,344,345],{},"The HTML structure of a contemporary website is rarely a clean representation of its content. It is typically cluttered with code that serves purposes other than displaying information, creating major problems for an AI trying to make sense of it.",[29,347,348,357,363],{},[32,349,350,353,354,356],{},[86,351,352],{},"Structural Overhead:"," Websites are built with deeply nested ",[62,355,92],{}," elements used for styling and layout, creating a complex tree structure that obscures the actual content.",[32,358,359,362],{},[86,360,361],{},"Framework-Generated Code:"," Front-end frameworks like React or Angular often generate non-semantic class names and complex component structures that are difficult for an AI to interpret.",[32,364,365,368],{},[86,366,367],{},"Third-Party Scripts and Hidden Elements:"," The DOM is also filled with tracking pixels, ad-related scripts, and elements that are hidden from the user but still present in the code.",[19,370,371],{},"This complexity presents two issues for an AI agent:",[194,373,374,380],{},[32,375,376,379],{},[86,377,378],{},"Information Overload:"," The sheer volume of raw HTML can easily exceed the context window of the large language model powering the agent. Feeding it thousands of lines of code makes it slow and expensive to process.",[32,381,382,385],{},[86,383,384],{},"Signal vs. 
Noise:"," The agent struggles to distinguish between important content (like a product description or a form field) and irrelevant structural code or hidden tracking elements.",[74,387,389],{"id":388},"blindness-to-technical-and-visual-context","Blindness to Technical and Visual Context",[19,391,392],{},"The raw DOM also fails to capture the visual context and interactive state of a webpage, which are essential for providing accurate assistance. This is made worse by the technical barriers discussed earlier.",[29,394,395,401,407,420],{},[32,396,397,400],{},[86,398,399],{},"No Visual Hierarchy:"," The DOM is a structural tree, not a visual one. It does not tell the agent what is visually prominent on the page. A large, important headline and a small piece of text in the footer might appear similar in the HTML structure.",[32,402,403,406],{},[86,404,405],{},"Lack of State Awareness:"," The agent is unaware of the page's interactive state. It cannot tell if a modal pop-up is currently covering the screen, if a dropdown menu is open, or if a specific tab in a component is active. This can cause it to attempt actions on elements that are not currently visible or interactive.",[32,408,409,412,413,416,417,419],{},[86,410,411],{},"Invisibility of Encapsulated and Embedded Content:"," The agent’s perception is further compromised because it cannot read content inside cross-domain ",[86,414,415],{},"iframes"," or UI components built with ",[86,418,140],{},". This means large, interactive parts of the page are not just hard to interpret; they are entirely missing from the agent's understanding of the page.",[32,421,422,425,426,429],{},[86,423,424],{},"Invisibility of Canvas Content:"," Some web content, like interactive charts or product configurators, is rendered inside a ",[62,427,428],{},"\u003Ccanvas>"," element. 
The DOM provides no information about what is drawn inside a canvas, making this content completely invisible to the agent.",[19,431,432],{},"Without a specialized mechanism to process, clean, and structure this information, a directly injected agent operates with poor eyesight. It is given a massive, noisy blueprint of the page and is left to guess what is important, what is visible, and what is actionable. This limitation severely restricts its ability to make reliable decisions and provide genuinely helpful guidance.",[51,434,436],{"id":435},"addressing-deployment-security-and-auditing-gaps","Addressing Deployment, Security, and Auditing Gaps",[11,438],{":height":13,":width":14,"format":15,"loading":16,"src":439},"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/6.png",[19,441,442],{},"The technical challenges of a directly injected voice agent extend beyond its in-session performance. The entire lifecycle of deploying, managing, and securing the agent presents a separate set of operational problems. When the agent is just another script within a website's codebase, it inherits all the complexities of standard software development and lacks the specialized features needed for a secure, enterprise-ready solution.",[74,444,446],{"id":445},"deployment-and-maintenance-complexity","Deployment and Maintenance Complexity",[19,448,449],{},"A direct implementation model tightly couples the voice agent to the host website's source code. This creates a rigid and inefficient management process.",[29,451,452,458,464,470],{},[32,453,454,457],{},[86,455,456],{},"Requires Code Changes:"," To install, update, or reconfigure the agent, a developer must directly edit the website's files. 
This is not a simple task; it involves developer resources, code reviews, and adherence to the website’s release schedule.",[32,459,460,463],{},[86,461,462],{},"Slow Update Cycles:"," This dependency on development cycles means that a simple change, such as updating the agent's system prompt or adding a new tool, can take days or weeks to go live. There is no agility to quickly test new configurations or respond to an issue.",[32,465,466,469],{},[86,467,468],{},"Organizational Separation:"," Often, the team developing the agent is separate from the team managing the website and may not have direct access to the application's source code. This forces a rigid development process where agent updates are dependent on another team’s priorities and release schedule.",[32,471,472,475],{},[86,473,474],{},"Lack of Support for Unmodifiable Applications:"," This model fails entirely with applications that cannot be changed, such as legacy systems no longer under active development or third-party vendor platforms (e.g., payment processors, e-signature providers). It is not possible to inject a script into an application you do not control.",[74,477,479],{"id":478},"data-security-and-compliance-risks","Data Security and Compliance Risks",[19,481,482],{},"When a script runs on a webpage, it has access to any information displayed on that page. This creates a major security concern for a voice agent, which might be active on pages containing sensitive data.",[29,484,485,491],{},[32,486,487,490],{},[86,488,489],{},"Uncontrolled Data Exposure:"," The agent can access personally identifiable information (PII), financial details, or protected health information (PHI) that is visible on the screen.",[32,492,493,496],{},[86,494,495],{},"No Built-in Masking:"," Without a separate architectural layer to control data flow, there is no easy way to redact or mask this sensitive information before it is processed by the agent’s AI model or captured in a session recording. 
This creates a sizeable risk for compliance with regulations like GDPR, HIPAA, and PCI DSS. The organization becomes responsible for any sensitive data the AI is exposed to.",[74,498,500],{"id":499},"the-absence-of-collaborative-and-auditing-features","The Absence of Collaborative and Auditing Features",[19,502,503],{},"A standard voice agent script offers an experience for a single user and lacks the necessary framework for oversight or human assistance.",[29,505,506,512,518],{},[32,507,508,511],{},[86,509,510],{},"No Human Escalation Path:"," The system has no native capabilities for co-browsing or shared control, which would allow a human agent to join the session, see the user's screen, and take over to provide direct assistance. When the AI reaches its limits, the only option is to end the session and direct the user to a separate support channel, creating a frustrating experience.",[32,513,514,517],{},[86,515,516],{},"No Automatic Audit Trails:"," A direct implementation provides no built-in mechanism for generating a detailed log of the agent's actions and the user's interactions. This makes it very difficult to review a session to debug an issue, understand what happened, or verify compliance.",[32,519,520,523],{},[86,521,522],{},"No Session Recording:"," There is no easy way to create a pixel-perfect video recording of the session. 
Recordings are important for quality assurance, training human agents, and providing a definitive record of the interaction.",[19,525,526],{},"Without these features, managers and compliance officers have no visibility into how the agent is performing or what is occurring in user sessions, making it difficult to ensure quality or investigate a security incident.",[51,528,530],{"id":529},"shifting-from-direct-injection-to-a-virtualization-layer","Shifting from Direct Injection to a Virtualization Layer",[11,532],{":height":13,":width":14,"format":15,"loading":16,"src":533},"/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites/7.png",[19,535,536],{},"The challenges outlined (unreliable actions, lost conversational state, cross-domain restrictions, poor perception, and operational gaps) all stem from a single architectural choice: embedding the voice agent directly into the target website. This approach forces the agent to operate within the constraints of a standard browser environment, which was never designed for this kind of persistent, interactive application. The limitations are not just individual problems to be patched; they are symptoms of a foundational mismatch between the goal and the method.",[19,538,539,540,69],{},"To build a reliable and capable voice agent, a different architectural model is required. The solution is not to create a more complex script but to change the very environment in which the agent operates. This is accomplished by moving from direct injection to a ",[86,541,542],{},"virtualization layer",[74,544,546],{"id":545},"the-concept-of-a-web-virtualization-layer","The Concept of a Web Virtualization Layer",[19,548,549],{},"A virtualization layer is an intermediary system that sits between the user's browser and the target web application. Instead of the user accessing the website directly, they connect to a platform that, in turn, manages the interaction with the site on their behalf. 
This platform creates a secure, controlled sandbox around the original website, effectively wrapping it in an interactive overlay.",[19,551,552],{},"This is achieved through a proxy-based system. When a user initiates a session, the platform's proxy fetches the website's content (HTML, CSS, JavaScript) and dynamically modifies it in real-time before sending it to the user's browser. This on-the-fly transformation allows the platform to:",[29,554,555,562,568],{},[32,556,557,558,561],{},"Establish a ",[86,559,560],{},"persistent session environment"," that exists independently of any single page load.",[32,563,564,565,69],{},"Rewrite URLs and modify security headers to bring multiple websites and iframes into a ",[86,566,567],{},"single, unified context",[32,569,570,571,574],{},"Inject a specialized set of ",[86,572,573],{},"high-level tools and APIs"," that give the agent reliable methods for perception and action.",[74,576,578],{"id":577},"how-virtualization-addresses-the-core-problems","How Virtualization Addresses the Core Problems",[19,580,581],{},"By changing the operational environment, a virtualization layer provides direct solutions to the problems inherent in the direct injection model.",[29,583,584,590,596,602,608],{},[32,585,586,589],{},[86,587,588],{},"Reliability:"," The agent no longer relies on fragile CSS selectors. It uses a stable, high-level API to interact with the page, abstracting away the complexities of the underlying DOM, including the Shadow DOM.",[32,591,592,595],{},[86,593,594],{},"State Persistence:"," The agent runs within the persistent virtualization layer, not on the individual page. As the user navigates, the agent and its conversational context remain active and unbroken.",[32,597,598,601],{},[86,599,600],{},"Cross-Domain Control:"," The proxy brings all navigated websites, including third-party services and iframes, into the same virtualized session. 
The Same-Origin Policy is no longer a barrier because, from the browser's perspective, all content is being served from the platform's single, trusted domain.",[32,603,604,607],{},[86,605,606],{},"Accurate Perception:"," The platform provides tools that process the raw DOM into a clean, structured format optimized for an AI, solving the problem of information overload and lack of visual context.",[32,609,610,613],{},[86,611,612],{},"Centralized Management:"," Deployment, security, and auditing are handled by the platform. Updates can be rolled out instantly without code changes, data can be masked before it reaches the agent, and detailed session logs can be generated automatically.",[19,615,616],{},"This architectural shift moves the agent from being a temporary guest on a webpage to a permanent resident of a controlled, purpose-built environment. It is this virtualization approach that provides the foundation for building a truly effective and enterprise-ready voice AI.",[51,618,620],{"id":619},"solving-agent-reliability-with-a-web-augmentation-platform","Solving Agent Reliability with a Web Augmentation Platform",[19,622,623],{},"The limitations of a directly injected voice agent are not solvable by simply improving the agent's script. The core issue lies in asking the agent to perform a complex, stateful job within a stateless and restrictive environment. The solution, therefore, is not to force the agent to adapt to the web's constraints, but to provide a new operational layer that removes those constraints entirely. This is the function of a web augmentation platform.",[19,625,626,627,630],{},"A web augmentation platform, like Webfuse, operates on the principle of virtualization. Instead of placing the agent's code directly onto the target website, the platform wraps the entire user experience in a ",[86,628,629],{},"Virtual Web Session",". 
This session is a controlled, persistent environment managed by the platform's proxy, which gives the agent the stable foundation it needs to operate reliably.",[19,632,633],{},"The following table summarizes the architectural differences and their impact on performance:",[635,636,637,654],"table",{},[638,639,640],"thead",{},[641,642,643,648,651],"tr",{},[644,645,647],"th",{"align":646},"left","Challenge",[644,649,650],{"align":646},"Direct Injection Method",[644,652,653],{"align":646},"Webfuse",[655,656,657,671,684,697,710],"tbody",{},[641,658,659,665,668],{},[660,661,662],"td",{"align":646},[86,663,664],{},"Action Reliability",[660,666,667],{"align":646},"Relies on fragile CSS selectors that break with website updates. Cannot interact with Shadow DOM or generate user-like events.",[660,669,670],{"align":646},"Uses a stable Automation API. Actions are resilient to UI changes and can reliably interact with all page elements, including the Shadow DOM.",[641,672,673,678,681],{},[660,674,675],{"align":646},[86,676,677],{},"Session Continuity",[660,679,680],{"align":646},"Agent script is terminated on each page load, losing all context. Requires complex, unreliable workarounds.",[660,682,683],{"align":646},"The agent operates in a persistent virtualization layer, independent of page loads. The conversation remains continuous throughout the user's entire journey.",[641,685,686,691,694],{},[660,687,688],{"align":646},[86,689,690],{},"Cross-Domain Control",[660,692,693],{"align":646},"Blocked by the browser's Same-Origin Policy. Cannot access or control content in third-party sites or embedded iframes.",[660,695,696],{"align":646},"Unifies all domains and iframes into a single session. The agent can operate across any website the user visits.",[641,698,699,704,707],{},[660,700,701],{"align":646},[86,702,703],{},"Content Perception",[660,705,706],{"align":646},"Processes a raw, noisy DOM, leading to information overload and inaccurate context. 
Blind to iframe and Shadow DOM content.",[660,708,709],{"align":646},"Provides specialized tools that transform the DOM into a clean, structured format for the AI, ensuring accurate perception.",[641,711,712,717,720],{},[660,713,714],{"align":646},[86,715,716],{},"Deployment & Security",[660,718,719],{"align":646},"Requires code changes for updates, tied to website release cycles. Lacks built-in data masking or comprehensive auditing.",[660,721,722],{"align":646},"Centralized, no-code deployment. Includes built-in security features like data masking and provides automatic session recording and audit logs.",[19,724,725],{},"By shifting from direct injection to a virtualization platform, the voice agent is transformed from an unreliable script into an integrated, enterprise-ready assistant. This architecture provides the stability and security that allows the agent to perceive, act, and persist across the entire web, finally enabling the creation of a genuinely helpful and reliable automated experience.",[51,727,729],{"id":728},"take-the-next-step","Take the Next Step",[19,731,732],{},"The challenges of building reliable voice agents are significant, but they are solvable with the right architecture. 
A web augmentation platform provides the persistent, secure, and capable environment needed to move beyond fragile scripts and create genuinely helpful user experiences.",[19,734,735],{},"If you are ready to build AI agents that can navigate the entire web, handle complex tasks, and operate securely, our team can help you design the right solution.",[29,737,738,750],{},[32,739,740,743,744,749],{},[86,741,742],{},"Talk to an Expert:"," ",[745,746,748],"a",{"href":747},"/demo","Contact us"," to discuss your specific use case and learn how a virtualization layer can address your agent reliability challenges.",[32,751,752,743,755],{},[86,753,754],{},"Explore Use Cases:",[745,756,758],{"href":757},"/use-case/voice-agents","Learn more about how Webfuse is used to create automated journeys across different websites.",{"title":760,"searchDepth":761,"depth":761,"links":762},"",2,[763,770,774,778,782,787,791,792],{"id":53,"depth":761,"text":54,"children":764},[765,767,768,769],{"id":76,"depth":766,"text":77},3,{"id":123,"depth":766,"text":124},{"id":133,"depth":766,"text":134},{"id":144,"depth":766,"text":145},{"id":155,"depth":761,"text":156,"children":771},[772,773],{"id":168,"depth":766,"text":169},{"id":210,"depth":766,"text":211},{"id":251,"depth":761,"text":252,"children":775},[776,777],{"id":264,"depth":766,"text":265},{"id":285,"depth":766,"text":286},{"id":328,"depth":761,"text":329,"children":779},[780,781],{"id":341,"depth":766,"text":342},{"id":388,"depth":766,"text":389},{"id":435,"depth":761,"text":436,"children":783},[784,785,786],{"id":445,"depth":766,"text":446},{"id":478,"depth":766,"text":479},{"id":499,"depth":766,"text":500},{"id":529,"depth":761,"text":530,"children":788},[789,790],{"id":545,"depth":766,"text":546},{"id":577,"depth":766,"text":578},{"id":619,"depth":761,"text":620},{"id":728,"depth":761,"text":729},"voice-ai","2025-11-13","Why voice agents break on real websites. 
Direct script injection can't handle changing DOM selectors, page reloads, cross-domain iframes, or complex content. Virtualization fixes these problems.","md",null,{"homepage":799,"relatedLinks":800},true,[801,804,807],{"text":802,"href":803},"Building a Voice Agent That Can Act","/blog/building-a-website-controlling-voice-agent-with-elevenlabs-and-webfuse",{"text":805,"href":806},"Top 5 Voice AI Agents for Website Integration","/blog/top-5-voice-ai-agents-for-website-integration-in-2026",{"text":808,"href":757},"Universal Voice Agent Deployment","/blog/challenges-of-building-reliable-voice-ai-agents-on-live-websites",{"title":5,"description":795},{"loc":809},"blog/1026.challenges-of-building-reliable-voice-ai-agents-on-live-websites",[814,815,816,817,818],"ai-agents","browser-agents","voice-agents","web-agents","web-automation","y4iyLTA-0YEbokSTjFwWlRb3KlcNZZ3m0PReFmrcczM",[821,2603],{"id":822,"title":823,"authorId":824,"body":825,"category":814,"created":2581,"description":2582,"extension":796,"faqs":797,"featurePriority":797,"head":797,"landingPath":797,"meta":2583,"navigation":799,"ogImage":797,"path":2595,"robots":797,"schemaOrg":797,"seo":2596,"sitemap":2597,"stem":2598,"tags":2599,"__hash__":2602},"blog/blog/1012.dom-downsampling-for-llm-based-web-agents.md","DOM Downsampling for LLM-Based Web Agents","thassilo-schiepanski",{"type":8,"value":826,"toc":2566},[827,832,856,860,867,871,887,891,897,901,919,944,947,951,954,965,971,1002,1006,1026,1038,1043,1059,1073,1076,1080,1100,1104,1112,1124,1128,1131,1523,1529,1536,1700,1707,1798,1805,1877,1886,1892,1901,1905,1911,1921,1933,2166,2183,2260,2266,2381,2385,2397,2406,2411,2416,2419,2423,2429,2434,2472,2476,2482,2486,2496,2500,2503,2562],[11,828],{":width":829,"alt":830,"format":15,"loading":16,"src":831},"900","Downsampling visualised for digital images and 
HTML","/blog/dom-downsampling-for-web-agents/1.png",[19,833,834,840,841,840,846,851,852,855],{},[745,835,839],{"href":836,"rel":837},"https://operator.chatgpt.com",[838],"nofollow","Operator (OpenAI)",", ",[745,842,845],{"href":843,"rel":844},"https://www.director.ai",[838],"Director (Browserbase)",[745,847,850],{"href":848,"rel":849},"https://browser-use.com",[838],"Browser Use"," – we are currently witnessing the rise of ",[86,853,854],{},"web AI agents",". The first iteration of serviceable web agents was enabled by frontier LLMs, which act as instantaneous domain model backends. The domain, here, corresponds to the landscape of web application UIs.",[51,857,859],{"id":858},"what-is-a-snapshot","What is a Snapshot?",[19,861,862,863,866],{},"Web agents provide an LLM with a task and the serialised runtime state of the currently browsed web application (e.g., a screenshot). The LLM is expected to suggest relevant actions to perform in the web application. The serialisation of such runtime state is referred to as a ",[86,864,865],{},"snapshot",". The snapshot technique largely determines the quality of the LLM's interaction suggestions.",[74,868,870],{"id":869},"gui-snapshots","GUI Snapshots",[19,872,873,874,877,878,882,883,886],{},"Screenshots – for consistency reasons referred to as ",[86,875,876],{},"GUI snapshots"," – resemble how humans visually perceive web application UIs. LLM APIs subsidise the use of image input through upstream compression. Compression, however, irreversibly affects image dimensions, which takes away pixel precision, leaving no way to suggest interactions like ",[879,880,881],"em",{},"“click at 100, 735”",". As a workaround, early web agents used ",[879,884,885],{},"grounded"," GUI snapshots. Grounding describes adding visual cues to the GUI, such as bounding boxes with numerical identifiers. 
Grounding lets the LLM refer to specific parts of the page by identifier, so the agent can trace back interaction targets.",[11,888],{":width":829,"alt":889,"format":15,"loading":16,"src":890},"Grounded GUI snapshot as implemented by Browser Use","/blog/dom-downsampling-for-web-agents/2.png",[19,892,893],{},[894,895,896],"small",{},"Grounded GUI snapshot as implemented by Browser Use.",[74,898,900],{"id":899},"dom-snapshots","DOM Snapshots",[19,902,903,904,914,915,918],{},"LLMs are arguably much better at understanding code than images. Research suggests that they excel at describing and classifying HTML, and at navigating the inherent UI",[905,906,907],"sup",{},[745,908,913],{"href":909,"ariaDescribedBy":910,"dataFootnoteRef":760,"id":912},"#user-content-fn-1",[911],"footnote-label","user-content-fnref-1","1",". The DOM (document object model) – a web browser's runtime state model of a web application – translates back to HTML. For this reason, ",[86,916,917],{},"DOM snapshots"," offer a compelling alternative to GUI snapshots. DOM snapshots offer a handful of key advantages:",[194,920,921,924,927,930,933],{},[32,922,923],{},"DOM snapshots connect with LLM code (HTML) interpretation abilities.",[32,925,926],{},"DOM snapshots can be compiled from deep clones, hidden from supervision (unlike GUI grounding).",[32,928,929],{},"DOM snapshots yield text input that on average consumes less bandwidth than screenshots.",[32,931,932],{},"DOM snapshots allow for exact programmatic targeting of elements (e.g., via CSS selectors).",[32,934,935,936,939,940,943],{},"DOM snapshots are available with the ",[62,937,938],{},"DOMContentLoaded"," event (whereas the GUI completes initial rendering with ",[62,941,942],{},"load",").",[19,945,946],{},"Yet, DOM snapshots have a major problem: they can exhaust the model context. Whereas a GUI snapshot commonly costs a four-figure number of tokens, a raw DOM snapshot can run into hundreds of thousands of tokens. 
To connect with LLM code interpretation abilities, however, developers have used element extraction techniques – picking only (likely) important elements from the DOM. Element extraction flattens the DOM tree, which disregards hierarchy as a potential UI feature (how do elements relate to each other?).",[51,948,950],{"id":949},"dom-downsampling-a-novel-approach","DOM Downsampling: A Novel Approach",[19,952,953],{},"Using DOM snapshots with web agents requires client-side pre-processing – similar to how LLM vision APIs process image input. Downsampling is a fundamental signal processing technique that reduces data that exceeds time or space constraints, under the assumption that the majority of relevant features are retained. Picture JPEG compression as an example: put simply, a JPEG image stores only an average colour for patches of pixels. The bigger the patches, the smaller the file. Although some detail is lost, key image features – colours, edges, objects – remain recognisable up to a large patch size.",[19,955,956,957,960,961,964],{},"We transfer the concept of ",[86,958,959],{},"downsampling"," to ",[86,962,963],{},"DOMs",". This is particularly attractive since such an approach retains HTML characteristics that might be valuable for an LLM backend. 
We define UI features as concepts that, to a substantial degree, facilitate LLM suggestions on how to act in the UI in order to solve related web-based tasks.",[51,966,968],{"id":967},"d2snap",[879,969,970],{},"D2Snap",[19,972,973,974,982,990,998,999,1001],{},"We recently proposed ",[745,975,978],{"href":976,"rel":977},"https://arxiv.org/abs/2508.04412",[838],[86,979,980],{},[879,981,970],{},[905,983,984],{},[745,985,989],{"href":986,"ariaDescribedBy":987,"dataFootnoteRef":760,"id":988},"#user-content-fn-2",[911],"user-content-fnref-2","2",[905,991,992],{},[745,993,997],{"href":994,"ariaDescribedBy":995,"dataFootnoteRef":760,"id":996},"#user-content-fn-3",[911],"user-content-fnref-3","3"," – a first-of-its-kind downsampling algorithm for DOMs. Herein, we'll briefly explain how the ",[879,1000,970],{}," algorithm works, and how it can be utilised to build efficient and performant web agents.",[74,1003,1005],{"id":1004},"how-it-works","How it works",[19,1007,1008,1009,1011,1012,840,1015,1018,1019,1022,1023,943],{},"There are basically three redundant types of DOM nodes, and HTML concepts: elements, text, and attributes. We defined and empirically adjusted three node-specific procedures. 
",[879,1010,970],{}," downsamples at a variable ratio, configured through procedure-specific parameters  ",[62,1013,1014],{},"k",[62,1016,1017],{},"l",", and ",[62,1020,1021],{},"m"," (",[62,1024,1025],{},"∈ [0, 1]",[1027,1028,1029],"blockquote",{},[19,1030,1031,1032,1037],{},"We used ",[745,1033,1036],{"href":1034,"rel":1035},"https://openai.com/index/hello-gpt-4o/",[838],"GPT-4o"," to create a downsampling ground truth dataset by having it classify HTML elements and scoring semantics regarding relevance for understanding the inherent UI – a UI feature degree.",[1039,1040,1042],"h4",{"id":1041},"procedure-elements","Procedure: Elements",[19,1044,1045,1047,1048,1051,1052,1055,1056,1058],{},[879,1046,970],{}," downsamples (simplifies) elements by merging container elements like ",[62,1049,1050],{},"section"," and ",[62,1053,1054],{},"div"," together. A parameter ",[62,1057,1014],{}," controls the merge ratio depending on the total DOM tree height. For competing concepts, such as element name, the ground truth determines which element's characterisitics to keep – comparing UI feature scores.",[19,1060,1061,1062,840,1064,1066,1067,1072],{},"Elements in content elements (",[62,1063,19],{},[62,1065,1027],{},", ...) are translated to a more comprehensive ",[745,1068,1071],{"href":1069,"rel":1070},"https://www.markdownguide.org/basic-syntax/",[838],"Markdown"," representation.",[19,1074,1075],{},"Interactive elements, definite interaction target candidates, are kept as is.",[1039,1077,1079],{"id":1078},"procedure-text","Procedure: Text",[19,1081,1082,1084,1085,1088,1096,1097,1099],{},[879,1083,970],{}," downsamples text by dropping a fraction. Natural units of text are space-separated words, or punctuation-separated sentences. We reuse the ",[879,1086,1087],{},"TextRank",[905,1089,1090],{},[745,1091,1095],{"href":1092,"ariaDescribedBy":1093,"dataFootnoteRef":760,"id":1094},"#user-content-fn-4",[911],"user-content-fnref-4","4"," algorithm to rank sentences in text nodes. 
The lowest-ranking fraction of sentences, denoted by parameter ",[62,1098,1017],{},", is dropped.",[1039,1101,1103],{"id":1102},"procedure-attributes","Procedure: Attributes",[19,1105,1106,1108,1109,1111],{},[879,1107,970],{}," downsamples attributes by dropping those with a name that, according to ground truth, holds a UI feature degree below a threshold. Parameter ",[62,1110,1021],{}," denotes this threshold.",[1027,1113,1114],{},[19,1115,1116,1117,1123],{},"Check out the ",[745,1118,1120,1122],{"href":976,"rel":1119},[838],[879,1121,970],{}," paper"," to learn about the algorithm in-depth.",[74,1125,1127],{"id":1126},"example-of-a-downsampled-dom","Example of a Downsampled DOM",[19,1129,1130],{},"Consider a partial DOM state, serialised as HTML:",[1132,1133,1137],"pre",{"className":1134,"code":1135,"language":1136,"meta":760,"style":760},"language-html shiki shiki-themes catppuccin-latte night-owl","\u003Csection class=\"container\" tabindex=\"3\" required=\"true\" type=\"example\">\n  \u003Cdiv class=\"mx-auto\" data-topic=\"products\" required=\"false\">\n    \u003Ch1>Our Pizza\u003C/h1>\n    \u003Cdiv>\n      \u003Cdiv class=\"shadow-lg\">\n        \u003Ch2>Margherita\u003C/h2>\n        \u003Cp>\n          A simple classic: mozzarela, tomatoes and basil.\n          An everyday choice!\n        \u003C/p>\n        \u003Cbutton type=\"button\">Add\u003C/button>\n      \u003C/div>\n      \u003Cdiv class=\"shadow-lg\">\n        \u003Ch2>Capricciosa\u003C/h2>\n        \u003Cp>\n          A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n          A true favourite!\n          \u003C/p>\n        \u003Cbutton type=\"button\">Add\u003C/button>\n      \u003C/div>\n    \u003C/div>\n  
\u003C/div>\n\u003C/section>\n","html",[62,1138,1139,1206,1249,1271,1280,1301,1320,1329,1335,1341,1351,1380,1390,1409,1427,1436,1442,1448,1458,1485,1494,1504,1514],{"__ignoreMap":760},[1140,1141,1144,1148,1151,1155,1158,1162,1166,1168,1171,1173,1175,1177,1179,1182,1184,1186,1189,1191,1194,1196,1198,1201,1203],"span",{"class":1142,"line":1143},"line",1,[1140,1145,1147],{"class":1146},"s9rnR","\u003C",[1140,1149,1050],{"class":1150},"sY2RG",[1140,1152,1154],{"class":1153},"swkLt"," class",[1140,1156,1157],{"class":1146},"=",[1140,1159,1161],{"class":1160},"sbuKk","\"",[1140,1163,1165],{"class":1164},"sfrMT","container",[1140,1167,1161],{"class":1160},[1140,1169,1170],{"class":1153}," tabindex",[1140,1172,1157],{"class":1146},[1140,1174,1161],{"class":1160},[1140,1176,997],{"class":1164},[1140,1178,1161],{"class":1160},[1140,1180,1181],{"class":1153}," required",[1140,1183,1157],{"class":1146},[1140,1185,1161],{"class":1160},[1140,1187,1188],{"class":1164},"true",[1140,1190,1161],{"class":1160},[1140,1192,1193],{"class":1153}," type",[1140,1195,1157],{"class":1146},[1140,1197,1161],{"class":1160},[1140,1199,1200],{"class":1164},"example",[1140,1202,1161],{"class":1160},[1140,1204,1205],{"class":1146},">\n",[1140,1207,1208,1211,1213,1215,1217,1219,1222,1224,1227,1229,1231,1234,1236,1238,1240,1242,1245,1247],{"class":1142,"line":761},[1140,1209,1210],{"class":1146},"  \u003C",[1140,1212,1054],{"class":1150},[1140,1214,1154],{"class":1153},[1140,1216,1157],{"class":1146},[1140,1218,1161],{"class":1160},[1140,1220,1221],{"class":1164},"mx-auto",[1140,1223,1161],{"class":1160},[1140,1225,1226],{"class":1153}," 
data-topic",[1140,1228,1157],{"class":1146},[1140,1230,1161],{"class":1160},[1140,1232,1233],{"class":1164},"products",[1140,1235,1161],{"class":1160},[1140,1237,1181],{"class":1153},[1140,1239,1157],{"class":1146},[1140,1241,1161],{"class":1160},[1140,1243,1244],{"class":1164},"false",[1140,1246,1161],{"class":1160},[1140,1248,1205],{"class":1146},[1140,1250,1251,1254,1257,1260,1264,1267,1269],{"class":1142,"line":766},[1140,1252,1253],{"class":1146},"    \u003C",[1140,1255,1256],{"class":1150},"h1",[1140,1258,1259],{"class":1146},">",[1140,1261,1263],{"class":1262},"s2kId","Our Pizza",[1140,1265,1266],{"class":1146},"\u003C/",[1140,1268,1256],{"class":1150},[1140,1270,1205],{"class":1146},[1140,1272,1274,1276,1278],{"class":1142,"line":1273},4,[1140,1275,1253],{"class":1146},[1140,1277,1054],{"class":1150},[1140,1279,1205],{"class":1146},[1140,1281,1283,1286,1288,1290,1292,1294,1297,1299],{"class":1142,"line":1282},5,[1140,1284,1285],{"class":1146},"      \u003C",[1140,1287,1054],{"class":1150},[1140,1289,1154],{"class":1153},[1140,1291,1157],{"class":1146},[1140,1293,1161],{"class":1160},[1140,1295,1296],{"class":1164},"shadow-lg",[1140,1298,1161],{"class":1160},[1140,1300,1205],{"class":1146},[1140,1302,1304,1307,1309,1311,1314,1316,1318],{"class":1142,"line":1303},6,[1140,1305,1306],{"class":1146},"        \u003C",[1140,1308,51],{"class":1150},[1140,1310,1259],{"class":1146},[1140,1312,1313],{"class":1262},"Margherita",[1140,1315,1266],{"class":1146},[1140,1317,51],{"class":1150},[1140,1319,1205],{"class":1146},[1140,1321,1323,1325,1327],{"class":1142,"line":1322},7,[1140,1324,1306],{"class":1146},[1140,1326,19],{"class":1150},[1140,1328,1205],{"class":1146},[1140,1330,1332],{"class":1142,"line":1331},8,[1140,1333,1334],{"class":1262},"          A simple classic: mozzarela, tomatoes and basil.\n",[1140,1336,1338],{"class":1142,"line":1337},9,[1140,1339,1340],{"class":1262},"          An everyday 
choice!\n",[1140,1342,1344,1347,1349],{"class":1142,"line":1343},10,[1140,1345,1346],{"class":1146},"        \u003C/",[1140,1348,19],{"class":1150},[1140,1350,1205],{"class":1146},[1140,1352,1354,1356,1359,1361,1363,1365,1367,1369,1371,1374,1376,1378],{"class":1142,"line":1353},11,[1140,1355,1306],{"class":1146},[1140,1357,1358],{"class":1150},"button",[1140,1360,1193],{"class":1153},[1140,1362,1157],{"class":1146},[1140,1364,1161],{"class":1160},[1140,1366,1358],{"class":1164},[1140,1368,1161],{"class":1160},[1140,1370,1259],{"class":1146},[1140,1372,1373],{"class":1262},"Add",[1140,1375,1266],{"class":1146},[1140,1377,1358],{"class":1150},[1140,1379,1205],{"class":1146},[1140,1381,1383,1386,1388],{"class":1142,"line":1382},12,[1140,1384,1385],{"class":1146},"      \u003C/",[1140,1387,1054],{"class":1150},[1140,1389,1205],{"class":1146},[1140,1391,1393,1395,1397,1399,1401,1403,1405,1407],{"class":1142,"line":1392},13,[1140,1394,1285],{"class":1146},[1140,1396,1054],{"class":1150},[1140,1398,1154],{"class":1153},[1140,1400,1157],{"class":1146},[1140,1402,1161],{"class":1160},[1140,1404,1296],{"class":1164},[1140,1406,1161],{"class":1160},[1140,1408,1205],{"class":1146},[1140,1410,1412,1414,1416,1418,1421,1423,1425],{"class":1142,"line":1411},14,[1140,1413,1306],{"class":1146},[1140,1415,51],{"class":1150},[1140,1417,1259],{"class":1146},[1140,1419,1420],{"class":1262},"Capricciosa",[1140,1422,1266],{"class":1146},[1140,1424,51],{"class":1150},[1140,1426,1205],{"class":1146},[1140,1428,1430,1432,1434],{"class":1142,"line":1429},15,[1140,1431,1306],{"class":1146},[1140,1433,19],{"class":1150},[1140,1435,1205],{"class":1146},[1140,1437,1439],{"class":1142,"line":1438},16,[1140,1440,1441],{"class":1262},"          A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n",[1140,1443,1445],{"class":1142,"line":1444},17,[1140,1446,1447],{"class":1262},"          A true 
favourite!\n",[1140,1449,1451,1454,1456],{"class":1142,"line":1450},18,[1140,1452,1453],{"class":1146},"          \u003C/",[1140,1455,19],{"class":1150},[1140,1457,1205],{"class":1146},[1140,1459,1461,1463,1465,1467,1469,1471,1473,1475,1477,1479,1481,1483],{"class":1142,"line":1460},19,[1140,1462,1306],{"class":1146},[1140,1464,1358],{"class":1150},[1140,1466,1193],{"class":1153},[1140,1468,1157],{"class":1146},[1140,1470,1161],{"class":1160},[1140,1472,1358],{"class":1164},[1140,1474,1161],{"class":1160},[1140,1476,1259],{"class":1146},[1140,1478,1373],{"class":1262},[1140,1480,1266],{"class":1146},[1140,1482,1358],{"class":1150},[1140,1484,1205],{"class":1146},[1140,1486,1488,1490,1492],{"class":1142,"line":1487},20,[1140,1489,1385],{"class":1146},[1140,1491,1054],{"class":1150},[1140,1493,1205],{"class":1146},[1140,1495,1497,1500,1502],{"class":1142,"line":1496},21,[1140,1498,1499],{"class":1146},"    \u003C/",[1140,1501,1054],{"class":1150},[1140,1503,1205],{"class":1146},[1140,1505,1507,1510,1512],{"class":1142,"line":1506},22,[1140,1508,1509],{"class":1146},"  \u003C/",[1140,1511,1054],{"class":1150},[1140,1513,1205],{"class":1146},[1140,1515,1517,1519,1521],{"class":1142,"line":1516},23,[1140,1518,1266],{"class":1146},[1140,1520,1050],{"class":1150},[1140,1522,1205],{"class":1146},[19,1524,1525,1526,1528],{},"Here are some ",[879,1527,970],{}," downsampling results, which are based on different parametric configurations. 
A percentage denotes the reduced size.",[1039,1530,1532,1535],{"id":1531},"k3-l3-m3-55",[62,1533,1534],{},"k=.3, l=.3, m=.3"," (55%)",[1132,1537,1539],{"className":1134,"code":1538,"language":1136,"meta":760,"style":760},"\u003Csection tabindex=\"3\" type=\"example\" class=\"container\" required=\"true\">\n  # Our Pizza\n  \u003Cdiv class=\"shadow-lg\">\n    ## Margherita\n    A simple classic: mozzarela, tomatoes, and basil.\n    \u003Cbutton type=\"button\">Add\u003C/button>\n    ## Capricciosa\n    A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n    \u003Cbutton type=\"button\">Add\u003C/button>\n  \u003C/div>\n\u003C/section>\n",[62,1540,1541,1589,1594,1612,1617,1622,1648,1653,1658,1684,1692],{"__ignoreMap":760},[1140,1542,1543,1545,1547,1549,1551,1553,1555,1557,1559,1561,1563,1565,1567,1569,1571,1573,1575,1577,1579,1581,1583,1585,1587],{"class":1142,"line":1143},[1140,1544,1147],{"class":1146},[1140,1546,1050],{"class":1150},[1140,1548,1170],{"class":1153},[1140,1550,1157],{"class":1146},[1140,1552,1161],{"class":1160},[1140,1554,997],{"class":1164},[1140,1556,1161],{"class":1160},[1140,1558,1193],{"class":1153},[1140,1560,1157],{"class":1146},[1140,1562,1161],{"class":1160},[1140,1564,1200],{"class":1164},[1140,1566,1161],{"class":1160},[1140,1568,1154],{"class":1153},[1140,1570,1157],{"class":1146},[1140,1572,1161],{"class":1160},[1140,1574,1165],{"class":1164},[1140,1576,1161],{"class":1160},[1140,1578,1181],{"class":1153},[1140,1580,1157],{"class":1146},[1140,1582,1161],{"class":1160},[1140,1584,1188],{"class":1164},[1140,1586,1161],{"class":1160},[1140,1588,1205],{"class":1146},[1140,1590,1591],{"class":1142,"line":761},[1140,1592,1593],{"class":1262},"  # Our 
Pizza\n",[1140,1595,1596,1598,1600,1602,1604,1606,1608,1610],{"class":1142,"line":766},[1140,1597,1210],{"class":1146},[1140,1599,1054],{"class":1150},[1140,1601,1154],{"class":1153},[1140,1603,1157],{"class":1146},[1140,1605,1161],{"class":1160},[1140,1607,1296],{"class":1164},[1140,1609,1161],{"class":1160},[1140,1611,1205],{"class":1146},[1140,1613,1614],{"class":1142,"line":1273},[1140,1615,1616],{"class":1262},"    ## Margherita\n",[1140,1618,1619],{"class":1142,"line":1282},[1140,1620,1621],{"class":1262},"    A simple classic: mozzarela, tomatoes, and basil.\n",[1140,1623,1624,1626,1628,1630,1632,1634,1636,1638,1640,1642,1644,1646],{"class":1142,"line":1303},[1140,1625,1253],{"class":1146},[1140,1627,1358],{"class":1150},[1140,1629,1193],{"class":1153},[1140,1631,1157],{"class":1146},[1140,1633,1161],{"class":1160},[1140,1635,1358],{"class":1164},[1140,1637,1161],{"class":1160},[1140,1639,1259],{"class":1146},[1140,1641,1373],{"class":1262},[1140,1643,1266],{"class":1146},[1140,1645,1358],{"class":1150},[1140,1647,1205],{"class":1146},[1140,1649,1650],{"class":1142,"line":1322},[1140,1651,1652],{"class":1262},"    ## Capricciosa\n",[1140,1654,1655],{"class":1142,"line":1331},[1140,1656,1657],{"class":1262},"    A rich taste: mozzarella, ham, mushrooms, artichokes, and 
olives.\n",[1140,1659,1660,1662,1664,1666,1668,1670,1672,1674,1676,1678,1680,1682],{"class":1142,"line":1337},[1140,1661,1253],{"class":1146},[1140,1663,1358],{"class":1150},[1140,1665,1193],{"class":1153},[1140,1667,1157],{"class":1146},[1140,1669,1161],{"class":1160},[1140,1671,1358],{"class":1164},[1140,1673,1161],{"class":1160},[1140,1675,1259],{"class":1146},[1140,1677,1373],{"class":1262},[1140,1679,1266],{"class":1146},[1140,1681,1358],{"class":1150},[1140,1683,1205],{"class":1146},[1140,1685,1686,1688,1690],{"class":1142,"line":1343},[1140,1687,1509],{"class":1146},[1140,1689,1054],{"class":1150},[1140,1691,1205],{"class":1146},[1140,1693,1694,1696,1698],{"class":1142,"line":1353},[1140,1695,1266],{"class":1146},[1140,1697,1050],{"class":1150},[1140,1699,1205],{"class":1146},[1039,1701,1703,1706],{"id":1702},"k4-l6-m8-27",[62,1704,1705],{},"k=.4, l=.6, m=.8"," (27%)",[1132,1708,1710],{"className":1134,"code":1709,"language":1136,"meta":760,"style":760},"\u003Csection>\n  # Our Pizza\n  \u003Cdiv>\n    ## Margherita\n    A simple classic:\n    \u003Cbutton>Add\u003C/button>\n    ## Capricciosa\n    A rich taste:\n    \u003Cbutton>Add\u003C/button>\n  \u003C/div>\n\u003C/section>\n",[62,1711,1712,1720,1724,1732,1736,1741,1757,1761,1766,1782,1790],{"__ignoreMap":760},[1140,1713,1714,1716,1718],{"class":1142,"line":1143},[1140,1715,1147],{"class":1146},[1140,1717,1050],{"class":1150},[1140,1719,1205],{"class":1146},[1140,1721,1722],{"class":1142,"line":761},[1140,1723,1593],{"class":1262},[1140,1725,1726,1728,1730],{"class":1142,"line":766},[1140,1727,1210],{"class":1146},[1140,1729,1054],{"class":1150},[1140,1731,1205],{"class":1146},[1140,1733,1734],{"class":1142,"line":1273},[1140,1735,1616],{"class":1262},[1140,1737,1738],{"class":1142,"line":1282},[1140,1739,1740],{"class":1262},"    A simple 
classic:\n",[1140,1742,1743,1745,1747,1749,1751,1753,1755],{"class":1142,"line":1303},[1140,1744,1253],{"class":1146},[1140,1746,1358],{"class":1150},[1140,1748,1259],{"class":1146},[1140,1750,1373],{"class":1262},[1140,1752,1266],{"class":1146},[1140,1754,1358],{"class":1150},[1140,1756,1205],{"class":1146},[1140,1758,1759],{"class":1142,"line":1322},[1140,1760,1652],{"class":1262},[1140,1762,1763],{"class":1142,"line":1331},[1140,1764,1765],{"class":1262},"    A rich taste:\n",[1140,1767,1768,1770,1772,1774,1776,1778,1780],{"class":1142,"line":1337},[1140,1769,1253],{"class":1146},[1140,1771,1358],{"class":1150},[1140,1773,1259],{"class":1146},[1140,1775,1373],{"class":1262},[1140,1777,1266],{"class":1146},[1140,1779,1358],{"class":1150},[1140,1781,1205],{"class":1146},[1140,1783,1784,1786,1788],{"class":1142,"line":1343},[1140,1785,1509],{"class":1146},[1140,1787,1054],{"class":1150},[1140,1789,1205],{"class":1146},[1140,1791,1792,1794,1796],{"class":1142,"line":1353},[1140,1793,1266],{"class":1146},[1140,1795,1050],{"class":1150},[1140,1797,1205],{"class":1146},[1039,1799,1801,1804],{"id":1800},"k-l0-m-35",[62,1802,1803],{},"k→∞, l=0, ∀m"," (35%)",[1132,1806,1808],{"className":1134,"code":1807,"language":1136,"meta":760,"style":760},"# Our Pizza\n## Margherita\nA simple classic: mozzarela, tomatoes, and basil.\nAn everyday choice!\n\u003Cbutton>Add\u003C/button>\n## Capricciosa\nA rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\nA true favourite!\n\u003Cbutton>Add\u003C/button>\n",[62,1809,1810,1815,1820,1825,1830,1846,1851,1856,1861],{"__ignoreMap":760},[1140,1811,1812],{"class":1142,"line":1143},[1140,1813,1814],{"class":1262},"# Our Pizza\n",[1140,1816,1817],{"class":1142,"line":761},[1140,1818,1819],{"class":1262},"## Margherita\n",[1140,1821,1822],{"class":1142,"line":766},[1140,1823,1824],{"class":1262},"A simple classic: mozzarela, tomatoes, and basil.\n",[1140,1826,1827],{"class":1142,"line":1273},[1140,1828,1829],{"class":1262},"An 
everyday choice!\n",[1140,1831,1832,1834,1836,1838,1840,1842,1844],{"class":1142,"line":1282},[1140,1833,1147],{"class":1146},[1140,1835,1358],{"class":1150},[1140,1837,1259],{"class":1146},[1140,1839,1373],{"class":1262},[1140,1841,1266],{"class":1146},[1140,1843,1358],{"class":1150},[1140,1845,1205],{"class":1146},[1140,1847,1848],{"class":1142,"line":1303},[1140,1849,1850],{"class":1262},"## Capricciosa\n",[1140,1852,1853],{"class":1142,"line":1322},[1140,1854,1855],{"class":1262},"A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.\n",[1140,1857,1858],{"class":1142,"line":1331},[1140,1859,1860],{"class":1262},"A true favourite!\n",[1140,1862,1863,1865,1867,1869,1871,1873,1875],{"class":1142,"line":1337},[1140,1864,1147],{"class":1146},[1140,1866,1358],{"class":1150},[1140,1868,1259],{"class":1146},[1140,1870,1373],{"class":1262},[1140,1872,1266],{"class":1146},[1140,1874,1358],{"class":1150},[1140,1876,1205],{"class":1146},[19,1878,1879,1880,1882,1883,1885],{},"Asymptotic ",[62,1881,1014],{}," (kind of 'infinite' ",[62,1884,1014],{},") completely flattens the DOM, that is, leads to a full content linearisation similar to reader views as present in most browsers. Notably, it preserves all interactive elements like buttons – which are essential for a web agent.",[74,1887,1889],{"id":1888},"adaptived2snap",[879,1890,1891],{},"AdaptiveD2Snap",[19,1893,1894,1895,1897,1898,1900],{},"Fixed parameters might not be ideal for arbitrary DOMs – sourced from a landscape of web applications. We created ",[879,1896,1891],{}," – a wrapper for ",[879,1899,970],{}," that infers suitable parameters from a given DOM in order to hit a certain token budget.",[74,1902,1904],{"id":1903},"implementation-integration","Implementation & Integration",[19,1906,1907,1908,1910],{},"Picture an LLM-based web agent that is premised on DOM snapshots. Implementing ",[879,1909,970],{}," is simple: Deep clone the DOM, and feed it to the algorithm. 
Now, take the snapshot; that is, serialise the resulting DOM. Done.",[1027,1912,1913],{},[19,1914,1915,1916,1920],{},"Read our ",[745,1917,1919],{"href":1918},"/blog/a-gentle-introduction-to-ai-agents-for-the-web","gentle introduction to AI agents for the web"," to get started with high-level web agent concepts.",[19,1922,1923,1924,1926,1927,1932],{},"The open source ",[879,1925,970],{}," API, provided as a ",[745,1928,1931],{"href":1929,"rel":1930},"https://github.com/webfuse-com/D2Snap",[838],"package on GitHub",", provides the following signature:",[1132,1934,1938],{"className":1935,"code":1936,"language":1937,"meta":760,"style":760},"language-ts shiki shiki-themes catppuccin-latte night-owl","type DOM = Document | Element | string;\ntype Options = {\n  assignUniqueIDs?: boolean; // false\n  debug?: boolean;           // true\n};\n\nD2Snap.d2Snap(\n  dom: DOM,\n  k: number, l: number, m: number,\n  options?: Options\n): Promise\u003Cstring>\n\nD2Snap.adaptiveD2Snap(\n  dom: DOM,\n  maxTokens: number = 4096,\n  maxIterations: number = 5,\n  options?: Options\n): Promise\u003Cstring>\n\n","ts",[62,1939,1940,1973,1985,2004,2018,2023,2028,2042,2054,2072,2082,2098,2102,2113,2121,2134,2146,2154],{"__ignoreMap":760},[1140,1941,1942,1946,1950,1953,1957,1960,1963,1965,1969],{"class":1142,"line":1143},[1140,1943,1945],{"class":1944},"s76yb","type",[1140,1947,1949],{"class":1948},"sXbZB"," DOM ",[1140,1951,1157],{"class":1952},"s-_ek",[1140,1954,1956],{"class":1955},"s-DR7"," Document",[1140,1958,1959],{"class":1146}," |",[1140,1961,1962],{"class":1955}," Element",[1140,1964,1959],{"class":1146},[1140,1966,1968],{"class":1967},"scrte"," string",[1140,1970,1972],{"class":1971},"scGhl",";\n",[1140,1974,1975,1977,1980,1982],{"class":1142,"line":761},[1140,1976,1945],{"class":1944},[1140,1978,1979],{"class":1948}," Options ",[1140,1981,1157],{"class":1952},[1140,1983,1984],{"class":1971}," 
{\n",[1140,1986,1987,1991,1994,1997,2000],{"class":1142,"line":766},[1140,1988,1990],{"class":1989},"swl0y","  assignUniqueIDs",[1140,1992,1993],{"class":1146},"?:",[1140,1995,1996],{"class":1967}," boolean",[1140,1998,1999],{"class":1971},";",[1140,2001,2003],{"class":2002},"sDmS1"," // false\n",[1140,2005,2006,2009,2011,2013,2015],{"class":1142,"line":1273},[1140,2007,2008],{"class":1989},"  debug",[1140,2010,1993],{"class":1146},[1140,2012,1996],{"class":1967},[1140,2014,1999],{"class":1971},[1140,2016,2017],{"class":2002},"           // true\n",[1140,2019,2020],{"class":1142,"line":1282},[1140,2021,2022],{"class":1971},"};\n",[1140,2024,2025],{"class":1142,"line":1303},[1140,2026,2027],{"emptyLinePlaceholder":799},"\n",[1140,2029,2030,2032,2035,2039],{"class":1142,"line":1322},[1140,2031,970],{"class":1262},[1140,2033,69],{"class":2034},"s5FwJ",[1140,2036,2038],{"class":2037},"sNstc","d2Snap",[1140,2040,2041],{"class":1262},"(\n",[1140,2043,2044,2047,2051],{"class":1142,"line":1331},[1140,2045,2046],{"class":1262},"  dom: ",[1140,2048,2050],{"class":2049},"sqxXB","DOM",[1140,2052,2053],{"class":1971},",\n",[1140,2055,2056,2059,2062,2065,2067,2070],{"class":1142,"line":1337},[1140,2057,2058],{"class":1262},"  k: number",[1140,2060,2061],{"class":1971},",",[1140,2063,2064],{"class":1262}," l: number",[1140,2066,2061],{"class":1971},[1140,2068,2069],{"class":1262}," m: number",[1140,2071,2053],{"class":1971},[1140,2073,2074,2077,2079],{"class":1142,"line":1343},[1140,2075,2076],{"class":1262},"  options",[1140,2078,1993],{"class":1952},[1140,2080,2081],{"class":1262}," Options\n",[1140,2083,2084,2087,2091,2093,2096],{"class":1142,"line":1353},[1140,2085,2086],{"class":1262},"): 
",[1140,2088,2090],{"class":2089},"s8Irk","Promise",[1140,2092,1147],{"class":1952},[1140,2094,2095],{"class":1262},"string",[1140,2097,1205],{"class":1952},[1140,2099,2100],{"class":1142,"line":1382},[1140,2101,2027],{"emptyLinePlaceholder":799},[1140,2103,2104,2106,2108,2111],{"class":1142,"line":1392},[1140,2105,970],{"class":1262},[1140,2107,69],{"class":2034},[1140,2109,2110],{"class":2037},"adaptiveD2Snap",[1140,2112,2041],{"class":1262},[1140,2114,2115,2117,2119],{"class":1142,"line":1411},[1140,2116,2046],{"class":1262},[1140,2118,2050],{"class":2049},[1140,2120,2053],{"class":1971},[1140,2122,2123,2126,2128,2132],{"class":1142,"line":1429},[1140,2124,2125],{"class":1262},"  maxTokens: number ",[1140,2127,1157],{"class":1952},[1140,2129,2131],{"class":2130},"sZ_Zo"," 4096",[1140,2133,2053],{"class":1971},[1140,2135,2136,2139,2141,2144],{"class":1142,"line":1438},[1140,2137,2138],{"class":1262},"  maxIterations: number ",[1140,2140,1157],{"class":1952},[1140,2142,2143],{"class":2130}," 5",[1140,2145,2053],{"class":1971},[1140,2147,2148,2150,2152],{"class":1142,"line":1444},[1140,2149,2076],{"class":1262},[1140,2151,1993],{"class":1952},[1140,2153,2081],{"class":1262},[1140,2155,2156,2158,2160,2162,2164],{"class":1142,"line":1450},[1140,2157,2086],{"class":1262},[1140,2159,2090],{"class":2089},[1140,2161,1147],{"class":1952},[1140,2163,2095],{"class":1262},[1140,2165,1205],{"class":1952},[19,2167,2168,2169,2171,2172,2177,2178,2182],{},"Moreover, ",[879,2170,970],{}," it is available on the ",[745,2173,2176],{"href":2174,"rel":2175},"https://dev.webfuse.com/automation-api",[838],"Webfuse Automation API",". 
",[745,2179,653],{"href":2180,"rel":2181},"https://www.webfuse.com",[838]," essentially is a proxy to seamlessly serve any existing web application with custom augmentations, such as a web agent widget.",[1132,2184,2188],{"className":2185,"code":2186,"language":2187,"meta":760,"style":760},"language-js shiki shiki-themes catppuccin-latte night-owl","const domSnapshot = await browser.webfuseSession\n    .automation\n    .take_dom_snapshot({ modifier: 'downsample' })\n","js",[62,2189,2190,2216,2225],{"__ignoreMap":760},[1140,2191,2192,2195,2199,2202,2206,2210,2212],{"class":1142,"line":1143},[1140,2193,2194],{"class":1944},"const",[1140,2196,2198],{"class":2197},"scsc5"," domSnapshot",[1140,2200,2201],{"class":1952}," =",[1140,2203,2205],{"class":2204},"srhcd"," await",[1140,2207,2209],{"class":2208},"sP4PM"," browser",[1140,2211,69],{"class":2034},[1140,2213,2215],{"class":2214},"s8apv","webfuseSession\n",[1140,2217,2218,2221],{"class":1142,"line":761},[1140,2219,2220],{"class":2034},"    .",[1140,2222,2224],{"class":2223},"sL4Ga","automation\n",[1140,2226,2227,2229,2232,2235,2238,2241,2245,2248,2251,2254,2257],{"class":1142,"line":766},[1140,2228,2220],{"class":2034},[1140,2230,2231],{"class":2037},"take_dom_snapshot",[1140,2233,2234],{"class":1262},"(",[1140,2236,2237],{"class":1971},"{",[1140,2239,2240],{"class":1262}," modifier",[1140,2242,2244],{"class":2243},"sVS64",":",[1140,2246,2247],{"class":1160}," '",[1140,2249,2250],{"class":1164},"downsample",[1140,2252,2253],{"class":1160},"'",[1140,2255,2256],{"class":1971}," }",[1140,2258,2259],{"class":1262},")\n",[19,2261,2262,2263,2265],{},"Need precise control over the underlying ",[879,2264,970],{}," invocation? 
Configure it exactly how you want:",[1132,2267,2269],{"className":2185,"code":2268,"language":2187,"meta":760,"style":760},"const domSnapshot = await browser.webfuseSession\n    .automation\n    .take_dom_snapshot({\n        modifier: {\n            name: 'D2Snap',\n            params: { hierarchyRatio: 0.6, textRatio: 0.2, attributeRatio: 0.8 }\n        }\n    })\n",[62,2270,2271,2287,2293,2304,2313,2328,2369,2374],{"__ignoreMap":760},[1140,2272,2273,2275,2277,2279,2281,2283,2285],{"class":1142,"line":1143},[1140,2274,2194],{"class":1944},[1140,2276,2198],{"class":2197},[1140,2278,2201],{"class":1952},[1140,2280,2205],{"class":2204},[1140,2282,2209],{"class":2208},[1140,2284,69],{"class":2034},[1140,2286,2215],{"class":2214},[1140,2288,2289,2291],{"class":1142,"line":761},[1140,2290,2220],{"class":2034},[1140,2292,2224],{"class":2223},[1140,2294,2295,2297,2299,2301],{"class":1142,"line":766},[1140,2296,2220],{"class":2034},[1140,2298,2231],{"class":2037},[1140,2300,2234],{"class":1262},[1140,2302,2303],{"class":1971},"{\n",[1140,2305,2306,2309,2311],{"class":1142,"line":1273},[1140,2307,2308],{"class":1262},"        modifier",[1140,2310,2244],{"class":2243},[1140,2312,1984],{"class":1971},[1140,2314,2315,2318,2320,2322,2324,2326],{"class":1142,"line":1282},[1140,2316,2317],{"class":1262},"            name",[1140,2319,2244],{"class":2243},[1140,2321,2247],{"class":1160},[1140,2323,970],{"class":1164},[1140,2325,2253],{"class":1160},[1140,2327,2053],{"class":1971},[1140,2329,2330,2333,2335,2338,2341,2343,2346,2348,2351,2353,2356,2358,2361,2363,2366],{"class":1142,"line":1303},[1140,2331,2332],{"class":1262},"            params",[1140,2334,2244],{"class":2243},[1140,2336,2337],{"class":1971}," {",[1140,2339,2340],{"class":1262}," hierarchyRatio",[1140,2342,2244],{"class":2243},[1140,2344,2345],{"class":2130}," 0.6",[1140,2347,2061],{"class":1971},[1140,2349,2350],{"class":1262}," textRatio",[1140,2352,2244],{"class":2243},[1140,2354,2355],{"class":2130}," 
0.2",[1140,2357,2061],{"class":1971},[1140,2359,2360],{"class":1262}," attributeRatio",[1140,2362,2244],{"class":2243},[1140,2364,2365],{"class":2130}," 0.8",[1140,2367,2368],{"class":1971}," }\n",[1140,2370,2371],{"class":1142,"line":1322},[1140,2372,2373],{"class":1971},"        }\n",[1140,2375,2376,2379],{"class":1142,"line":1331},[1140,2377,2378],{"class":1971},"    }",[1140,2380,2259],{"class":1262},[74,2382,2384],{"id":2383},"performance-evaluation","Performance Evaluation",[19,2386,2387,2388,2390,2391,2393,2394,2396],{},"Now for the moment of truth: How does ",[879,2389,970],{}," stack up against the industry standard? We evaluated ",[879,2392,970],{}," in comparison to a grounded GUI snapshot baseline close to those used by ",[879,2395,850],{}," – coloured bounding boxes around visible interactive elements.",[19,2398,2399,2400,2405],{},"To evaluate snapshots isolated from specific agent logic, we crafted a dataset that spans all UI states that occur while solving a related task. We sampled our dataset from the existing ",[745,2401,2404],{"href":2402,"rel":2403},"https://github.com/OSU-NLP-Group/Online-Mind2Web",[838],"Online-Mind2Web"," dataset.",[11,2407],{":width":2408,"alt":2409,"format":15,"loading":16,"src":2410},"800","Exemplary solution UI state trajectory of a defined web-based task","/blog/dom-downsampling-for-web-agents/3.png",[19,2412,2413],{},[894,2414,2415],{},"Exemplary solution UI state trajectory for the task: “View the pricing plan for 'Business'. Specifically, we have 100 users. We need a 1PB storage quota and a 50 TB transfer quota.”",[19,2417,2418],{},"These are our key findings...",[1039,2420,2422],{"id":2421},"substantial-success-rates","Substantial Success Rates",[19,2424,2425,2426,2428],{},"The results exceeded our expectations. Not only did ",[879,2427,970],{}," meet the baseline's performance – our best configuration outperformed it by a significant margin. 
Full linearisation matches the baseline in both performance and the order of magnitude of the estimated model input token size.",[11,2430],{":width":2431,"alt":2432,"format":15,"loading":16,"src":2433},"550","Success rate per web agent snapshot subject evaluated across the dataset","/blog/dom-downsampling-for-web-agents/4.png",[894,2435,2436,2437,2444,2445,2447,2448,2451,2452,2455,2456,2459,2460,2463,2464,2467,2468,2471],{},"\n  Success rate per web agent snapshot subject evaluated across the dataset.\n  Labels: ",[62,2438,2439,2440],{},"GUI",[2441,2442,2443],"sub",{}," gr.",": Baseline, ",[62,2446,2050],{},": Raw DOM (cut-off at ~8K tokens), ",[62,2449,2450],{},"k( l m)",": Parameter values (e.g., ",[62,2453,2454],{},".9 .3 .6",", or ",[62,2457,2458],{},".4"," if equal). ",[62,2461,2462],{},"∞",": Linearisation, ",[62,2465,2466],{},"8192 / 32768",": via token-limited (resp.) ",[2469,2470,1891],"i",{},".\n",[1039,2473,2475],{"id":2474},"containable-token-and-byte-size","Containable Token and Byte Size",[19,2477,2478,2479,2481],{},"Even light downsampling delivers dramatic size reductions. Most ",[879,2480,970],{}," configurations average just one order of magnitude more tokens than the baseline – a massive improvement over raw DOM snapshots. Better yet, most DOMs from the dataset could actually be downsampled to the baseline's order of magnitude. 
And while image data balloons in file size, our text-based approach stays lean and efficient.",[11,2483],{":width":2408,"alt":2484,"format":15,"loading":16,"src":2485},"Comparison of mean input size across and per subject","/blog/dom-downsampling-for-web-agents/5.png",[894,2487,2488,2489,2492,2493,2495],{},"\n  Left: Comparison of mean input size (tokens vs bytes) across and per subject.",[2490,2491],"br",{},"\n  Right: Estimated input token size across the dataset created by a single ",[2469,2494,970],{}," evaluation subject.\n",[1039,2497,2499],{"id":2498},"hierarchy-actually-matters","Hierarchy Actually Matters",[19,2501,2502],{},"Which UI feature matters most for LLM web agent backend performance? We alternated parameter configurations to find out. Interestingly, hierarchy reveals itself as the strongest of the three assessed features. Element extraction throws away hierarchy, which suggests that downsampling is a superior technique.",[1050,2504,2507,2512],{"className":2505,"dataFootnotes":760},[2506],"footnotes",[51,2508,2511],{"className":2509,"id":911},[2510],"sr-only","Footnotes",[194,2513,2514,2528,2539,2550],{},[32,2515,2517,743,2521],{"id":2516},"user-content-fn-1",[745,2518,2519],{"href":2519,"rel":2520},"https://arxiv.org/abs/2210.03945",[838],[745,2522,2527],{"href":2523,"ariaLabel":2524,"className":2525,"dataFootnoteBackref":760},"#user-content-fnref-1","Back to reference 1",[2526],"data-footnote-backref","↩",[32,2529,2531,743,2534],{"id":2530},"user-content-fn-2",[745,2532,976],{"href":976,"rel":2533},[838],[745,2535,2527],{"href":2536,"ariaLabel":2537,"className":2538,"dataFootnoteBackref":760},"#user-content-fnref-2","Back to reference 2",[2526],[32,2540,2542,743,2545],{"id":2541},"user-content-fn-3",[745,2543,1929],{"href":1929,"rel":2544},[838],[745,2546,2527],{"href":2547,"ariaLabel":2548,"className":2549,"dataFootnoteBackref":760},"#user-content-fnref-3","Back to reference 
3",[2526],[32,2551,2553,743,2557],{"id":2552},"user-content-fn-4",[745,2554,2555],{"href":2555,"rel":2556},"https://aclanthology.org/W04-3252",[838],[745,2558,2527],{"href":2559,"ariaLabel":2560,"className":2561,"dataFootnoteBackref":760},"#user-content-fnref-4","Back to reference 4",[2526],[2563,2564,2565],"style",{},"html pre.shiki code .s9rnR, html code.shiki .s9rnR{--shiki-default:#179299;--shiki-dark:#7FDBCA}html pre.shiki code .sY2RG, html code.shiki .sY2RG{--shiki-default:#1E66F5;--shiki-dark:#CAECE6}html pre.shiki code .swkLt, html code.shiki .swkLt{--shiki-default:#DF8E1D;--shiki-default-font-style:inherit;--shiki-dark:#C5E478;--shiki-dark-font-style:italic}html pre.shiki code .sbuKk, html code.shiki .sbuKk{--shiki-default:#40A02B;--shiki-dark:#D9F5DD}html pre.shiki code .sfrMT, html code.shiki .sfrMT{--shiki-default:#40A02B;--shiki-dark:#ECC48D}html pre.shiki code .s2kId, html code.shiki .s2kId{--shiki-default:#4C4F69;--shiki-dark:#D6DEEB}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s76yb, html code.shiki .s76yb{--shiki-default:#8839EF;--shiki-dark:#C792EA}html pre.shiki code .sXbZB, html 
code.shiki .sXbZB{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .s-_ek, html code.shiki .s-_ek{--shiki-default:#179299;--shiki-dark:#C792EA}html pre.shiki code .s-DR7, html code.shiki .s-DR7{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#FFCB8B;--shiki-dark-font-style:inherit}html pre.shiki code .scrte, html code.shiki .scrte{--shiki-default:#8839EF;--shiki-dark:#C5E478}html pre.shiki code .scGhl, html code.shiki .scGhl{--shiki-default:#7C7F93;--shiki-dark:#D6DEEB}html pre.shiki code .swl0y, html code.shiki .swl0y{--shiki-default:#4C4F69;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .sDmS1, html code.shiki .sDmS1{--shiki-default:#7C7F93;--shiki-default-font-style:italic;--shiki-dark:#637777;--shiki-dark-font-style:italic}html pre.shiki code .s5FwJ, html code.shiki .s5FwJ{--shiki-default:#179299;--shiki-default-font-style:inherit;--shiki-dark:#C792EA;--shiki-dark-font-style:italic}html pre.shiki code .sNstc, html code.shiki .sNstc{--shiki-default:#1E66F5;--shiki-default-font-style:italic;--shiki-dark:#82AAFF;--shiki-dark-font-style:italic}html pre.shiki code .sqxXB, html code.shiki .sqxXB{--shiki-default:#4C4F69;--shiki-dark:#82AAFF}html pre.shiki code .s8Irk, html code.shiki .s8Irk{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#C5E478;--shiki-dark-font-style:inherit}html pre.shiki code .sZ_Zo, html code.shiki .sZ_Zo{--shiki-default:#FE640B;--shiki-dark:#F78C6C}html pre.shiki code .scsc5, html code.shiki .scsc5{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#82AAFF;--shiki-dark-font-style:italic}html pre.shiki code .srhcd, html code.shiki .srhcd{--shiki-default:#8839EF;--shiki-default-font-style:inherit;--shiki-dark:#C792EA;--shiki-dark-font-style:italic}html pre.shiki code .sP4PM, html code.shiki 
.sP4PM{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#7FDBCA;--shiki-dark-font-style:italic}html pre.shiki code .s8apv, html code.shiki .s8apv{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#BAEBE2;--shiki-dark-font-style:italic}html pre.shiki code .sL4Ga, html code.shiki .sL4Ga{--shiki-default:#4C4F69;--shiki-dark:#BAEBE2}html pre.shiki code .sVS64, html code.shiki .sVS64{--shiki-default:#179299;--shiki-dark:#D6DEEB}",{"title":760,"searchDepth":761,"depth":761,"links":2567},[2568,2572,2573,2580],{"id":858,"depth":761,"text":859,"children":2569},[2570,2571],{"id":869,"depth":766,"text":870},{"id":899,"depth":766,"text":900},{"id":949,"depth":761,"text":950},{"id":967,"depth":761,"text":970,"children":2574},[2575,2576,2577,2578,2579],{"id":1004,"depth":766,"text":1005},{"id":1126,"depth":766,"text":1127},{"id":1888,"depth":766,"text":1891},{"id":1903,"depth":766,"text":1904},{"id":2383,"depth":766,"text":2384},{"id":911,"depth":761,"text":2511},"2025-08-18","We propose D2Snap – a first-of-its-kind downsampling algorithm for DOMs. 
D2Snap can be used as a pre-processing technique for DOM snapshots to optimise web agency context quality and token costs.",{"homepage":799,"relatedLinks":2584},[2585,2589,2592],{"text":2586,"href":2587,"description":2588},"What is a Website Snapshot?","/blog/snapshots-provide-llms-with-website-state","Learn what a website snapshot is and how to utilise it for web agents",{"text":2590,"href":1918,"description":2591},"What is a Web Agent?","Learn the basics of web agents",{"text":2176,"href":2593,"external":799,"description":2594},"https://dev.webfuse.com/automation-api#take_dom_snapshot","Check out the Webfuse Automation API","/blog/dom-downsampling-for-llm-based-web-agents",{"title":823,"description":2582},{"loc":2595},"blog/1012.dom-downsampling-for-llm-based-web-agents",[814,815,2600,2601,817,818],"llms","llm-context","lDh50lEtos4T_tIdGCLKDox16i6ixbPnRxPJoFpKjnE",{"id":2604,"title":2605,"authorId":824,"body":2606,"category":814,"created":3334,"description":3335,"extension":796,"faqs":797,"featurePriority":761,"head":797,"landingPath":797,"meta":3336,"navigation":799,"ogImage":797,"path":1918,"robots":797,"schemaOrg":797,"seo":3345,"sitemap":3346,"stem":3347,"tags":3348,"__hash__":3349},"blog/blog/1011.a-gentle-introduction-to-ai-agents-for-the-web.md","A Gentle Introduction to AI Agents for the Web",{"type":8,"value":2607,"toc":3315},[2608,2622,2625,2632,2638,2642,2645,2660,2664,2674,2678,2682,2695,2699,2703,2706,2711,2715,2724,2728,2739,2744,2748,2766,2770,2776,2878,2881,3114,3130,3134,3137,3142,3146,3149,3153,3171,3196,3203,3207,3245,3248,3259,3263,3266,3294,3298,3306,3312],[19,2609,2610,2611,840,2615,1018,2618,2621],{},"In no time, AI became a natural part of modern web interfaces. AI agents for the web enjoy a recent hype, sparked by the means of ",[745,2612,839],{"href":2613,"rel":2614},"https://openai.com/index/introducing-operator/",[838],[745,2616,845],{"href":843,"rel":2617},[838],[745,2619,850],{"href":848,"rel":2620},[838],". 
By now, it is within reach to automate arbitrary web-based tasks, such as booking the cheapest flight from Berlin to Amsterdam.",[51,2623,2590],{"id":2624},"what-is-a-web-agent",[19,2626,2627,2628,2631],{},"For starters, let us break down the term ",[86,2629,2630],{},"web AI agent",": An agent is an entity that autonomously acts on behalf of another entity. An artificially intelligent agent is an application that acts on behalf of a human. In contrast to non-AI computer agents, it solves complex tasks with at least human-grade effectiveness and efficiency. For a human-centric web, web agents have deliberately been designed to browse the web in a human fashion – through UIs rather than APIs.",[11,2633],{":width":2634,"alt":2635,"format":2636,"loading":16,"src":2637},"610","High-level agent description comparing human and computer agents","svg","/blog/a-gentle-introduction-to-ai-agents-for-the-web/1.svg",[74,2639,2641],{"id":2640},"the-role-of-frontier-llms","The Role of Frontier LLMs",[19,2643,2644],{},"Web agents have been a vague desire for a long time. AI agents used to rely on complete models of a problem domain in order to allow (heuristic) search through problem states. Such models would comprise the problem world (e.g., a chessboard), actors (pawns, rooks, etc.), possible actions per actor (rook moves straight), and constraints (i.a., max one piece per field). A heterogeneous space of web application UIs describes the problem domain of a web agent: how to understand a web page, and how to interact with it to solve the declared task?",[19,2646,2647,2648,2655,2656,2659],{},"Frontier LLMs disrupted the AI agent world: explicit problem domain models beyond feasibility can now be replaced by an LLM. The LLM thereby acts as an instantaneous domain model backend that can be consulted with twofold context: serialised problem state, such as a chess position code (",[879,2649,2650,2651,2654],{},"“",[1140,2652,2653],{},"..."," e4 e5 2. 
Nc3 f5”","), and the respective task (",[879,2657,2658],{},"“What is the best move for white?”","). For web agents, problem state corresponds to the currently browsed web application's runtime state, for instance, a screenshot.",[74,2661,2663],{"id":2662},"generalist-web-agents","Generalist Web Agents",[19,2665,2666,2667,1018,2670,2673],{},"Generalist web agents are supposed to solve arbitrary tasks through a web browser. Web-based tasks can be as diverse as ",[879,2668,2669],{},"“Find a picture of a cat.”",[879,2671,2672],{},"“Book the cheapest flight from Berlin to Amsterdam tomorrow afternoon (business class, window seat).”"," In reality, generalist agents still fail at uncommon or overly precise tasks. While they have been critically acclaimed, they mainly act as early proofs-of-concept. Tasks that are indeed solvable with a generalist agent promise great results with a corresponding specialist agent.",[11,2675],{":width":829,"alt":2676,"format":15,"loading":16,"src":2677},"Screenshot of a generalist web agent UI (Director)","/blog/a-gentle-introduction-to-ai-agents-for-the-web/2.png",[74,2679,2681],{"id":2680},"specialist-web-agents","Specialist Web Agents",[19,2683,2684,2685,2688,2689,2694],{},"Unlike generalist agents, specialist web agents are constrained to a certain task and application domain. Specialist agents bear the major share of commercial value. The most prominent examples are modal chat agents that provide users with on-page help. Picture a little floating widget that can be chatted to via text or voice input. In most cases, in fact, the term ",[879,2686,2687],{},"web (AI) agent"," refers to chat agents. Chat agents – text or voice – can be implemented on top of virtually any existing website. Frontier LLMs provide a lot of common sense out of the box. 
A ",[745,2690,2693],{"href":2691,"rel":2692},"https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts",[838],"system prompt"," can, moreover, be leveraged to drive specialist agent quality for the respective problem domain.",[11,2696],{":width":829,"alt":2697,"format":15,"loading":16,"src":2698},"Screenshots of two modal specialist web agent UIs augmenting an underlying website's UI","/blog/a-gentle-introduction-to-ai-agents-for-the-web/3.png",[51,2700,2702],{"id":2701},"how-does-a-web-agent-work","How Does a Web Agent Work?",[19,2704,2705],{},"LLM-based web agents are premised on a more or less uniform architecture. The agent application embodies a mediator between a web browser (environment), and the LLM backend (model).",[11,2707],{":width":2708,"alt":2709,"format":2636,"loading":16,"src":2710},"480","High-level web agent architecture component view","/blog/a-gentle-introduction-to-ai-agents-for-the-web/4.svg",[74,2712,2714],{"id":2713},"the-agent-lifecycle","The Agent Lifecycle",[19,2716,2717,2718,2723],{},"To reduce a user's cognitive load, solving a web-based task is usually chunked into a sequence of UI states. Consider looking for rental apartments on ",[745,2719,2722],{"href":2720,"rel":2721},"https://www.redfin.com",[838],"redfin.com",": In the first step, you specify a location. Only subsequently are you provided with a grid of available apartments for that location.",[11,2725],{":width":829,"alt":2726,"format":15,"loading":16,"src":2727},"Example of separated UI states in a rental home search application","/blog/a-gentle-introduction-to-ai-agents-for-the-web/5.png",[19,2729,2730,2731,2738],{},"Web agent logic is iterative; not least for a sequential web interaction model, but also for a conversational agent interaction model. Browsing the web, human and computer agents represent users alike. 
That said, Norman's well-known ",[745,2732,2735],{"href":2733,"rel":2734},"https://mitpress.mit.edu/9780262640374/the-design-of-everyday-things/",[838],[879,2736,2737],{},"Seven Stages of Action",", which hierarchically model the human cognition cycle, transfer to the web agent lifecycle. For each UI state in a web browser (environment) and web-based task (action intention), decide where to click, type, etc. (action planning), and perform those clicks, etc. (action execution). Afterwards, perceive, interpret, and evaluate the results of those actions in the web browser (state). As long as there is a mismatch between the evaluated state and the declared goal state, repeat that cycle. If necessary, prompt the user for further required information.",[11,2740],{":width":2741,"alt":2742,"format":2636,"loading":16,"src":2743},"580","Donald Norman's 'Seven Stages of Action' model of the human cognition cycle that transfers to non-human agents","/blog/a-gentle-introduction-to-ai-agents-for-the-web/6.svg",[74,2745,2747],{"id":2746},"web-context-for-llms","Web Context for LLMs",[19,2749,2750,2751,2753,2754,2757,2758,2761,2762,2765],{},"The gap from an agent towards the environment, according to ",[879,2752,2737],{},", is known as the ",[879,2755,2756],{},"gulf of execution",". In real-world scenarios, how to act in the environment with respect to a planned sequence of actions might be difficult (e.g., how to actually open the trunk of a new car?). Arguably, web agents face a novel ",[879,2759,2760],{},"gulf of intention"," towards the action planning stage: how to serialise a currently browsed web page's runtime state for LLMs? ",[879,2763,2764],{},"Snapshot"," is a more comprehensive term to describe the serialisation of a web page's current runtime state. Screenshots, for instance, represent a type of snapshot that closely resembles how humans perceive a web page at a given point in time. 
But are they as accessible to LLMs?",[74,2767,2769],{"id":2768},"agentic-ui-interaction","Agentic UI Interaction",[19,2771,2772,2773,2775],{},"With a qualified set of well-defined actuation methods, web agents are able to close the ",[879,2774,2756],{}," quite well. HTML element types strongly afford a certain action (e.g., click a button, type to a field). Below is what an actuation schema presented to the LLM backend could look like:",[1132,2777,2779],{"className":1935,"code":2778,"language":1937,"meta":760,"style":760},"type ActuationSchema = {\n    thought: string;\n    action: \"click\"\n        | \"scroll\"\n        | \"type\";\n    cssSelector: string;\n    data?: string;\n}[];\n",[62,2780,2781,2794,2805,2822,2834,2846,2857,2868],{"__ignoreMap":760},[1140,2782,2783,2786,2789,2792],{"class":1142,"line":1143},[1140,2784,2785],{"class":1944},"type",[1140,2787,2788],{"class":1948}," ActuationSchema",[1140,2790,2791],{"class":1262}," = ",[1140,2793,2303],{"class":1971},[1140,2795,2796,2799,2801,2803],{"class":1142,"line":761},[1140,2797,2798],{"class":1262},"    thought",[1140,2800,2244],{"class":1146},[1140,2802,1968],{"class":1967},[1140,2804,1972],{"class":1971},[1140,2806,2807,2810,2812,2815,2819],{"class":1142,"line":766},[1140,2808,2809],{"class":1262},"    action",[1140,2811,2244],{"class":1146},[1140,2813,2814],{"class":1160}," \"",[1140,2816,2818],{"class":2817},"sgAC-","click",[1140,2820,2821],{"class":1160},"\"\n",[1140,2823,2824,2827,2829,2832],{"class":1142,"line":1273},[1140,2825,2826],{"class":1146},"        |",[1140,2828,2814],{"class":1160},[1140,2830,2831],{"class":2817},"scroll",[1140,2833,2821],{"class":1160},[1140,2835,2836,2838,2840,2842,2844],{"class":1142,"line":1282},[1140,2837,2826],{"class":1146},[1140,2839,2814],{"class":1160},[1140,2841,1945],{"class":2817},[1140,2843,1161],{"class":1160},[1140,2845,1972],{"class":1971},[1140,2847,2848,2851,2853,2855],{"class":1142,"line":1303},[1140,2849,2850],{"class":1262},"    
cssSelector",[1140,2852,2244],{"class":1146},[1140,2854,1968],{"class":1967},[1140,2856,1972],{"class":1971},[1140,2858,2859,2862,2864,2866],{"class":1142,"line":1322},[1140,2860,2861],{"class":1262},"    data",[1140,2863,1993],{"class":1146},[1140,2865,1968],{"class":1967},[1140,2867,1972],{"class":1971},[1140,2869,2870,2873,2876],{"class":1142,"line":1331},[1140,2871,2872],{"class":1971},"}",[1140,2874,2875],{"class":1262},"[]",[1140,2877,1972],{"class":1971},[19,2879,2880],{},"And a suggested actions response could, in turn, look as follows:",[1132,2882,2886],{"className":2883,"code":2884,"language":2885,"meta":760,"style":760},"language-json shiki shiki-themes catppuccin-latte night-owl","[\n    {\n        \"thought\": \"Scroll newsletter cta into view\",\n        \"action\": \"scroll\",\n        \"cssSelector\": \"section#newsletter\"\n    },\n    {\n        \"thought\": \"Type email address to newsletter cta\",\n        \"action\": \"type\",\n        \"cssSelector\": \"section#newsletter > input\",\n        \"data\": \"user@example.org\"\n    },\n    {\n        \"thought\": \"Submit newsletter sign up\",\n        \"action\": \"click\",\n        \"cssSelector\": \"section#newsletter > button\"\n    }\n]\n","json",[62,2887,2888,2893,2898,2922,2941,2959,2964,2968,2987,3005,3024,3042,3046,3050,3069,3087,3104,3109],{"__ignoreMap":760},[1140,2889,2890],{"class":1142,"line":1143},[1140,2891,2892],{"class":1971},"[\n",[1140,2894,2895],{"class":1142,"line":761},[1140,2896,2897],{"class":1971},"    {\n",[1140,2899,2900,2904,2908,2910,2912,2914,2918,2920],{"class":1142,"line":766},[1140,2901,2903],{"class":2902},"srFR9","        \"",[1140,2905,2907],{"class":2906},"s30W1","thought",[1140,2909,1161],{"class":2902},[1140,2911,2244],{"class":1971},[1140,2913,2814],{"class":1160},[1140,2915,2917],{"class":2916},"sCC8C","Scroll newsletter cta into 
view",[1140,2919,1161],{"class":1160},[1140,2921,2053],{"class":1971},[1140,2923,2924,2926,2929,2931,2933,2935,2937,2939],{"class":1142,"line":1273},[1140,2925,2903],{"class":2902},[1140,2927,2928],{"class":2906},"action",[1140,2930,1161],{"class":2902},[1140,2932,2244],{"class":1971},[1140,2934,2814],{"class":1160},[1140,2936,2831],{"class":2916},[1140,2938,1161],{"class":1160},[1140,2940,2053],{"class":1971},[1140,2942,2943,2945,2948,2950,2952,2954,2957],{"class":1142,"line":1282},[1140,2944,2903],{"class":2902},[1140,2946,2947],{"class":2906},"cssSelector",[1140,2949,1161],{"class":2902},[1140,2951,2244],{"class":1971},[1140,2953,2814],{"class":1160},[1140,2955,2956],{"class":2916},"section#newsletter",[1140,2958,2821],{"class":1160},[1140,2960,2961],{"class":1142,"line":1303},[1140,2962,2963],{"class":1971},"    },\n",[1140,2965,2966],{"class":1142,"line":1322},[1140,2967,2897],{"class":1971},[1140,2969,2970,2972,2974,2976,2978,2980,2983,2985],{"class":1142,"line":1331},[1140,2971,2903],{"class":2902},[1140,2973,2907],{"class":2906},[1140,2975,1161],{"class":2902},[1140,2977,2244],{"class":1971},[1140,2979,2814],{"class":1160},[1140,2981,2982],{"class":2916},"Type email address to newsletter cta",[1140,2984,1161],{"class":1160},[1140,2986,2053],{"class":1971},[1140,2988,2989,2991,2993,2995,2997,2999,3001,3003],{"class":1142,"line":1337},[1140,2990,2903],{"class":2902},[1140,2992,2928],{"class":2906},[1140,2994,1161],{"class":2902},[1140,2996,2244],{"class":1971},[1140,2998,2814],{"class":1160},[1140,3000,1945],{"class":2916},[1140,3002,1161],{"class":1160},[1140,3004,2053],{"class":1971},[1140,3006,3007,3009,3011,3013,3015,3017,3020,3022],{"class":1142,"line":1343},[1140,3008,2903],{"class":2902},[1140,3010,2947],{"class":2906},[1140,3012,1161],{"class":2902},[1140,3014,2244],{"class":1971},[1140,3016,2814],{"class":1160},[1140,3018,3019],{"class":2916},"section#newsletter > 
input",[1140,3021,1161],{"class":1160},[1140,3023,2053],{"class":1971},[1140,3025,3026,3028,3031,3033,3035,3037,3040],{"class":1142,"line":1353},[1140,3027,2903],{"class":2902},[1140,3029,3030],{"class":2906},"data",[1140,3032,1161],{"class":2902},[1140,3034,2244],{"class":1971},[1140,3036,2814],{"class":1160},[1140,3038,3039],{"class":2916},"user@example.org",[1140,3041,2821],{"class":1160},[1140,3043,3044],{"class":1142,"line":1382},[1140,3045,2963],{"class":1971},[1140,3047,3048],{"class":1142,"line":1392},[1140,3049,2897],{"class":1971},[1140,3051,3052,3054,3056,3058,3060,3062,3065,3067],{"class":1142,"line":1411},[1140,3053,2903],{"class":2902},[1140,3055,2907],{"class":2906},[1140,3057,1161],{"class":2902},[1140,3059,2244],{"class":1971},[1140,3061,2814],{"class":1160},[1140,3063,3064],{"class":2916},"Submit newsletter sign up",[1140,3066,1161],{"class":1160},[1140,3068,2053],{"class":1971},[1140,3070,3071,3073,3075,3077,3079,3081,3083,3085],{"class":1142,"line":1429},[1140,3072,2903],{"class":2902},[1140,3074,2928],{"class":2906},[1140,3076,1161],{"class":2902},[1140,3078,2244],{"class":1971},[1140,3080,2814],{"class":1160},[1140,3082,2818],{"class":2916},[1140,3084,1161],{"class":1160},[1140,3086,2053],{"class":1971},[1140,3088,3089,3091,3093,3095,3097,3099,3102],{"class":1142,"line":1438},[1140,3090,2903],{"class":2902},[1140,3092,2947],{"class":2906},[1140,3094,1161],{"class":2902},[1140,3096,2244],{"class":1971},[1140,3098,2814],{"class":1160},[1140,3100,3101],{"class":2916},"section#newsletter > button",[1140,3103,2821],{"class":1160},[1140,3105,3106],{"class":1142,"line":1444},[1140,3107,3108],{"class":1971},"    }\n",[1140,3110,3111],{"class":1142,"line":1450},[1140,3112,3113],{"class":1971},"]\n",[1027,3115,3116],{},[19,3117,3118,3123,3124,3129],{},[745,3119,3122],{"href":3120,"rel":3121},"https://platform.openai.com/docs/guides/function-calling",[838],"Function Calling"," and the 
",[745,3125,3128],{"href":3126,"rel":3127},"https://modelcontextprotocol.io",[838],"Model Context Protocol"," represent two ends to outsource an explicit actuation model – server- and client-side, respectively.",[74,3131,3133],{"id":3132},"agentic-ui-augmentation","Agentic UI Augmentation",[19,3135,3136],{},"An agent represents yet another feature to integrate with an application and its UI. Discoverability and availability, however, are among the most fundamental requirements of a web agent. Evidently, when a user experiences UI/UX friction, at least the agent should be interactive. That said, a scrolling modal web agent UI has been the go-to approach, that is, a little floating widget on top of the underlying application's UI. It comes with a major advantage: the agent application can be decoupled from the underlying, self-contained application.",[11,3138],{":width":3139,"alt":3140,"format":2636,"loading":16,"src":3141},"360","Depiction of a web agent application augmenting an underlying application in an isolated layer","/blog/a-gentle-introduction-to-ai-agents-for-the-web/7.svg",[51,3143,3145],{"id":3144},"how-to-build-a-web-agent","How to Build a Web Agent?",[19,3147,3148],{},"Believe it or not: enhancing an existing web application with a purposeful agent is a lower-hanging fruit. The evolving agent ecosystem provides you with a spectrum of solutions: instantly use a pre-compiled agent, tweak a templated agent, or develop an agent from scratch. Either way, LLMs and web browsers exist for reuse, boiling down agent development to LLM context engineering, and UI augmentation.",[74,3150,3152],{"id":3151},"develop-a-web-agent","Develop a Web Agent",[19,3154,3155,3156,3159,3160,1018,3165,3170],{},"Opting for a ",[86,3157,3158],{},"pre-compiled agent"," does not necessarily involve any actual development step. Instead, pre-compiled agents allow for high-level configuration through an agent-as-a-service provider's interface. 
Popular agent-as-a-service providers include, among others, ",[745,3161,3164],{"href":3162,"rel":3163},"https://elevenlabs.io/conversational-ai",[838],"ElevenLabs",[745,3166,3169],{"href":3167,"rel":3168},"https://www.intercom.com/drlp/ai-agent",[838],"Intercom",". Serviced agents hide LLM communication, and potentially interaction with a web browser, behind the configuration interface.",[19,3172,3173,3174,3177,3178,3183,3184,3189,3190,3195],{},"Using a ",[86,3175,3176],{},"templated agent"," resembles the agent-as-a-service approach on a lower level. Openly sourced from a ",[745,3179,3182],{"href":3180,"rel":3181},"https://github.com/webfuse-com/agent-extension-blueprint",[838],"code repository",", templated agents allow for any kind of development tweak. Conveniently, agent templates shortcut integration with ",[745,3185,3188],{"href":3186,"rel":3187},"https://openai.com/api/",[838],"LLM APIs"," and web ",[745,3191,3194],{"href":3192,"rel":3193},"https://developer.mozilla.org/en-US/docs/Web/API",[838],"browser APIs",". Using a templated agent usually represents the preferable, best-of-both-worlds approach; common- and best-practice code snippets are available from the beginning, but everything can be customised as desired.",[19,3197,3198,3199,3202],{},"Of course, developing an ",[86,3200,3201],{},"agent from scratch"," is always an option. It is preferable whenever agent requirements deviate to a large extent from what exists in the service or template landscape.",[74,3204,3206],{"id":3205},"deploy-a-web-agent","Deploy a Web Agent",[19,3208,3209,3210,1051,3215,3220,3221,3226,3227,3232,3233,3238,3239,3244],{},"When web agent code lives side-by-side with the augmented application's code, agent deployment is covered by a generic pipeline. 
Something like: ",[745,3211,3214],{"href":3212,"rel":3213},"https://eslint.org",[838],"linting",[745,3216,3219],{"href":3217,"rel":3218},"https://prettier.io",[838],"formatting"," agent code, ",[745,3222,3225],{"href":3223,"rel":3224},"https://esbuild.github.io",[838],"transpiling and bundling"," agent modules, ",[745,3228,3231],{"href":3229,"rel":3230},"https://www.cypress.io",[838],"testing"," agent, ",[745,3234,3237],{"href":3235,"rel":3236},"https://pages.cloudflare.com",[838],"hosting"," agent bundle, and ",[745,3240,3243],{"href":3241,"rel":3242},"https://docs.github.com/en/actions/get-started/continuous-integration",[838],"triggering"," post-deployment events. In that case, an agent represents a modular feature component in the application, no different than, for instance, a sign-up component.",[19,3246,3247],{},"Web agent source code right inside the application codebase comes at a cost:",[29,3249,3250,3253,3256],{},[32,3251,3252],{},"Agent developers can manipulate the source code of the underlying application.",[32,3254,3255],{},"Agent functionality could introduce side effects on the underlying application.",[32,3257,3258],{},"Agent changes require deployment of the entire application.",[74,3260,3262],{"id":3261},"best-practices-of-agentic-ux","Best Practices of Agentic UX",[19,3264,3265],{},"When designing user experiences for agent-enhanced applications, there are a few things to consider:",[29,3267,3268,3269,3268,3278,3268,3286],{},"\n    ",[32,3270,3271,3272,3271,3275,3277],{},"\n        ",[86,3273,3274],{},"Stream input and output to reduce latency",[2490,3276],{},"\n        LLMs (re-)introduce noticeable communication round-trip time. 
To reduce wait time for the human user, stream chunks of data as soon as they are available.\n    ",[32,3279,3271,3280,3271,3283,3285],{},[86,3281,3282],{},"Provide fine-grained feedback to bridge high latency",[2490,3284],{},"\n        Human attention is sensitive to several seconds of [system response time](https://www.nngroup.com/articles/response-times-3-important-limits/). Periodically provide agent _thoughts_ as feedback to perceptibly break down round-trip time.\n    ",[32,3287,3271,3288,3271,3291,3293],{},[86,3289,3290],{},"Always prompt the human user for consent to perform critical actions",[2490,3292],{},"\n        Some actions in a web application lead to irreversible or significant changes of state. Never have the agent perform such actions on behalf of the user without explicitly asking for permission.\n    ",[74,3295,3297],{"id":3296},"non-invasive-web-agents-with-webfuse","Non-Invasive Web Agents with Webfuse",[19,3299,3300,3305],{},[745,3301,3303],{"href":2180,"rel":3302},[838],[86,3304,653],{}," is a configurable web proxy that lets you augment any web application. As pictured, web agents represent highly self-contained applications. Moreover, web agents and underlying applications communicate at runtime in the client. This opens up opportunities to overcome the above-mentioned drawbacks with Webfuse: develop web agents with a sandbox extension methodology, and deploy them through the low-latency proxy layer. On demand, seamlessly serve users with your agent-enhanced website. 
Benefit from information hiding, safe code, and fewer deployments.",[3307,3308],"article-signup-cta",{":demoAction":3309,"heading":3310,"subtitle":3311},"{\"text\":\"Read more\",\"showIcon\":false,\"href\":\"https://www.webfuse.com/blog/category/ai-agents\"}","Deploy Web Agents with Webfuse","Develop or deploy web agents in minutes; serve agent-enhanced websites through an isolated application layer.",[2563,3313,3314],{},"html pre.shiki code .scGhl, html code.shiki .scGhl{--shiki-default:#7C7F93;--shiki-dark:#D6DEEB}html pre.shiki code .srFR9, html code.shiki .srFR9{--shiki-default:#7C7F93;--shiki-dark:#7FDBCA}html pre.shiki code .s30W1, html code.shiki .s30W1{--shiki-default:#1E66F5;--shiki-dark:#7FDBCA}html pre.shiki code .sbuKk, html code.shiki .sbuKk{--shiki-default:#40A02B;--shiki-dark:#D9F5DD}html pre.shiki code .sCC8C, html code.shiki .sCC8C{--shiki-default:#40A02B;--shiki-dark:#C789D6}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s76yb, html code.shiki .s76yb{--shiki-default:#8839EF;--shiki-dark:#C792EA}html pre.shiki code .sXbZB, html code.shiki 
.sXbZB{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .s2kId, html code.shiki .s2kId{--shiki-default:#4C4F69;--shiki-dark:#D6DEEB}html pre.shiki code .s9rnR, html code.shiki .s9rnR{--shiki-default:#179299;--shiki-dark:#7FDBCA}html pre.shiki code .scrte, html code.shiki .scrte{--shiki-default:#8839EF;--shiki-dark:#C5E478}html pre.shiki code .sgAC-, html code.shiki .sgAC-{--shiki-default:#40A02B;--shiki-default-font-style:italic;--shiki-dark:#ECC48D;--shiki-dark-font-style:inherit}",{"title":760,"searchDepth":761,"depth":761,"links":3316},[3317,3322,3328],{"id":2624,"depth":761,"text":2590,"children":3318},[3319,3320,3321],{"id":2640,"depth":766,"text":2641},{"id":2662,"depth":766,"text":2663},{"id":2680,"depth":766,"text":2681},{"id":2701,"depth":761,"text":2702,"children":3323},[3324,3325,3326,3327],{"id":2713,"depth":766,"text":2714},{"id":2746,"depth":766,"text":2747},{"id":2768,"depth":766,"text":2769},{"id":3132,"depth":766,"text":3133},{"id":3144,"depth":761,"text":3145,"children":3329},[3330,3331,3332,3333],{"id":3151,"depth":766,"text":3152},{"id":3205,"depth":766,"text":3206},{"id":3261,"depth":766,"text":3262},{"id":3296,"depth":766,"text":3297},"2025-06-15","LLMs only recently enabled serviceable web agents: autonomous systems that browse web on behalf of a human. 
Get started with fundamental methodology, key design challenges, and technological opportunities.",{"homepage":799,"relatedLinks":3337},[3338,3339,3343],{"text":2586,"href":2587,"description":2588},{"text":3340,"href":3341,"description":3342},"Develop an AI Agent for Any Website with Webfuse","/blog/develop-an-ai-agent-for-any-website-with-webfuse","Learn how to develop and deploy a web agent for any website with Webfuse",{"text":2176,"href":3344,"external":799,"description":2594},"https://dev.webfuse.com/automation-api/",{"title":2605,"description":3335},{"loc":1918},"blog/1011.a-gentle-introduction-to-ai-agents-for-the-web",[814,815,2600,817,818],"NE1cc8w1586RjefKyr028dgV7yBmf460jhZy91LninA",1775834759364]