Folks at TollBit have released an interesting study on the impact of AI, and LLMs in particular, on publishers. (www.tollbit.com)

Key takeaways (AI-summarized) from the report on AI’s impact on publishers:

  • The latest AI agents and “headless browsers” interact with websites as if they were human users, masking their true nature with standard Chrome user agents and solving CAPTCHAs.
  • AI bot traffic now exceeds that of Bing’s web crawler; 1 in every 50 website visitors is an AI agent, up from 1 in 200 at the year’s start—a 4x surge in relative AI visits since Q1.
  • Human visitor numbers fell 9.4% across TollBit partner sites, while AI bot or agent traffic soared, highlighting a substitution effect where AIs, not humans, are browsing sites.
  • Publisher responses include a 336% rise in sites blocking or redirecting AI bots (via HTTP errors and bot paywalls), yet 13.26% of AI bots still bypass robots.txt, up from 3.3% just six months prior.
  • Third-party and browser-driven AI agents are rapidly being adopted, further blurring the human/bot line; they automate browsing for real-time tasks including purchasing and navigation.
  • Advanced agents operate from the user’s desktop IP, making detection by publishers even harder; these “faux human” visitors risk eroding the web’s economic foundations.
  • The report contends that AI agents must be required to self-identify, and regulatory action should mandate clear user agent disclosures to protect publisher interests and foster a fair content market.
  • AI scraping puts new cost pressures on sites, including rising CDN bills; many categories, such as B2B/professional, parenting, and sports, receive disproportionately high AI traffic.
  • Europe’s regulatory actions have resulted in AI bots scraping European sites 27% less often than US sites, while APAC domains are the most heavily scraped globally.
  • Referral traffic from AI applications remains marginal—AI sends only 0.1% of all referrals to publishers, and Google’s dominance is waning; Google referrals fell to 84% of all external visitors (down from 90% last year).
  • AI bots’ caching strategies vary: ChatGPT caches web content for around 30 minutes, Gemini for 15 minutes (at the user level), and Claude caches results for over 16 days.
  • Publishers are increasingly deploying security and paid-access (“Bot Paywall”) responses, yet bots evolve to evade or bypass these efforts.
  • Headless browsing and the need for new agent-to-web protocols are discussed as potential future technical and regulatory directions.
  • The report advocates for policy reforms to require autonomous bots to identify themselves via HTTP headers, closing the loophole allowing automation to masquerade as human traffic.
  • As AI-mediated web interactions grow in sophistication and share, the urgency of distinguishing between real and “faux human” site visits escalates for publishers, advertisers, and regulators alike.
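
To make the self-identification and robots.txt points concrete, here is a minimal sketch of what a well-behaved agent might do: declare itself in the User-Agent header and check robots.txt before fetching. The agent name, info URL, publisher domain, and robots.txt rules are all hypothetical placeholders. The report as summarized here does not prescribe a header format, so treat this as an illustration under those assumptions, not as the report's proposal.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical agent name and info URL, for illustration only.
AGENT_NAME = "ExampleAIBot"
USER_AGENT = f"{AGENT_NAME}/1.0 (+https://example.com/bot-info)"

# Hypothetical robots.txt a publisher might serve: block this agent
# from /articles/ while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def request_headers() -> dict[str, str]:
    """Headers that identify the agent instead of imitating a browser."""
    return {"User-Agent": USER_AGENT}


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True only if robots.txt permits this agent to fetch the URL."""
    return parser.can_fetch(AGENT_NAME, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "https://publisher.example/articles/story",
        "https://publisher.example/about",
    ):
        verdict = "allowed" if may_fetch(parser, url) else "blocked"
        print(f"{url}: {verdict} for {request_headers()['User-Agent']}")
```

The bypass problem described above is what happens when an agent skips the `may_fetch` step or sends a standard Chrome user agent instead of its own name: robots.txt is advisory, so it only works if the agent chooses to honor it.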