Why do URL-based ad blockers work?

Sep 2, 2020

Disclaimer: I work for Google but not on any of the ads teams. This is a personal post.

When Pete Snyder filed WICG/webpackage#551, arguing that Web Bundles might break ad blockers, I had to figure out what makes those ad blockers work in the first place.

Now, obviously, if a page loads an ad from a URL, then blocking that URL will block the ad. But the web is an evolving system, and ad blockers are in an adversarial relationship with publishers, advertisers, and ad-tech companies who all want to make sure users see their ads. Those ad folks are smart and capable of finding ways around naïve attempts to block their ads. So what prevents them from avoiding that list of URLs? Why do URL-based ad blockers keep working?

This post primarily tries to answer that question, not to answer Pete’s concern, but it does eventually come back to web bundles’ effect on ad blocking: they don’t really affect any of the reasons sites haven’t pursued an arms race with ad blockers.

Sites that don’t try to evade

A user who has installed an ad blocker has sent a pretty clear signal that they don’t want to see ads. Advertisers and publishers may not want to risk angering such users by showing them ads anyway.

And even though around a quarter of all web users use ad blockers, winning those users back may still not pay a publisher enough to justify an arms race with ad blockers.

Functionality that needs an online endpoint

The far bigger reason that ad blockers keep working is that advertisements are usually fetched based on the results of an auction that runs as the surrounding page is downloaded. Whoever runs that auction accepts requests at some URL and responds with ads. That URL is a nice stable target for ad blockers. The auctioneer could dodge the blocker by changing the URL whenever it gets blocked, but then they have to find a way to update all of the publishers’ pages that were written to call that URL. That’s a big logistical problem.
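To make the “stable target” concrete, here is a minimal sketch of URL-based blocking in Python. The rule format (a set of blocked hostnames plus a list of path substrings) and the names BLOCKED_HOSTS, BLOCKED_PATH_PARTS, and is_blocked are assumptions made up for illustration; real ad blockers use richer filter-list syntaxes.

```python
from urllib.parse import urlsplit

# Hypothetical rules, not taken from any real filter list.
BLOCKED_HOSTS = {"auctioneer.example"}
BLOCKED_PATH_PARTS = ["/ads/", "/auction"]


def is_blocked(url: str) -> bool:
    """Return True if the request URL matches a blocked host or path rule."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Match the blocked host itself and any of its subdomains.
    for blocked in BLOCKED_HOSTS:
        if host == blocked or host.endswith("." + blocked):
            return True
    # Otherwise fall back to substring rules on the path.
    return any(part in parts.path for part in BLOCKED_PATH_PARTS)
```

As long as the auctioneer keeps answering at the same host and path, a rule like this keeps matching every publisher’s request.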

The first thing the auctioneer might try is to have the publishers load a <script src="https://auctioneer.example/auctioneer.js"> that includes the dynamically updating auction endpoint. This is often known as an “ad tag” and is usually the way ads are served even when they’re not trying to avoid ad blockers. But, oops, now the ad blockers are blocking auctioneer.js, and the auctioneer is back to the original problem.

Obfuscate the URL

It’s straightforward to obfuscate the URL for the auction endpoint, for example by encrypting it with the current date and even a key provided to the particular publisher. The auctioneer can decrypt the request on their server, and run the resulting auction. If the auctioneer isn’t careful, this will lead to their entire domain being blocked, but they might be lucky enough to run a popular website on the same domain, which ad-blocker users would be sad to lose access to. They’ll also need to encrypt every other resource on that server in the same way, so that URL-based blockers can’t distinguish ad requests from the rest.

The bigger problem is that now they have some complicated code copied to every publisher’s page. If that code ever needs to be updated, it’s going to be a problem. And they can’t abstract it into an auctioneer.js for the same reason as before.
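The kind of code that would end up copied onto every publisher’s page might look roughly like the following Python sketch. Everything here is an assumption for illustration: the function names, the per-publisher key, and especially the cipher, which is a toy XOR keystream derived from SHA-256 and not a secure encryption scheme. It only shows how the date and a key can make the endpoint path change daily while the server can still recover it.

```python
import base64
import hashlib
from datetime import date


def _keystream(key: bytes, day: date, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and the date."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + day.isoformat().encode() + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def obfuscate(path: str, key: bytes, day: date) -> str:
    """Page side: turn the real endpoint path into an opaque URL segment."""
    data = path.encode()
    mixed = bytes(a ^ b for a, b in zip(data, _keystream(key, day, len(data))))
    return base64.urlsafe_b64encode(mixed).decode().rstrip("=")


def deobfuscate(segment: str, key: bytes, day: date) -> str:
    """Server side: recover the real path (garbage if key or date is wrong)."""
    padded = segment + "=" * (-len(segment) % 4)
    mixed = base64.urlsafe_b64decode(padded)
    data = bytes(a ^ b for a, b in zip(mixed, _keystream(key, day, len(mixed))))
    return data.decode()
```

Even this toy version has to stay in sync between every publisher’s page and the auctioneer’s server, which is the maintenance burden described above.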

Proxy via the first-party server

The auctioneer could also ask the host of each page to act as a proxy for either the auction request URL or the auctioneer.js posited above. The page would request /any_url_the_publisher_wants.js, and the server would forward that request to the auctioneer and reply with their response. Because of the number of different publishers, it would be difficult for an ad blocker to block all of the script names they picked, and a publisher that wanted to avoid ad blockers could be as creative as they like in rotating those names.

However, this is still more difficult for publishers to adopt than pasting an ad tag on their site, and that difficulty seems to have been enough to stop this technique from being widely adopted. Proxying too much would also make it hard for the auctioneer or advertiser to detect ad fraud, since ad fraud detection currently depends on inspecting connections directly to end-users.
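The routing decision at the heart of such a proxy can be sketched as follows. The class FirstPartyProxy, its methods, and the upstream URL are hypothetical names chosen for this example; the sketch only decides where a request should be forwarded and leaves out the actual HTTP forwarding.

```python
from typing import Optional

# Hypothetical upstream; in practice this is whatever URL the ad tag uses.
AUCTIONEER_SCRIPT = "https://auctioneer.example/auctioneer.js"


class FirstPartyProxy:
    """Maps publisher-chosen local paths to the auctioneer's script URL."""

    def __init__(self) -> None:
        self._aliases = set()

    def rotate(self, new_path: str) -> None:
        """Replace all current aliases with a freshly chosen local path."""
        self._aliases = {new_path}

    def upstream_for(self, request_path: str) -> Optional[str]:
        """Return the URL to forward to, or None to serve the path normally."""
        if request_path in self._aliases:
            return AUCTIONEER_SCRIPT
        return None
```

A publisher would call rotate whenever a blocker catches up with the current name, and would also have to update the pages that reference it, which is part of the adoption cost.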

Run a CDN

The auctioneer could also offer to act as a CDN for publishers that want to avoid ad blockers. By proxying all of the publisher’s content, they can automatically rewrite the ad tags into randomized local references that an ad blocker can’t distinguish from the page’s actual subresources. However, the publisher can only do this with one auctioneer, and they need to trust that auctioneer to do a good job serving all the rest of their content.
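A rough sketch of that rewriting step, in Python, is below. It assumes the ad tag appears as a literal script element with a known src, and it uses a regular expression where a real CDN would more likely use an HTML parser. The names AD_TAG_SRC and rewrite_ad_tags and the /assets/ path prefix are invented for the example.

```python
import re
import secrets

# Hypothetical third-party ad tag URL to hide.
AD_TAG_SRC = "https://auctioneer.example/auctioneer.js"


def rewrite_ad_tags(html: str, routes: dict) -> str:
    """Replace each ad-tag src with a random local path and record the mapping."""

    def replace(match: re.Match) -> str:
        # A fresh random name per tag, so no stable URL exists to block.
        local_path = "/assets/" + secrets.token_hex(8) + ".js"
        routes[local_path] = AD_TAG_SRC
        return match.group(1) + local_path + match.group(2)

    pattern = re.compile(r'(<script\s+src=")' + re.escape(AD_TAG_SRC) + r'(")')
    return pattern.sub(replace, html)
```

The routes dictionary stands in for whatever state the CDN keeps so that it can later answer requests for the randomized paths with the auctioneer’s script.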

What about first-party ads?

A publisher that sells their own ads might not need to make a separate request for ad blockers to target. Instead, they have a choice between an easy-to-manage URL space with all the ad-related resources in a separate path that ad blockers can target, versus ads mixed indistinguishably among the site’s other resources. The second costs enough development and maintenance time that sites tend not to do it. However, some large sites have chosen to frequently rotate the paths of their ad resources to make it hard for URL-based blockers to keep up.

A first party could also inline ad-related resources into the page itself. Any necessary scripts and styles can be placed at the bottom of the page, and images can either be compiled into the scripts or included with data: URLs. This requires every page of the site to be served dynamically and loses any possible caching benefits from sharing ad resources between pages.
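Inlining an image with a data: URL can be sketched in a few lines of Python. The helper names to_data_url and inline_image_tag are assumptions for this example; the URL shape (media type, then ;base64, then the encoded bytes) is the standard data: URL form.

```python
import base64
import html


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode bytes as a base64 data: URL that can be embedded in the page."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_image_tag(content: bytes, mime_type: str, alt: str) -> str:
    """Build an <img> tag whose source needs no separate network request."""
    src = to_data_url(content, mime_type)
    return f'<img src="{src}" alt="{html.escape(alt, quote=True)}">'
```

Because the image bytes travel inside the HTML, there is no separate URL for a blocker to match, but every page that shows the ad carries its own copy of those bytes.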

What about non-ad uses of ad blockers?

It turns out that ad blockers are also used to block other intrusive things, like trackers (including social widgets), big downloads like fonts, fingerprinting scripts, and cryptocurrency miners. Trackers and cryptocurrency miners have to make a network request off the first-party origin in order to send their results, and the URL of that request has to be about as stable as an ad auction’s endpoint, so ad blockers can block it.

Fingerprinting scripts, on the other hand, only need to report their result to the surrounding page, and some of them provide npm packages for trivial use in website bundlers (like webpack, Rollup, or Parcel). The fingerprinting script can even be bundled with some of the site’s shared code to ensure that it can be cached within the site while ensuring that blocking it will break the site. Ad blockers will only manage to block a fingerprinting script whose host isn’t trying to avoid the blocker.

Big files are easy to re-host locally, but usually aren’t worth the trouble.

How do web bundles affect this?

Issue #551 claims that Web Bundles make it easier to avoid ad blockers, so how might they do that?

Uses that need an online endpoint will still need one whether or not they’re bundling their code. Ad blockers should continue to target that endpoint. The considerations that make it difficult to move that endpoint around outside a bundle also make it difficult to move it around using bundles.

Uses that only need to get a script to run are already defended by existing JavaScript compilers: if a publisher doesn’t care enough about defeating ad blockers to run a compiler, there’s no reason to think they’ll care enough to build a web bundle either.

Bundles provide another way to inline first-party ads, with the improvement of not needing to use data: URLs for images. They come with the same downsides around needing to serve every page dynamically and losing the caching benefits of sharing ad-related resources between pages.

Acknowledgements

Thanks to Jeff Kaufman, Justin Fagnani, and Michael Kleber for reviewing this post.

This was originally published on Medium.