Since deep learning is dominating the news these days, this might be a good opportunity to point out potential new ways of approaching ad blocking. One is to use character-based convolutional (or recurrent) networks that read the characters of a URL and classify it as legitimate or an ad.
The interesting gain here is that there are no more conflicting regexes and no more optimization work over huge lists of blocking rules. Instead, URLs can be passed to the network in batches.
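To make this concrete, a character-level classifier along these lines could look roughly like the PyTorch sketch below; the alphabet, length cap and layer sizes are illustrative assumptions, not the published model:

    # Illustrative character-level CNN for URL classification (ad / not-ad).
    # Alphabet, length cap and layer sizes are assumptions for this sketch.
    import string
    import torch
    import torch.nn as nn

    ALPHABET = string.ascii_lowercase + string.digits + string.punctuation  # 68 chars, close to the 69 used in practice
    MAX_LEN = 256

    def encode(url: str) -> torch.Tensor:
        """One-hot encode a URL into a (len(ALPHABET), MAX_LEN) tensor."""
        x = torch.zeros(len(ALPHABET), MAX_LEN)
        for i, ch in enumerate(url.lower()[:MAX_LEN]):
            j = ALPHABET.find(ch)
            if j >= 0:
                x[j, i] = 1.0
        return x

    class UrlCNN(nn.Module):
        def __init__(self, n_chars: int = len(ALPHABET)):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_chars, 128, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
                nn.Conv1d(128, 128, kernel_size=3), nn.ReLU(), nn.MaxPool1d(3),
            )
            self.classifier = nn.Linear(128, 2)  # logits for [legit, ad]

        def forward(self, x):              # x: (batch, n_chars, MAX_LEN)
            h = self.features(x)           # (batch, 128, reduced_length)
            h = h.max(dim=2).values        # global max pool over positions
            return self.classifier(h)

    # Batched scoring: one forward pass over many URLs instead of per-rule regex scans.
    urls = ["http://ads.example.com/banner?id=42", "https://news.ycombinator.com/item?id=1"]
    batch = torch.stack([encode(u) for u in urls])
    with torch.no_grad():
        p_ad = UrlCNN()(batch).softmax(dim=1)[:, 1]  # P(ad) per URL (untrained weights here)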
I have conducted my own set of experiments and a usable deep model is available (see http://www.deepdetect.com/applications/text_model/ ). This is kind of a shameless plug, but I'm really interested in feedback on these new ways of doing high-accuracy ad blocking.
For sure, running a deep net service is not as easy as installing uBlock, but there are ways. The whole source code is open source, as is the model. I have more data at hand, and larger models could be built. A performance assessment would be a good next step as well.
EDIT: the linked page is rather long; search for 'Novel task' to get to a quick classification example.
Part of the requirement for ad-blocking software is to inject itself into the request pipeline in real time and reject ad requests while causing as small a stall as possible and using the least amount of resources. NNs running inside a browser don't satisfy any of those criteria.
I am still unconvinced that NNs provide any advantage over regular expressions in this domain, since it's not a hard problem to solve. Also, the request needs to be rejected before the connection is made, so the only data to work with is the HTTP request headers.
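For comparison, the status quo being defended here is roughly the sketch below; the patterns are made-up illustrations rather than real filter-list rules, and the timeit call gives a rough per-URL matching cost to weigh against a model's forward pass:

    # Rough illustration of regex-based blocking: a few made-up patterns compiled
    # into one alternation and checked synchronously before the request proceeds.
    import re
    import timeit

    BLOCK_PATTERNS = [
        r"^https?://([^/]+\.)?ads?\.",   # ad-ish subdomains (illustrative)
        r"/banners?/",                   # banner paths
        r"doubleclick\.net",             # a well-known ad host
    ]
    BLOCK_RE = re.compile("|".join(BLOCK_PATTERNS))

    def should_block(url: str) -> bool:
        return BLOCK_RE.search(url) is not None

    url = "https://ads.example.com/banners/728x90.gif"
    print(should_block(url))                                          # True
    print(timeit.timeit(lambda: should_block(url), number=100_000))   # seconds for 100k checks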
At the enterprise level, a better challenge to solve with NNs would be ASIC-accelerated neural net hardware to filter packets/connections for IPS/IDS and firewall purposes, with the extra ability of possibly also blocking ads.
A character-based URL classifier would be very, very easy to game.
Perhaps you meant an image-based classifier, though in this case Pharoah2's objection about speed is an insurmountable issue even on an enterprise server, since classifying a website on a CPU would take many seconds; on a GPU it would probably be several hundred milliseconds, which is a few hundred milliseconds too many!
Performance is in ms, see http://caffe.berkeleyvision.org/performance_hardware.html
Look up the testing numbers: 500 images/sec.
Character-based models are way simpler; you can test one very easily using the original link to the model.
They are not easily gamed either: they learn features as convolutional filters over the characters. These filters are more powerful than n-grams, and can be as complex as the filters you may have seen for images.
Now, about gaming the nets, there is an issue, and it is the same as for image-based CNNs: some combinations of letters (or pixels) do exist that make no difference to the eye but push the classifier into the wrong class. However, in order to find these combinations, first-hand access to the underlying neural net and its weights is mandatory.
The performance numbers are not representative of your proposed workload: 1) the images come either from RAM or sequentially from disk, 2) they are batched and tested together with vector operations all the way (which is also why memory bandwidth dominates the latency), and 3) the images are only 256 x 256 pixels (much smaller than a website).
Also, the neural net occupies something like 1.2 gigs of RAM, and something like 3 gigs of GPU memory.
The character-based net reads the URL, not the website. The alphabet used in the model made public has 69 characters, and the URL length can be, for example, 256.
Not only is this far below the image size you are talking about, but after the first convolution the alphabet dimension (in one-hot encoding) is collapsed, leaving 1-D convolutions across the 256 positions.
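A quick shape check of this claim (the filter count and kernel size below are arbitrary):

    # A one-hot URL batch is tiny next to even small images, and the 69-wide
    # alphabet axis disappears after the first convolution, leaving 1-D convolutions.
    import torch
    import torch.nn as nn

    ALPHABET_SIZE, URL_LEN = 69, 256
    url_batch = torch.zeros(32, ALPHABET_SIZE, URL_LEN)   # 32 one-hot encoded URLs
    img_batch = torch.zeros(32, 3, 256, 256)              # 32 small RGB images

    print(url_batch.numel(), "vs", img_batch.numel())     # 565248 vs 6291456 values

    first_conv = nn.Conv1d(in_channels=ALPHABET_SIZE, out_channels=256, kernel_size=7)
    print(first_conv(url_batch).shape)                    # torch.Size([32, 256, 250]) -- 1-D from here on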
Not that terrible. In proxy-like territory, passing batches of URLs to the GPU could pave the way for new ad blockers at large. To be tested :)
I was referring to the hypothetical image classifier, not the URL text classifier (the URL text classifier, as I mentioned, would perform very poorly).
Would crowdsourcing help? Like, what if, when you block an element on a page, it asks whether what you're blocking is an ad (or something else "no one" wants to see)? If you click yes, it sends that URL or whatever to the neural net servers. If enough people block that element, it'll get blocked.
Possibly. Having millions of URLs to populate each class is a good starting point. Our current dataset, gathered through other means, has around 10M URLs in the 'ads' category. The model we made available to the public was built from 2M of these URLs.
EDIT: of possible interest, these models output a probability (and possibly a confidence) that a URL should be blocked. Based on these, a blocker could ask for confirmation.
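As a sketch of what acting on that output could look like (the thresholds and the ask-the-user middle band are assumptions, not something the published model prescribes):

    # Illustrative decision rule on top of a classifier that returns P(ad) for a URL.
    from enum import Enum

    class Action(Enum):
        ALLOW = "allow"
        ASK = "ask"      # prompt the user; the answer doubles as a crowd-sourced label
        BLOCK = "block"

    def decide(p_ad: float, block_above: float = 0.9, allow_below: float = 0.3) -> Action:
        if p_ad >= block_above:
            return Action.BLOCK
        if p_ad <= allow_below:
            return Action.ALLOW
        return Action.ASK    # uncertain band: ask for confirmation

    print(decide(0.97))  # Action.BLOCK
    print(decide(0.55))  # Action.ASK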
There are already commercial products that use NN and other ML techniques to classify and block malware, click fraud etc. - Cisco has one. It's certainly a valid and viable approach, even in the face of determined opposition.
Implementation aside, I agree with your first sentence. It could also create a real incentive to try gaming NNs, which might help to further their capabilities.
Apart from efficiency, the most important difference between these two is that Adblock has a default whitelist which allows certain types of ads to pass (by charging advertisers like Google and Taboola millions of dollars to unblock their ads).
While ABP's business model is ethically questionable, it is a win-win for all involved parties:
1) end users get spared from the worst bunch of ads (layer ads, huge screen-filling ads in the left/right margins between content and screen, auto-playing video ads, popup/popunder ads, ads that mess with your browsing history, pre-roll ads on video sites)
2) site owners don't have their revenue stream completely fucked with more and more people using ad blockers
3) advertisers, provided they play by the rules, still have a way to get their content out to users
4) ABP/Eyeo has financial resources for development, hosting, maintenance and is able to keep up in the whack-a-mole game with the nasty parts of the ad distribution networks
It sounds nice in theory, but ABP charges an outrageous 30% and whitelists anyone who pays, regardless of the quality or performance of their ads.
You can look at their whitelist and find all sorts of intrusive and shady ad networks in there. This kind of funding model is bound to incentivize ABP into taking the wrong actions and working with bad actors.
There was some kind of schism between the original creator and the person who was meant to take over the project, and the result is two extensions, with the original changing its name.
Agreed, Firefox for Android isn't quite as slick as Chrome (yet), but not having to deal with obnoxious full-page ads with tiny tap targets is a massive win.
I use the original AdBlock for Chrome (not ABP). One of its features that I find indispensable is the ability to sync custom filters, filter lists and settings via Dropbox as I use Chrome on at least 3 different machines. I'd be willing to switch to uBlock Origin if it had a sync feature.
Why is blocking via a browser extension common but blocking with a proxy not? I use a proxy[1], which means I get ad blocking in my mail client, RSS reader, etc. Is a proxy too much work?
[1] I use glimmerblocker. It's OK; the biggest "problem" is that I had to do a lot of tuning -- I think it's just one developer. And I had to write some code to get it to support HTTPS traffic (basically, my proxy on my own machine had to perform a MITM "attack" on my behalf).
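For what it's worth, the same MITM-style setup can be scripted with a general-purpose tool like mitmproxy; the addon below is only a rough sketch, with a made-up blocklist, and assumes a recent mitmproxy where http.Response.make is available:

    # blocker.py -- minimal mitmproxy addon sketch; run with: mitmdump -s blocker.py
    # mitmproxy terminates TLS locally (the MITM "attack" described above), so HTTPS
    # requests can be inspected and answered locally instead of reaching the ad server.
    from mitmproxy import http

    class Blocker:
        BLOCKED_HOSTS = {"doubleclick.net", "ads.example.com"}  # made-up blocklist

        def request(self, flow: http.HTTPFlow) -> None:
            host = flow.request.pretty_host
            if any(host == h or host.endswith("." + h) for h in self.BLOCKED_HOSTS):
                # Reply locally with a stub response instead of forwarding the request.
                flow.response = http.Response.make(
                    403, b"blocked by proxy", {"Content-Type": "text/plain"}
                )

    addons = [Blocker()]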
Because there is more friction in setting up an ad-blocking proxy; secondly, your usage patterns differ from the norm, where the majority of people are almost completely browser-dependent.
I've started having this problem as well. YouTube videos with ads return an error for about a minute, then start. I've started just disabling it on YouTube.
CSS files often come from different hosts than the original site, usually CDNs or other static content hosts. Are you sure you enabled access to those hosts?
What you describe is exactly what I'd expect to see, and what I've seen with other filtering extensions, if the static hosts weren't enabled.
Their FAQ fails to answer the basic questions - who runs it, how is it funded and why should it be trusted?
Your DNS server can log the domains you request and can serve malicious replies (particularly bad for HTTP and other non-authenticated protocols). Trusting some random server is not the smartest decision.
It solves a lot of efficiency problems, but is hell for debugging a broken webpage. For that reason I feel the browser extension justifies its footprint.