UBlock vs. ABP: efficiency compared (github.com/gorhill)
193 points by diziet on March 13, 2016 | 59 comments


Since deep learning is dominating the news these days, this might be a good opportunity to point to potential new ways of dealing with ad blocking. One is to use character-based convolutional (or recurrent) networks that read the characters of a URL and classify it as legitimate or an ad.

The interesting gain here is that there are no more conflicting regexes, and no need to optimize matching over huge lists of blocking rules. Instead, URLs can be passed to the network in batches.
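
To make this concrete, here is a minimal sketch (in Python/PyTorch) of what such a character-level CNN could look like. The alphabet, dimensions, and architecture here are illustrative assumptions of mine, not the actual model behind the service linked below:

    # Minimal sketch of a character-level CNN URL classifier.
    # Alphabet, dimensions and architecture are illustrative assumptions.
    import torch
    import torch.nn as nn

    ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#@!$&'()*+,;=%"
    MAX_LEN = 256  # URLs are truncated/padded to a fixed length

    def encode(url: str) -> torch.Tensor:
        """One-hot encode a URL into an (alphabet_size, MAX_LEN) tensor."""
        x = torch.zeros(len(ALPHABET), MAX_LEN)
        for i, ch in enumerate(url.lower()[:MAX_LEN]):
            j = ALPHABET.find(ch)
            if j >= 0:
                x[j, i] = 1.0
        return x

    class CharCNN(nn.Module):
        def __init__(self, n_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                # The first convolution collapses the alphabet dimension,
                # leaving 1D convolutions over sequence positions.
                nn.Conv1d(len(ALPHABET), 128, kernel_size=7), nn.ReLU(),
                nn.MaxPool1d(3),
                nn.Conv1d(128, 128, kernel_size=3), nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),
            )
            self.classifier = nn.Linear(128, n_classes)

        def forward(self, x):  # x: (batch, alphabet, length)
            return self.classifier(self.features(x).squeeze(-1))

    model = CharCNN()
    batch = torch.stack([encode("http://ads.example.com/banner?id=123")])
    probs = torch.softmax(model(batch), dim=1)  # [P(legit), P(ad)]

The whole list-matching problem then reduces to one batched forward pass per set of pending requests.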

I have conducted my own set of experiments and a usable deep model is available (see http://www.deepdetect.com/applications/text_model/ ).

This is kind of a shameless plug, but I'm really interested in feedback on these new ways of doing ad blocking with high accuracy.

For sure, running a deep net service is not as easy as installing uBlock, but there are ways. The whole source code is open source, as is the model. I have more data at hand, and some larger models could be built. Performance assessment would be a good next step as well.

EDIT: the linked page is rather long; search for 'Novel task' to get to a quick classification example.


Part of the requirement for ad-blocking software is to inject itself into the request pipeline in real time and reject ad requests, while causing as small a stall as possible and using the least amount of resources. NNs running inside a browser don't satisfy any of those criteria.


Not inside a browser, of course. If you consider enterprise-wide applications, a NN would be better and possibly faster (with a GPU) than a Blue Coat proxy.


I am still unconvinced that NNs provide any advantage over regular expressions in this domain, since it's not a hard problem to solve. Also, the request needs to be rejected before the connection is made, so the only data to work with is the HTTP request headers.

On the enterprise level, a better challenge to solve with NNs would be to create ASIC-accelerated neural net hardware to filter packets/connections for IPS/IDS and firewall purposes, with the extra ability of possibly also blocking ads.


A URL-character-based classifier would be very, very easy to game.

Perhaps you meant an image-based classifier, though in that case Pharoah2's objection about speed is an insurmountable issue even on an enterprise server: classifying a website on a CPU would take many seconds; on a GPU it would probably be several hundred milliseconds, which is a few hundred milliseconds too many!


Performance is in ms; see http://caffe.berkeleyvision.org/performance_hardware.html and look up the testing numbers: 500 images/sec. Character-based models are way simpler, and you can test them very easily using the original link to the model.

They are not easily gamed either: they learn features as convolutional filters over the characters. These filters are more powerful than n-grams, and as complex as the filters you may have seen for images.

Now, about gaming the nets, there is an issue, which is the same as for image-based CNNs: some combinations of letters (or pixels) exist that make no difference to the eye but push the classifier into the wrong class. However, in order to find these combinations, first-hand access to the underlying neural net and its weights is mandatory.
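
To illustrate why weight access matters, here is a rough sketch of such an attack (HotFlip-style) against the hypothetical CharCNN sketched earlier in the thread; `encode`, `ALPHABET` and `MAX_LEN` come from that sketch and are my assumptions, not part of the published model:

    # White-box, gradient-guided single-character flip (HotFlip-style)
    # against the hypothetical CharCNN sketched above. Requires access
    # to the model's weights, exactly as noted in the comment.
    import torch
    import torch.nn.functional as F

    def adversarial_flip(model, url: str, target_class: int = 0) -> str:
        x = encode(url).unsqueeze(0).requires_grad_(True)
        loss = F.cross_entropy(model(x), torch.tensor([target_class]))
        loss.backward()
        grad = x.grad[0]  # shape: (alphabet, length)
        # The most negative gradient entry marks the character
        # substitution that most lowers the loss for target_class.
        j, i = divmod(int(grad.argmin()), grad.shape[1])
        chars = list(url.lower()[:MAX_LEN].ljust(MAX_LEN))
        chars[i] = ALPHABET[j]
        return "".join(chars).rstrip()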


The performance numbers are not representative of your proposed workload. Those numbers are for 1) images read either from RAM or sequentially from disk, that are 2) batched and tested together with vector operations all the way (memory bandwidth dominates the latency for this reason), and 3) only 256 x 256 pixels (much smaller than a website).

Also, the neural net occupies something like 1.2 gigs of RAM, and something like 3 gigs of GPU memory.


> much smaller than a website

The character-based net reads the URL, not the website. The common alphabet used in the model made public has size 69, and the URL length can be, for example, 256.

Not only is this below the image size you are talking about, but after the first convolution the alphabet (in one-hot vector encoding) is collapsed, which leaves 1D convolutions across the length of 256.
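
A quick back-of-the-envelope comparison, using the numbers quoted in this thread:

    # Input sizes in raw float counts, per the figures above.
    url_input = 69 * 256            # one-hot URL: 17,664 values
    image_input = 3 * 256 * 256     # RGB image: 196,608 values
    print(image_input / url_input)  # ~11x: the URL input is much smaller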

Not that terrible. In proxy-like territory, passing batches of URLs to GPUs could pave the way to new ad blockers at large. To be tested :)


I was referring to the hypothetical image classifier, not the URL text classifier (the URL text classifier, as I mentioned, would perform very poorly).


Would crowdsourcing help? Like, what if, when you block an element on a page, it asks whether what you're blocking is an ad (or something else "no one" wants to see)? If they click yes, it sends that URL or whatever to the neural net servers. If enough people block that element, it gets blocked.


Possibly. Having millions of URLs to populate each class is a good thing to start with. Gathered through other means, our current dataset has around 10M URLs in the 'ads' category. The model we made available to the public was built from 2M of these URLs.

EDIT: of possible interest is that these models output a probability, and possibly a confidence, that a URL should be blocked. Based on these, a blocker could ask for confirmation.
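
As a sketch, such a confirmation flow could be as simple as thresholding the output probability (the thresholds here are made-up assumptions):

    # Illustrative decision rule on the model's P(ad) output.
    def decide(p_ad: float) -> str:
        if p_ad >= 0.95:
            return "block"   # confident: block silently
        if p_ad >= 0.60:
            return "ask"     # uncertain: ask the user to confirm
        return "allow"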


There are already commercial products that use NNs and other ML techniques to classify and block malware, click fraud, etc.; Cisco has one. It's certainly a valid and viable approach, even in the face of determined opposition.


Implementation aside, I agree with your first sentence. It could also create a real incentive to try gaming NNs, which might help further their capabilities.


You should get in touch with gorhill to see if he would be interested in integrating this into uBlock...


Apart from efficiency, the most important difference between these two is that Adblock Plus has a default whitelist which allows certain types of ads to pass (e.g. by charging advertisers like Google and Taboola millions of dollars to unblock their ads). [1]

[1] http://www.businessinsider.com/google-microsoft-amazon-taboo...


While ABP's business model is ethically questionable, it is a win-win for all involved parties:

1) end users are spared the worst bunch of ads (layer ads, huge screen-filling ads in the left/right margins between content and screen, auto-playing video ads, popup/popunder ads, ads that mess with your browsing history, pre-roll ads on video sites)

2) site owners don't have their revenue stream completely fucked with more and more people using ad blockers

3) advertisers, provided they play by the rules, still have a way to get their content out to users

4) ABP/Eyeo has financial resources for development, hosting, maintenance and is able to keep up in the whack-a-mole game with the nasty parts of the ad distribution networks


It sounds nice in theory, but ABP charges an outrageous 30% and whitelists anyone who pays, regardless of the quality or performance of their ads.

You can look at their whitelist and find all sorts of intrusive and shady ad networks in there. This kind of funding model is bound to incentivize ABP into taking the wrong actions and working with bad actors.


OTOH your data is still being sent to the USA for snooping and spying.


And conversely, uBlock Origin doesn't block ads on Twitter and Tumblr, which makes it much less useful to me than Adblock Plus.


I don't see any ads on these sites using uBlock Origin. Maybe it's a problem on your end?

Time to update your filter lists.


Update your lists.

Perhaps the extension's auto-update depends on your browser's update settings.


Here is the developer of uBlock Origin, telling me how to block Twitter specific requests: https://news.ycombinator.com/item?id=11141600


uBlock Origin blocks nothing but what YOU tell it to. The problem isn't uBlock.


Is there a difference between the uBlock mentioned here and uBlock Origin, which is notably separate?


Yes. This is uBlock Origin.

The other one is simply uBlock:

https://github.com/chrisaljoudi/ublock


Ah, the article itself didn't say Origin anywhere. Thank you.


I believe the uBlock mentioned here is uBlock Origin.


There was some kind of schism between the original creator and someone who was intended to take over the project, and the result is two extensions, with the original changing its name.


To this day, I have to be honest: I have to google the difference, and I still constantly question which is the real one people recommend... sigh.


uBlock Origin is also available on Firefox for Android. It's by far my favourite mobile browsing combo.


Agreed, Firefox for Android isn't quite as slick as Chrome (yet), but not having to deal with obnoxious full-page ads with tiny tap targets is a massive win.


I use the AdGuard VPN to add an extra layer of ad blocking on my phone. I also block ads at the router at home.


This is from a year ago. It's most likely out of date.


Yes. uBlock Origin wins by an even bigger margin now, because it's being optimized further while the folks at Adblock are busy counting money.


Funny that they are using HN as the exemplary web page for their benchmarks.

HN does not require JavaScript.

Even without using an ad blocker, if the user disables JavaScript, a large percentage of "ad tech" will not work. And ads will not be served.

Then the game becomes how to get the user to turn on JavaScript for some other functionality.


Mostly, the 'game' is the fact that the site just doesn't work without JavaScript.

HN is a very nice exception, keeping things simple.


So, what is the cause of this improved efficiency?


I use the original AdBlock for Chrome (not ABP). One of its features that I find indispensable is the ability to sync custom filters, filter lists and settings via Dropbox as I use Chrome on at least 3 different machines. I'd be willing to switch to uBlock Origin if it had a sync feature.



Efficiency as in resource usage, not in blocked items.


Also, not in terms of speed; ad blockers tend to slow down web page rendering, especially on pages with a lot of ads.


* uBlock Origin


A somewhat more recent comparison:

"10 Ad Blocking Extensions Tested for Performance" https://news.ycombinator.com/item?id=10127971


Why is blocking with a browser extension common but blocking with a proxy not? I use a proxy[], which means I get ad blocking in my mail client, RSS reader, etc. Is a proxy too much work?

[] I use GlimmerBlocker. It's OK; the biggest "problem" is that I had to do a lot of tuning -- I think it's just one developer. And I had to write some code to get it to support HTTPS traffic (basically, my proxy on my own machine had to perform a MITM "attack" on my behalf).
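
For anyone curious what that MITM-based blocking looks like, here is a minimal sketch as a mitmproxy addon script (the patterns are placeholders, and it assumes a recent mitmproxy with its CA certificate trusted by your clients):

    # block.py -- run with: mitmproxy -s block.py
    # Minimal sketch of MITM ad blocking; BLOCK_PATTERNS is a
    # placeholder, real filter lists are far larger.
    from mitmproxy import http

    BLOCK_PATTERNS = ("doubleclick.net", "/ads/", "adserver")

    def request(flow: http.HTTPFlow) -> None:
        if any(p in flow.request.pretty_url for p in BLOCK_PATTERNS):
            # Answer locally; the ad server is never contacted.
            flow.response = http.Response.make(403, b"blocked")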


Because there is more friction in setting up an ad-blocking proxy. Secondly, your usage patterns differ from the norm, where the majority are almost completely browser-dependent.


It would be interesting to see if ABP has improved since this benchmark was done.

They seemed to be making honest strides to improve it.


uBlock breaks YouTube for me.


How so? I haven't noticed a single issue (Arch Linux, Chromium).


With uBlock Origin, I can watch YouTube without problems.


I've started having this problem as well. YouTube videos with ads return an error for about a minute, then start. I've started just disabling it on YouTube.


What are the findings?


uBlock Origin might be faster, but after I installed it, several websites lost their CSS formatting... ABP just works well enough.


I've used uBlock Origin for over a year and never had a single CSS file get blocked. Any specific sites you'd care to mention?


CSS files often come from different hosts than the original site, usually CDNs or other static content hosts. Are you sure you enabled access to those hosts?

What you describe is exactly what I'd expect to see, and what I've seen with other filtering extensions, if the static hosts weren't enabled.


http://www.alternate-dns.com/ can filter ads at the DNS level.


Their FAQ fails to answer the basic questions: who runs it, how is it funded, and why should it be trusted?

Your DNS server can log the domains you request and can serve malicious replies (particularly bad for HTTP and other non-authenticated protocols). Trusting some random server is not the smartest decision.


I would go with a hosts-file solution, as it is something you can control yourself in case you need to.
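
For illustration, hosts-file blocking is just entries like these in /etc/hosts (the domains below are placeholders):

    # /etc/hosts -- resolve ad hosts to a non-routable address
    0.0.0.0 ads.example.com
    0.0.0.0 tracker.example.net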


It solves a lot of efficiency problems, but is hell for debugging a broken webpage. For that reason I feel the browser extension justifies its footprint.




