One of the more insidious elements of ReCAPTCHA is its propensity to challenge users who have robust cookie blocking in place. So as we encourage people to be more privacy-aware, the web gets harder and harder to use.
We've seen ReCAPTCHA pop up all over ecommerce, all over benign websites with little to no need to challenge users, almost entirely because of the increase in privacy-aware users.
ReCAPTCHA essentially flies in the face of the blocking features recently rolling into Safari and Firefox, and of a privacy-aware user base that is growing by the day.
In many ways it's a genius structure from Google.
1. Convince people to use your anti-bot challenge.
2. Serve it when you don't see Google tracking cookies.
3. Offer a way around that with the least privacy-respecting browser available (Chrome, whose use is growing steadily month over month).

So good on Cloudflare.
If you blocked cookies or were otherwise problematic, it would sometimes lock you out of all ReCAPTCHA-gated resources not by giving you a message describing what was happening, why, and how to fix it, but rather by simply pretending that your every attempt to solve the captcha failed. Obviously this is extremely frustrating, by design, but it gets even more so with compounding factors like "the library is closed at this hour, so I can't get a fresh connection."
The worst I've seen has been when it happens to people who aren't well equipped to guess what's happening. When my friend's younger brother got hellbanned from his PlayStation account, he spent 30 minutes trying to identify traffic lights (or whatever) and then retreated crying to his room, because he wasn't able to deduce that Google was gaslighting him. He trusted Google. They had him convinced that he was such a failure he couldn't even identify traffic lights correctly, and he was -- quite reasonably -- inconsolable for a while.
Yeah, I should break down my methodology for arriving at the "hellban" conclusion.
If I get a bunch of failures in a row, I'll first try the refresh button built into the captcha, and then re-solve a number of times. Then I'll try re-loading the page and re-solving, then I'll try in a different browser with cleared state and re-solving, then I'll try a different device and re-solving, and finally I'll try a different connection, device, and cleared browser state and re-solving.
I'll consider something a hellban if I get persistent failures across several different challenge types but switching to a clean connection+device+state results in immediate success with the captcha.
Look, I get it, they can't be too explicit with the errors or they tip their hand to the botters and effectively give them a "to-do" list. Still, the gaslighting is persistent enough that there's just no way it's marginally beneficial all the way through. At some point, everyone figures it out: bots, techies, and normies. My guess is that they figure it out in this order, from quickest to slowest: smart bots, techies, normies, dumb bots. I'm not calling normies dumb here, they just don't have much background knowledge about the inner workings of captchas, so it takes longer. By that point, they're so far past the typical number of captcha attempts that only the very dumbest of bots, those without heuristics to detect this sort of thing, are going to be fooled along with them. Surely having the captcha tip its hand at this point -- which only gives an advantage to the dumbest of bots, because the smart bots figured it out long ago -- is the right thing to do.
ReCAPTCHA has no mercy on the normies, and I really think they could do a lot better.
One thing I've found (after others mentioned it here) is that Google seems to reward impatience when trying to solve captchas. Going faster, making more mistakes, and not waiting for loading images seems to help convince the algorithm that you are human. This is rough on anyone who thinks they are being rejected for not being accurate enough.
OTOH, it is hard to figure out for sure what makes a difference. I use a proxy/VPN with a fixed IP address that only I use and Google eventually seems to have figured it out; I used to get the hard or impossible ones on Google Scholar at times but now never do. So possibly in my case they decided to stop giving them to me around when I changed strategies, but I suggest giving it a try at least.
I usually intentionally get a few wrong to poison their learning data set. It doesn’t seem to impact the number of things I have to click on to get through.
I’m not sure what they’re measuring, but I doubt it has much to do with image recognition performance.
I just click stuff randomly and then hammer the submit button until the new images load. That seems to work even though I rarely tick the correct squares.
My new strategy is to just file support requests to any company using them, complaining that I did their test correctly but it still rejected me. My idea is quite simply to make reCaptcha unfeasibly expensive to use.
Why does the Deezer app installed on my desktop PC need a daily captcha?
That said, I use it myself on all of my companies' customer support forums to discourage people from sending me those pesky requests. In that sense, it's the new "please hold the line".
In any case, I'm glad that Google's motto is "don't be evil". That reassures me that using reCaptcha is morally acceptable ;)
Regularity of clicking is considered a sign of robot behavior, which is especially frustrating if you learned to perform repetitive image identification mouse tasks on a computer with rhythmic regularity (think Mechanical Turk, for example).
I know back in the day for RuneScape bots using SCAR there were macros to move the mouse from one position to another on the screen with randomized acceleration, randomized curvature, overshoot, clicking in some bounding box, etc. all using a normal distribution in an effort to thwart detection. Imagine being the poor developer tasked with trying to recover some signal out of that.
Just alt-tabbed in from writing runescape bots to HN and wanted to say that's still the case (for the client I use). The code is pretty complicated now but still functions much like you say. Mouse position is tracked, then any input which repositions the mouse accelerates at a "reasonable rate" and with a randomized curvature to get "close enough" before it self-corrects and gets a pixel-perfect click.
A few years ago the client stopped sending mouse data back to Jagex altogether. Luckily, I don't think there are many poor developers tasked with trying to recover any signal from that anymore. :)
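For the curious, a minimal sketch of the kind of "humanized" path generation described above (a curve with a randomized control point, Gaussian jitter, an overshoot, then a pixel-perfect correction) might look like the following. This is just an illustration of the general idea, not any particular client's real code; the move_mouse callback is a placeholder for whatever input API is actually used.

```python
import random
import time

def human_path(start, end, steps=60):
    """Generate a mouse path from start to end along a quadratic Bezier
    curve with a randomized control point, Gaussian jitter, and a small
    overshoot before settling on the exact target."""
    (x0, y0), (x1, y1) = start, end
    # Random control point bows the path into a curve.
    cx = (x0 + x1) / 2 + random.gauss(0, abs(x1 - x0) * 0.2 + 1)
    cy = (y0 + y1) / 2 + random.gauss(0, abs(y1 - y0) * 0.2 + 1)
    # Overshoot the target slightly, as a human flicking the mouse would.
    ox = x1 + random.gauss(0, 6)
    oy = y1 + random.gauss(0, 6)
    points = []
    for i in range(steps + 1):
        t = i / steps
        t = t * t * (3 - 2 * t)  # ease-in/ease-out so speed ramps up and back down
        bx = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * ox
        by = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * oy
        points.append((bx + random.gauss(0, 1), by + random.gauss(0, 1)))
    points.append((x1, y1))  # final pixel-perfect correction
    return points

def replay(points, move_mouse):
    """Feed the path to a caller-supplied move_mouse(x, y) function with
    slightly irregular timing between samples."""
    for x, y in points:
        move_mouse(x, y)
        time.sleep(max(0.0, random.gauss(0.01, 0.003)))
```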
Captchas are fundamentally anti-human. I'm not saying there isn't a problem to be solved, I'm saying Captchas are a behavior enforcement mechanism overseen by robots and are anti-human.
I write the site owner a short note when a captcha goes bad, explaining why they just lost a customer, and go somewhere else. Life is too short to put up with shitty tech.
What, in your opinion, is the pro-human way to address the problem to be solved?
I'm always curious to hear what other approaches might be worth considering. CAPTCHAs tend to tick the boxes of performing well enough for website-controllers and being low-effort for them to deploy.
There's a lot of ground between "error messages precise enough to effectively give botters a to-do list" and "faking failures 100 times in a row." What was the marginal utility of the 99th fakeout? Are there really enough otherwise effective bots that get persistently tripped up by this particular fakeout to justify sending the poor kid crying to his room?
Almost certainly not. What really happened is that someone removed (or never added) user communication in order to maximize their score against botters and gave little thought to mitigating their false positives. Minimizing them, yes, mitigating them, no. "Humans are smart, they'll figure it out," they rationalized to themselves, and called it a day. They never bothered to calculate (or even guess) when the marginal utility of the fakeout dropped far enough to allow them to have mercy on the poor humans still caught in their web.
I have no suggestions for the general case, and suspect it is one of those problems that doesn't have a general-purpose solution. That doesn't mean captchas don't suck.
As for specific things one can do, like anything, more effort means better results. I'm not going to talk about this much, but we do look at a lot of different behavioral and other signals for fraud detection, as that's an important aspect of our business.
If others are fine with annoying their customers to offload risk, they can make that call. I don't have much sympathy about lost sales, though - it is literally choosing to waste customers' time and increase frustration for one's own benefit.
A lot of CAPTCHAs protect things that are very cheap, but where they don't want it to be free. One solution would be to charge money, but people concerned about privacy won't want to give away conventional payment information.
So, perhaps a nominal payment in some reasonably anonymous cryptocurrency? Or even just participating in some proof-of-work problem that would cost a few cents worth of electricity?
That wouldn't stop really serious botnets or people with stolen credit cards, but those are also both illegal and should be shut down for other reasons.
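A hashcash-style proof of work along those lines is easy to sketch. This is only an illustration of the general idea, not something any CAPTCHA vendor actually ships: the server hands out a random challenge and a difficulty, and the client burns CPU until it finds a nonce whose hash has enough leading zero bits.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 20  # ~1M hashes on average; tune so solving costs a bit of CPU

def issue_challenge():
    return secrets.token_hex(16)

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: str) -> int:
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= DIFFICULTY_BITS:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS

# Client/server round trip:
challenge = issue_challenge()
nonce = solve(challenge)          # expensive for the client
assert verify(challenge, nonce)   # cheap for the server
```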
You've made an assertion, not an argument. What does "anti-human" even mean? You're angry, sure, but you haven't expressed what exactly it is that you're angry about. Nor have you proposed a realistic alternative way to distinguish bots from humans. This kind of histrionic, sweeping hot take is not productive.
Considering captchas operate by pushing the work of avoiding bots on your site (your problem) onto all the human users of your site, I think on the basis of that alone "anti-human" is warranted. Or "anti-social", if you prefer, which might better capture the fundamental problem with that aspect of it. That they proceed to perform textbook gaslighting on some of those people makes it even worse ("no, you didn't select all the buses in those images" but, of course, you did). Whether these things are necessary for it to operate is beside the point.
Are movie theaters anti-human because they push the work of avoiding freeloaders (their problem) onto all human users of the theater by making them carry and show tickets?
I must have been hell banned in the past. It used to take 30 mins to log into humble bundle because of the endless stoplights and sidewalks, I buy a lot fewer bundles now since I’m still a little bitter.
Now I just deliberately give bad answers and get to “pass” the challenges... not sure why
How, in your opinion, should Google have handled the matter in a way that does not give spammers or other abusive users ways to get around the measure? Bear in mind that any such approach has to be scalable to many zeros daily, the vast majority of which will not be empathically awful cases like your brother's very real pain and distress - most will be genuinely abusive behavior.
I want to be clear that I am not attempting to minimize your brother's pain or emotional suffering. I'm hoping that there might be an approach that's kinder and more compassionate to him while still accomplishing the same goals.
> the vast majority of which will not be empathically awful
Yeah, most of the time it's "just" really, really obnoxious, not to mention coercive in a way that aligns with Google's interests.
Thanks, Google.
> How, in your opinion, should Google have handled the matter in a way that does not give spammers or other abusive users ways to get around the measure?
"Our anti-spam systems believe that you might be a robot. Your profile has been locked for (x) minutes. Sorry for the inconvenience. Go _here_ to learn tips & tricks for avoiding lockouts in the future." X gets exponentially ramped.
Note how vague the message is. It sacrifices the opportunity to tarpit a really dumb robot in exchange for not being awful to humans.
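A sketch of that exponentially ramped lockout, with the message kept deliberately vague; the base, cap, and wording here are all made up for illustration:

```python
BASE_MINUTES = 2
MAX_MINUTES = 24 * 60  # cap the lockout at a day

def lockout_minutes(consecutive_flags: int) -> int:
    """Double the lockout each consecutive time the anti-spam system
    flags the account, up to a cap."""
    return min(BASE_MINUTES * 2 ** consecutive_flags, MAX_MINUTES)

def lockout_message(consecutive_flags: int) -> str:
    minutes = lockout_minutes(consecutive_flags)
    return (
        "Our anti-spam systems believe that you might be a robot. "
        f"Your profile has been locked for {minutes} minutes. "
        "Sorry for the inconvenience. See our help page for tips on "
        "avoiding lockouts in the future."
    )
```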
Based on ReCAPTCHA's design decisions, it's abundantly clear that eking out every sliver of a percent of marginal efficacy is the priority over treating users humanely. That's why I have a problem with ReCAPTCHA.
In my opinion and experience, ReCAPTCHA isn't really, really obnoxious most of the time. I suspect that most of the time it trips up bots who have no emotional experiences whatsoever. Most of my personal encounters with it involve solving no puzzles whatsoever. With that in mind, I expect humans and their completely real reactions might not be the default case. Of course, this is speculative, as I do not have any kind of special data on the subject.
Thank you for sharing! Have you considered the possibility that presenting any message at all - especially one with a clear block time - is sending a very clear message to bot controllers? I'm sure you've considered this, and I am just failing to understand. Wouldn't that remove any real gains from being vague with tips & tricks?
Wouldn't there also be the real chance that vague tips & tricks would leave an actual human being in tears, convinced that they're just too dumb to understand them properly?
> I suspect that most of the time it trips up bots who have no emotional experiences whatsoever.
I'll bite: maybe it's good at identifying obedient drones and letting them through :)
It trips up the normies in my life often enough that I suspect being technically inclined is actually a net advantage because it makes you quick to detect the problem and quick to apply workarounds. Those advantages are significant enough to outweigh even the cost of the semi-regular dance where I try to protect myself and Google jerks my chain.
> Have you considered
The fact that I phrased my proposal as a tradeoff should have strongly hinted that I did, in fact, consider.
> Wouldn't that remove any real gains from being vague with tips & tricks?
One bit of information -- locked vs not -- is hardly the same as disclosing the inner workings, or even the information inputs, of the classifier, and smart botters have access to that bit of information anyway because they've built a gaslight detector by leveraging their legions of diverse bots and endless supply of dirt cheap human labor.
Gaslighting humans is really bad. A minimal courtesy would only cost a sliver of efficacy, and ReCAPTCHA still rejects it. That decision earns it the bad will directed its way.
> In my opinion and experience, ReCAPTCHA isn't really, really obnoxious most of the time.
Do you use any sort of privacy protection while browsing? I do a few simple things like browse in private mode by default, and ReCAPTCHA just cannot deal with it. It instantly brands my connections as a bot. It is obnoxious. Using private mode shouldn't ban you from the web. There's no reason that most web sites need to save data on my computer to identify me later.
I don't think Chrome has ever been my daily driver.
That said, I also expect to be treated with more suspicion when I behave more like a bot. So I'm neither surprised nor bothered when Firefox Private gets me an uptick in ReCAPTCHAs. I understand that this is a highly unusual expectation.
You're forgetting the main benefit for google, which is getting humans to train all their vision models for free. At one point they were just forcing X% of clicks to fill out a captcha regardless of origin or identity just to get more data.
I for one am getting quite tired of trillion dollar corporations getting things for free out of me. Hard pass.
> You're forgetting the main benefit for google, which is getting humans to train all their vision models for free.
Is this still true? I keep seeing the same type of images for years and there might be 7 or 8 different categories but that's it. To me reCaptcha looks like a service well in its maintenance phase. If it was actually in use for training purposes you might expect images to match a wider range of tasks.
I haven't gotten one of those in years. These days it's just picking out buses, cars, traffic signals, and sometimes motorcycles. Maybe once in a while it'll ask for storefronts.
Most of mine lately have been traffic features also. This is a little tricky in some cases, e.g. with crossings, as it sometimes gives me things that I don't think are crossings but it insists I select, perhaps they are in the US, or the perspective is weird, or someone else has told it that a series of white squares is a crossing and it requires me to agree.
Except in this wonderful new world, you don't get the choice to "hard pass". As someone whose ISP has too few public IP addresses, I see Cloudflare's "one more step" pages at least several times a month. It's terrifying to realize just how much of the internet is behind that thing right now.
This really shows how popular perceptions of Google have changed for the worse over the years. I remember when RECAPTCHA was first launched, everyone knew right away that it was just helping Google train their vision models, but at the time we all thought it was cool, like "Wow, I'm helping the cause of AI research at the same time as stopping spam". But now it just pisses everyone off.
Hell, for a little while Google had a game (can't remember the name of it) which was labeling images with another person to get points and people loved it.
At least the original reCAPTCHA was used for OCR'ing public domain books. Even if it had the effect of training Google's OCR tech, it was at least making literature searchable and indexable for the public good. Modern reCAPTCHA is nothing more than training for Google Maps and, seemingly, self-driving cars, both of which are commercialized.
I really don't think the challenges we're given are still hard for computers. A lot of these are super simple. Google would've cracked many of the driving ones years ago.
If that was still the main benefit for them, they wouldn’t be planning to start charging for it, because that would—and, as this article shows, has—cut off much of that data flow, as reCAPTCHA clients abandon the service for another one that isn’t charging them.
Did you even RTFA and look at hCAPTCHA? hCAPTCHA couldn't be more grossly focused on neural-net training. Hell, one challenge asks you to draw a bounding box and another is classification tagging.
There was no argument being made for HCAPTCHA in the post to which you replied. So, yeah, everything you mentioned is indeed gross, including Google's behavior.
One of the non-obvious consequences is that any system designed to use technical measures to distinguish between humans and computers will wind up very sensitive. There's an arms race, and we real users are caught in the middle.
There's a vast army of computers doing their best to pretend to be human. The whole point of any kind of CAPTCHA is to try to catch them out - and every measure gets worse over time. So companies like Google look at everything they can see that helps them distinguish typical humans from robots.
This has a nasty side-effect. A lot of measures intended to preserve privacy have the incidental effect of making the privacy-sensitive user look more like a computer and less like a human. Not saving cookies and not executing JS are classic bot moves. This plays directly into the sensitivity that has been engineered over time in order to catch more computers posing as humans.
I don't know any easy resolution to this tension. Maybe you do? I really hope so. The internet is overrun with abusive behavior and the amount of work that goes into keeping it at bay is staggering.
> One of the more insidious elements of ReCAPTCHA is its propensity to challenge users who have robust cookie blocking in place.
It is understandable and I expect HCAPTCHA to do the same thing.
The goal of a CAPTCHA is to identify you as a human. I don't know how ReCAPTCHA works, but I expect it to be like spam filters: they have a sample of bots, a sample of humans and assign weights to every aspect, in the end, the algorithm spits out a probability of you being human, and it will challenge you until it reaches a set value.
The thing is: if you hide everything for privacy reasons, you are making yourself indistinguishable from anything else using HTTP, including bots. That's the point, but it also means the only way to prove you are human is through a challenge.
Think of it like a private club. If you're a regular, the bouncer is likely to recognize you and let you in without asking anything. But if you don't want to show your face, you will need to show your membership card every single time. That's the price of anonymity.
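Purely to illustrate that spam-filter intuition (none of these signals, weights, or thresholds are reCAPTCHA's actual inner workings, which are not public), the scoring side might look something like a weighted logistic model over whatever the browser exposes:

```python
import math

# Entirely made-up signals and weights, for illustration only.
WEIGHTS = {
    "has_long_lived_cookies": 1.5,
    "executes_javascript": 1.2,
    "mouse_movement_entropy": 0.8,
    "known_datacenter_ip": -2.5,
    "headless_user_agent": -3.0,
}
BIAS = -1.0
CHALLENGE_THRESHOLD = 0.7  # below this probability, show a challenge

def human_probability(signals: dict) -> float:
    score = BIAS + sum(WEIGHTS[name] * value for name, value in signals.items())
    return 1 / (1 + math.exp(-score))  # squash into a 0..1 probability

def needs_challenge(signals: dict) -> bool:
    return human_probability(signals) < CHALLENGE_THRESHOLD

visitor = {
    "has_long_lived_cookies": 0.0,   # privacy tools strip these
    "executes_javascript": 1.0,
    "mouse_movement_entropy": 0.6,
    "known_datacenter_ip": 0.0,
    "headless_user_agent": 0.0,
}
print(needs_challenge(visitor))  # True: without long-lived cookies the score dips below the threshold
```

Hide the cookies and the JS signals and the visitor becomes indistinguishable from a bot, which is exactly the tension described above.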
> One of the more insidious elements of ReCAPTCHA is its propensity to challenge users who have robust cookie blocking in place... ...So good on Cloudflare.
Just to be clear: Cloudflare is only changing the _provider_ of CAPTCHA's. They are not changing the _criteria_ for showing CAPTCHA's.
So users who have robust cookie blocking in place will continue to be penalized.
I would love to see the raw data on how many transactions have been abandoned because of ReCaptcha; if I had to solve a test to purchase my shopping, I'd go elsewhere (and there are places that are not as hostile out there).
I cannot understand the stupidity of putting your entire business in the hands of an advertisement company who gives no shits about you as a business or a person, apart from your data.
I can say for certain ReCaptcha has made me reconsider a purchase and is a major factor in my purchasing decision. If I can't use all my privacy tools (including noscript, and I only whitelist a few times to get the right scripts), then I don't care about what you're selling.
Hopefully in the near future ReCaptcha breaks altogether due to enhanced privacy protection.
> "Earlier this year, Google informed us that they were going to begin charging for reCAPTCHA. That is entirely within their right. Cloudflare, given our volume, no doubt imposed significant costs on the reCAPTCHA service, even for Google."
Even in the article they say... "Google provided reCAPTCHA for free in exchange for data from the service being used to train its visual identification systems." ... I thought this was one of those win/win things... Google gets something, websites get something... what's changed? Is Google not getting much out of reCAPTCHA now?
> Again, this is entirely rational for Google. If the value of the image classification training did not exceed those costs, it makes perfect sense for Google to ask for payment for the service they provide.
This might be exacerbated in the case of Cloudflare. Imagine a system where 99% of the visitors being challenged are human. The data gathered from such visitors is quite high-quality. That fits the use case of validating an anonymous poster on some random blog. Now consider the Cloudflare use case. Visitors will only be challenged when Cloudflare already expects you're a bot. Most of the challenges are served to bots. The data is much lower quality, but their cost per challenge has remained the same.
It could just be that as this type of use case became dominant, the balance of value tipped.
I guess this is very true. Our quite elaborate Cloudflare Firewall setup combining bot management scores with GeoIP and network information to decide on the action has solve rates below 0.5% on most rules.
The only case where we see up to 3% solved is on rules targeting networks which contain mostly free (as in beer) VPN providers (the new pest of the internet). Those networks send a lot of malicious and automated traffic, with the 3% of real users mixed in.
To put this into numbers of the past 24h:
~ 76 Million requests served
~ 1 Million of those were captchas
~ 0.5 Million were outright blocked
Captchas solved: 1233
Seeing that reCAPTCHA v3 doesn't use endless streams of images any more, I would guess that Google no longer benefits much from having users tag storefronts, traffic lights, buses or fire hydrants. Maybe their image recognition algorithm is past that stage.
It does as a fallback. But you’re missing the main point of v3, which is that it shifts the legal onus of blocking from Google to the integrating site. No longer can Google be sued for accessibility violations, if it’s the site that’s stopping the user from entering purely on a suggestion from Google.
> Seeing that reCAPTCHA v3 doesn't use endless streams of images any more
On the other hand, I've been effectively banned from several sites because I don't accept third-party requests to Google from non-Google sites as a result of this change.
Pure speculation, but at some point your dataset is large enough.
The original reCAPTCHA corrected errors in scanned books published decades/centuries ago. At some point, they're all fixed.
Similarly, more recent images have all been of traffic images. And they probably have way more than enough now -- at least of the type that can be done by reCAPTCHA.
So unless Google comes up with a new mass-categorization problem easy enough for literally everyone to do and simple and small enough to fit in a reCAPTCHA... then they charge.
I think using captchas for image recognition was one of the most ingenious strategies of the modern web. Don't think Google is making the correct move here.
Overall I would like to see these checks removed and Cloudflare is using them quite excessively.
> Google provided reCAPTCHA for free in exchange for data from the service being used to train its visual identification systems.
Has this been true lately? Every time I see it, it gives me the same images from a set of 3. 90% of the time it's classifying street lights, and it's the same street lights every time. About 7% of the time, it's pictures with cars in them, and again, it's the same pictures most times (but in a different order, I think). The remaining times it's fire hydrants or store fronts, often in a language I can't read, so I don't know if it's a store or not. (And again - mostly the same images each time.)
It's probably a question of size. Same as with Google Analytics. Google can afford to offer it free of charge for smaller websites but charges for larger ones. Cloudflare was probably one of the heaviest users with a very high percentage of bots (as they're good in pre-filtering).
my bet is that the bean counters have caught up with this product, and it'll be run into the ground with excessive pricing, because Google products have to make millions or otherwise they'll be killed. most notably, Reader.
These complaints about Google "moving too fast" used to really confuse me. I couldn't really spot a meaningful difference in mean survival between Google products, start-ups similar to individual Google products, and other businesses' behaviour.
But I've now attained zen-like clarity on the issue: the complaints are coming only, and always were coming mostly, from people whose idea of appropriate change over time is to still complain about Google Reader almost a decade after it happened.
this is being intentionally obtuse. Google Hire's shutdown got 200 points 7 months ago: https://news.ycombinator.com/item?id=20815293, also see: Hangouts, Google+, Nest, Code Search, Site Search.....
it's not about "moving fast" at all. it's about google killing anything that doesn't make millions as opposed to just thousands (enough for basic maintenance). I never said anything about timeframe.
Well. That's probably fantastic news; using ReCAPTCHA (and thereby making users subject to Google's tender mercies) was honestly my main reason to dislike cloudflare from a user's perspective. ReCAPTCHA is utterly foul; it follows you everywhere it can, exists to undermine privacy, punishes non-Chrome users, and throws you in an infinite loop when it decides that you're not a human.
Specifically designed to allow you to authenticate once and then use that as a proof of work across multiple sites, without revealing your identity as being connected across those sites. Here's the math: https://blog.cloudflare.com/privacy-pass-the-math/
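Privacy Pass itself is built on a verifiable oblivious PRF over elliptic curves (see the linked post for the real math), but the blinding idea can be illustrated with a toy RSA blind signature: the issuer signs a token without ever seeing it, and later anyone can verify it without being able to link it back to the issuance. This is only a sketch of the concept, with textbook toy key sizes.

```python
# Toy RSA blind-signature sketch of the "sign without seeing, verify without
# linking" idea. Real Privacy Pass uses a VOPRF over elliptic curves, not RSA,
# and real keys are not 12 bits long.
import math
import secrets

n, e, d = 3233, 17, 2753   # textbook toy RSA key (p=61, q=53)

def blind(token: int):
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    return (token * pow(r, e, n)) % n, r      # blinded token + unblinding factor

def issuer_sign(blinded: int) -> int:
    return pow(blinded, d, n)                  # issuer never sees `token`

def unblind(blind_sig: int, r: int) -> int:
    return (blind_sig * pow(r, -1, n)) % n     # = token**d mod n

def verify(token: int, sig: int) -> bool:
    return pow(sig, e, n) == token % n

token = 42                                     # in reality: a random nonce
blinded, r = blind(token)
sig = unblind(issuer_sign(blinded), r)
assert verify(token, sig)                      # redeemable later, unlinkably
```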
I'm not so hot on this stuff so this might be answered or clear on the site and I didn't get it - can the other sites tell who authenticated you? Can the authenticator add metadata?
I'm wondering about other use cases, using it to prove you've paid for something, or donated perhaps. Or passed a daily quiz/challenge. I feel like there's some fun ways of using this.
It's a start. reCAPTCHA is a notorious pain in the arse for anyone whose browser isn't Chrome and for anyone who doesn't keep cookies. I'm not sure if hCaptcha will be better, but it's hard to imagine it being any worse.
By now, I almost immediately close a page with a reCAPTCHA, because the stream of buses, traffic lights, and cycles never seems to end when you're using Firefox. And then it says "too many requests from this computer" and refuses to continue.
I'm amazed Mozilla hasn't sued Google for discriminating against their browser - I also use Firefox and suffer endlessly using privacy tools. I can prove there are no more busses and I'm 100% right, but I can predict 100% of the time it'll say "please try again".
The pattern seems to be 2/3 'right' guesses. On sites like eBay, the captcha is broken on Firefox. I complete it, and it says "you need to resubmit this form again", and reloads the entire page.
That's the cost of privacy; broken pages and refused access because Google says "NO!".
And businesses are okay with Google denying them money. I wonder, if they did a cost/benefit analysis, whether they'd find it worthwhile.
Thanks to Google, I've actually saved quite a bit of money: they lost out on hundreds recently when their automated systems decided to refuse my transaction. Their loss and my gain.
>I can prove there are no more busses and I'm 100% right, but I can predict 100% of the time it'll say "please try again".
I frequently run into the same issue of having correct answers rejected, and have read posts from many others who experience the same. At some point I started intentionally picking random squares for the first couple image sets. Interestingly, it doesn't seem to end up taking any more submissions overall than when I try to pick the right answers from the start.
Plus, polluting Google's free work data set ever so slightly gives me a small amount of pleasure.
I wonder why they don’t negotiate with Msft to use Bing or even DDG instead. Seems... incredibly odd... to put oneself in a position where a third party is directly antagonizing your users, reducing your user satisfaction and likely dramatically increasing churn, but you can’t do anything about it because that same party is your main source of funding.
(Disclaimer, I work at msft. Nowhere near this though).
The fact that they’re still on google even though google is screwing over their userbase? I don’t use Firefox because of how difficult it makes captcha. There are others like me.
If they are negotiating with other providers, they certainly aren’t doing a very good job of it.
If I recall correctly, the last time I checked Google's support for Mozilla was in the hundreds of millions per year. I would be shocked if DDG could afford even 10% of that, even as an investment they expected to recoup through additional advertising revenue.
> And businesses are okay with Google denying them money
Make sure they know. I write to sites and tell them they just lost a customer because Google doesn't give a shit. I've gotten replies from smaller outfits that had no idea what was going on.
I think their cost analysis would mark you as a bot that got stumped by the captcha, and thus a net benefit. (Sales to bots are worse than not selling, else they wouldn't implement this at all.)
I have managed to successfully solve the audio CAPTCHA before (even though the pictures are impossible to solve), although now they must have disabled it because it doesn't work.
On the other hand, their new HCAPTCHA is a notorious PITA for anyone, including those whose browser is Chrome and keep cookies.
Browsing an "I'm under attack"-mode website behind Cloudflare has been super annoying for me since last week. To the point that I usually close the page when I see a HCAPTCHA. Their visual challenge is harder to navigate than reCAPTCHA, and because this is their business model I suspect they have incentive to make it easier.
I don't. I assume a good CAPTCHA should reduce the subset of users who must solve hard challenges as much as possible. The old reCAPTCHA usually requires only one click, no visual challenge for me. I don't like the fact this is because I use Chrome and am logged in to my Google account, but with HCAPTCHA I'm just not seeing any chance they make the experience better. No matter what browser I use, they just don't want to make my experience better; their business model is based on me suffering.
1. I do have an ad blocker installed, but it's not very aggressive.
2. All scripts are enabled. I already have trouble with some sites due to my fairly lax ad blocker.
3. I do not use a VPN (since it just transfers who is able to see my traffic from one party to another). Additionally, virtually every service provider penalizes VPN IPs to the point where it's probably not worth the hassle.
4. Not sure what you mean by "stores tracking Cookies".
---
> If all of those do not apply to you, I would feel discriminated against by Google
I do not agree with that (mostly because of point 3). The reality is that VPN traffic is significantly more "spammy"/bot-filled than non-VPN traffic. It's a perfectly rational and justifiable way to protect sites (albeit ReCAPTCHA is of dubious effectiveness).
I am not arguing against protecting one's website from bots, nor am I saying that VPN traffic is not spammy in practice. Up until that point I am with you. However, making use of ReCaptcha is certainly not an ethical, and therefore not a justifiable, way of doing it.
Doing all of the stated things these days has become a minimum for protecting your privacy online. The current situation is quite bad for privacy-conscious people. Even if we only trust first-party scripts and do not allow them to be loaded from a subdomain that actually hosts all the third-party scripts again, we still face issues, for example fingerprinting.
I can only laud websites that can be used completely without third-party scripts, or perhaps even without scripts at all, making sure it all works with REST and offering alternatives when scripts are blocked.
It's good to see some "competition" in this area, even if I do not trust Cloudflare either. More competition means less Google monopoly. Hopefully in the long run it will lead to better solutions for casual users.
I'm quite sure you're correct. When stacked against however much Google was going to charge (I assume more than zero), Cloudflare's incentives seem pretty clear to me.
Yeah....In the article, it says that Google wanted to charge millions of dollars and that Cloudflare did not think hCaptcha trying to pay them was sustainable, so they agreed to pay hCaptcha a large amount, but only a fraction of what Google would charge.
reCaptcha is wildly sophisticated under the hood[1]. I use it on all three major browsers and find the number of challenges varies from 0 to 4: sometimes it says I'm verified without doing anything, other times I need to go through 4 screens.
I would love to see someone put some numbers behind this claim, because I think it is false.
EDIT: Are you downvoting because you don't like reCaptcha, or because you can't (or won't) set up an experiment to demonstrate this claim and prefer to just jump on the bandwagon?
I've experienced reCaptcha simply looping forever. After solving 5 or so screens, I give up and hope that reloading the page works. If not I usually switch to Chromium, which doesn't even get a single puzzle, just verified.
My heavily adblocked FF has a lot of trouble with recaptcha, while the Chrome instance that I only use for logged-in Google and LinkedIn doesn't. It seems like there are enough moving parts that it would be hard to figure out why our anecdotes are so different.
> Earlier this year, Google informed us that they were going to begin charging for reCAPTCHA
So it came down to cost.
> Over the years, the privacy and blocking concerns were enough to cause us to think about switching from reCAPTCHA. But, like most technology companies, it was difficult to prioritize removing something that was largely working instead of brand new features and functionality for our customers.
I like that they're upfront about this. In most companies / teams of this size, these issues are always swept under the carpet until something ugly forces you to clean up at a later point in time. It's just unavoidable.
Hey everyone. HCaptcha founder here. We are so happy to be on hackernews. I'm curious if anyone is having any problems? We are trying hard to respond carefully to customer requests but as you can guess we are very busy. Also we are hiring :)
Hey Alex, one suggestion: the HCaptcha challenge box is way too tall. Sitting at 725px, it's larger than the Chrome viewport on a 13" MBP, so I have to keep scrolling up and down to solve the captcha.
You would not believe how much we think about these things. We appreciate the feedback and will continue to tune for every puzzle. Thank you so much for the feedback.
Can you get 4chan to start using hCaptcha? I can't tell you how much I hate Google reCAPTCHA. Not to mention the brainiacs at 4chan have figured out how to solve Google's reCAPTCHA easily.
A few days ago I encountered this when Cloudflare decided my IP address (which is behind an ISP-level NAT) was suspicious all of a sudden (which it hadn’t been doing, a pleasant change from when I was at this location three years ago when half the internet sprouted Cloudflare CAPTCHAs at me). It was awful to solve, worse than the substantial majority of reCAPTCHA checks I’ve encountered. Certainly nothing like the illustrations in the article.
I had the same experience. But this may just be an artefact of humanity now having been trained exceptionally well to identify traffic lights and busses, but being relative novices at identifying elephants.
And now I'm wondering if this may not be a spectacularly useful tool to raise standards of education world-wide. Imagine, say, the French government buying them and asking every person on the internet twice a day to match some vocabulary to images: Identify "le baguette"! Lingua Franca, le sequel.
Or a maps puzzle: "Please identify Equatorial Guinea, Papua New Guinea, and Guinea-Bissau".
I tried a hcaptcha and it was way harder to solve than the usual recaptcha. However, it was significantly easier than the recaptchas you get when using Tor.
I just tried it on a website that uses Cloudflare and that always asks me to solve a captcha. (I guess this website does this if the user has a foreign IP address.) In the past I managed to get the non-script Recaptcha. But I don't see a non-script Hcaptcha. I'm a little afraid of possible browser fingerprinting scripts. If there was an unwaivable, enforced right to privacy I wouldn't be afraid.
Also, I don't want to solve any script captchas anymore because of a traumatic experience with script Recaptcha. I had a portable Chromium with login cookies for a few websites. I didn't use that Chromium for other websites than these few. Suddenly, one service almost always demanded a new login after just 1 day. On each login I had to solve a script Recaptcha. I didn't find a way to get non-script Recaptcha. According to the service evil spambots had attacked it. Once, Recaptcha let me solve captchas for minutes, just to eventually tell me I was a bot. I had an IP of a large internet provider. I deleted cookies, got a VPN IP, tried it again, worked on the captchas in the exact same way as before and managed to log in to my account. A website operator wrote in a forum thread that Recaptcha was the only solution to the bot problem. One user suggested "email login as an optional alternative". This was not implemented, because apparently Recaptcha was really specifically the only solution. I then switched to another service, which cost me a few hours of work. This traumatic experience has made me completely unwilling to solve any script captcha.
A little off-topic, but the article mentions they support Privacy Pass. I remember seeing the announcement a little ways back when they first released it but just kind of forgot about it. Is anyone using the browser extensions? Has it reduced the amount of captchas you end up seeing, or made your browsing experience better in any way?
According to the article Cloudflare is paying, but is paying "a fraction of what reCAPTCHA would have [cost]". Recaptcha is $1/1000 challenges, so apparently hcaptcha is some small fraction of that.
Cloudflare might get a discount for running some of the infrastructure on their own servers; on the other hand that might also be an integration hassle that actually costs them money.
This seems unwise, because many captcha farms charge less than this. A quick Google search shows one service offering $0.50/1000 challenges. If it's 2x cheaper for an attacker to solve a captcha than it is for a provider to display it, it sounds like the attackers win.
True, the attacker is much less likely to have anywhere near the funds of the target, and they don't want to hurt them.
Regardless of the actual price multiple, having it cost anywhere near as much to serve a captcha as it costs an attacker to solve one just seems to defeat the point. Really, it costing any money per captcha served just punishes sites that happen to face a higher volume of bots, even if they're a small site. It's just going to push the company to switch to a different captcha service, which may be even cheaper for attackers to solve.
I think you're confusing intent with implementation.
You're right that the implementation excludes non-malicious bots and fails to solve for malicious humans, but that just makes it an imperfect implementation of the intent: which is to differentiate malicious & good.
"We evaluated a number of CAPTCHA vendors as well as building a system ourselves."
and
"We worked with hCAPTCHA in two ways. First, we are in the process of leveraging our Workers platform to bear much of the technical load of the CAPTCHAs and, in doing so, reduce their costs. And, second, we proposed that rather than them paying us we pay them. This ensured they had the resources to scale their service to meet our needs. While that has imposed some additional costs, those costs were a fraction of what reCAPTCHA would have. And, in exchange, we have a much more flexible CAPTCHA platform and a much more responsive team."
So Cloudflare are basically cloud hosting hCAPTCHA's services. I wonder why Cloudflare didn't just buy them, as it seems like it would be a win-win with getting an excellent CAPTCHA service, and not have to build it themselves?
CF likes the CAPTCHA part of CAPTCHAS, but any vendor is probably far more invested in the "generating ML training data" scheme.
CF probably has zero interest in that part of the product: It doesn't fit with their existing products nor customers, and it's just too small relative to their other business to devote much attention to it.
At the same time, the business opportunity is probably too large for hCAPTCHA's founders to just forget about it, or for CF to compensate them on the hot-new-technology assumption when they're only looking for peace-of-mind-utility tech.
IMHO, CAPTCHA is a lazy way to protect your service, as you shift the burden to your users.
Maybe if you are big and essential for some users, you can afford that. But if not, be aware that users will turn their back on you if you add obstacles between them and your service.
Edit: meant to say “be aware that some users will turn their back to you”
I run a small forum, and it was getting flooded with fake and spam accounts, the moderators were struggling to keep up and the users were finding it annoying. So I put a captcha on the registration page. The problem went to zero, new users still showed up, and more people were happier than before.
> But if not, be aware that users will turn their back on you if you add obstacles between them and your service.
You have to balance that against how many users you'd lose if the site was down/vandalized/compromised by an attacker if the captcha protection wasn't there to keep it out.
It's often worthwhile moving the captcha away from the initial login or signup form and only putting it on the second or third attempt to login, or on features that put significant load on the server.
> It's often worthwhile moving the captcha away from the initial login or signup form and only putting it on the second or third attempt to login
Though if your service is a lucrative target for {uname,pass} combolist spam, you'll see that each attempt comes from its own IP address and only makes that one request. It's pretty sobering.
Like most kinds of gated security, many solutions are borne out of inspecting the payload instead of who's sending it.
Captchas prevent bots from submitting spam, but they don't prevent humans from submitting spam. In 99% of cases, your problem is the spam, not who is submitting it. The non-lazy solution is to look at the content itself and directly determine whether it's spam, instead of relying on a related heuristic (e.g. who submitted it) to make an informed guess.
This isn't an alternative solution, just one you could do alongside making it difficult for drive-by bot spam.
For example, let's look at an actual service for identifying spam payloads: Akismet. It still lets a lot of spam through, especially in non-English languages.
This is a solution that, if done "perfectly", should be able to catch 100% of spam submissions. This is in contrast to things like captchas (because they "test" something other than the end-goal [no spam] to guess at whether something is spam or not, while ignoring spam from humans [or humans filling out captchas on behalf of bots], and cause problems for both humans and benign bots).
Obviously, it's an extremely hard problem that is hard to do 100% correctly. But it's a viable non-lazy solution (that still needs a lot more work than the current state-of-the-art implementations) compared to the lazy solution of just putting captchas on the page.
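As a toy illustration of content-based filtering (using scikit-learn's stock naive Bayes pipeline, with made-up training data; a real filter like Akismet is vastly more involved and continuously retrained):

```python
# Toy content-based spam filter: classify the payload itself rather than
# guessing at who submitted it. Training data here is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "Buy cheap watches now, limited offer",
    "Congratulations, you won a free iPhone, click here",
    "Increase your traffic with our SEO service",
    "Thanks for the write-up, the captcha discussion was helpful",
    "Could you clarify how the registration flow handles failures?",
    "I ran into the same bug on Firefox, here is a workaround",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["Free SEO offer, click here now"]))       # most likely 'spam'
print(model.predict(["The captcha loops forever on Firefox"]))  # most likely 'ham'
```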
The ideal solution would get rid of spam without inconveniencing users who aren't submitting spam, I'd think, which means captchas aren't it.
Honest question - if you set it up so the user gets an email with a link they have to click before their message is actually sent to your queue, would that help?
I'm thinking it would probably reduce the number of users who successfully contacted you legitimately, but CAPTCHAs also do that. Do spammers actually have the email accounts they claim to and respond to confirmation emails?
It would definitely help, I highly doubt spammers would use that sort of mechanism.
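A minimal sketch of that confirm-before-delivery flow, with in-memory storage and stubbed-out send_email/deliver_to_queue helpers (both hypothetical; a real implementation would persist pending messages and send real mail):

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)
pending = {}  # token -> (sender_email, message, created_at)

def send_email(to: str, body: str) -> None:            # stub for illustration
    print(f"email to {to}: {body}")

def deliver_to_queue(sender: str, message: str) -> None:  # stub for illustration
    print(f"delivering message from {sender}")

def sign(token: str) -> str:
    return hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()

def submit(sender_email: str, message: str) -> None:
    token = secrets.token_urlsafe(16)
    pending[token] = (sender_email, message, time.time())
    link = f"https://example.com/confirm?token={token}&sig={sign(token)}"
    send_email(sender_email, f"Click to confirm your message: {link}")

def confirm(token: str, sig: str) -> bool:
    if not hmac.compare_digest(sig, sign(token)):
        return False
    entry = pending.pop(token, None)
    if entry is None or time.time() - entry[2] > 86400:  # 24h expiry
        return False
    deliver_to_queue(entry[0], entry[1])  # only now does the message reach a human
    return True
```

Spam that never confirms simply expires out of the pending store without anyone having to look at it.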
The solution gets around potential vendor lock-in and privacy issues with a service like Google's, but it still fundamentally shifts the problem from the service to the user (the original commentor's gripe).
But this is exactly the point I am trying to make. That's the service provider's problem and not the user's. CAPTCHA shifts the problem to the user.
CAPTCHA is a 00's idea, from when we had multiple-page registrations (with errors showing only after you submit the page), insane password requirements, etc. It doesn't belong in a modern stack, in my opinion.
"What is the non-lazy solution?" That's how disruption is born.
Almost. It gives a botness score to the server, and it's up to the website to decide what to do with that score. They can pick a threshold to approve, reject, or apply stricter verification to.
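Roughly, the server-side half of that looks like the following. The siteverify endpoint and the "success"/"score" fields are from Google's public documentation; the threshold and what to do below it are entirely the site's own call:

```python
import requests

RECAPTCHA_SECRET = "your-secret-key"  # placeholder
THRESHOLD = 0.5                        # site-chosen cutoff, not Google's

def check_token(token, remote_ip=None):
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={"secret": RECAPTCHA_SECRET, "response": token, "remoteip": remote_ip},
        timeout=5,
    )
    result = resp.json()
    if not result.get("success"):
        return False
    # v3 returns a 0.0..1.0 score; the site decides what counts as "human enough"
    # and whether a low score means reject, step-up verification, or something else.
    return result.get("score", 0.0) >= THRESHOLD
```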
Yes, it's a trade-off as usual.
The main benefit I see is on networks where you have a mix of good and bad traffic and you would still like to offer the service to the few good users.
I see this a lot on networks hosting a lot of free VPN providers. The other option we chose before was outright blocking. That is even more harmful for the few good users.
Apart from the surveillance aspect, one thing that bothered the hell out of me with Cloudflare using ReCAPTCHA was that it left a much larger part of the web than necessary effectively blocked in China, since the CAPTCHAs would get triggered, and then not load, from Chinese IPs.
I had a customer where we had to migrate away from Cloudflare for this reason - this was about 5 years ago and the issue has been there to this day. Glad to hear they've finally done something about it. Even if it took Google starting to charge money for ReCAPTCHA to trigger it.
Has anyone else seen reCAPTCHA getting way more difficult of late? It often takes me a full minute to find all of the tiny traffic lights hidden away in a set of low-quality images.
> We also had issues in some regions, such as China, where Google's services are intermittently blocked. China alone accounts for 25 percent of all Internet users. Given that some subset of those could not access Cloudflare's customers if they triggered a CAPTCHA was always concerning to us.
They are explicitly saying that China's blackmailing of Google is working so well it even affects decisions on using Google products outside of China.
I'm not a Google fan and think this move is a great improvement for the web and user privacy, but that this was explicitly motivated by China's blackmailing tactics is terrifying.
And we can from this post even make another case that also doesn't paint a nice picture: Cloudflare does not care enough about 25% of internet users to move away from reCAPTCHA - until it affects their bottom line in a visible and immediate way.
There are plenty of services that will happily accept a screenshot from a developer, send it out to live humans who solve it in real time, and then return the answers to the developer.
I'm not going to link to them, but you can find them yourself by googling "buy recaptcha solver". The prices for the top two results are $0.50 and $1.39 per 1000 solves (respectively, $0.0005 and $0.00139 per solve).
At that price point, it's feasible for the truly determined to just use those solvers to bypass ReCAPTCHA (or similar services).
Are there chrome extensions that I can use these with? I'd be willing to pay those rates to never have to solve a captcha again. I'm fine leaving the tab open for a few minutes while it's solved even.
hCAPTCHA looks interesting, although it seems they use a blockchain for no real reason compared to just storing the payments as rows (i.e., what do they gain from the records being chained on top of one another?).
The point of a blockchain is that to edit an earlier record, you would need to edit every record that comes after (due to storing a hash of the previous block in the current block). However, it doesn’t make sense when one entity controls the entire system because if a hacker (or even an insider) can change one record, they could change all of them. Hence why a good blockchain would be distributed. Then, if one node edits the history, the other nodes will see the anomaly and ignore that node.
This is also why Git’s history is easy to edit when it’s only on your machine. But once you push to GitHub and others clone your repo, it becomes a lot harder to edit history. Yes, Git isn’t a blockchain, but it does use the idea of hashing the previous “block” (commit) and storing it in the current “block.”
However you can do a local blockchain (or hash chain, or whatever you want to call it) and distribute just the hashes. If you have a local git repo and regularly tell me your commit IDs I can testify that the code existed at that point in time, and can later verify it wasn't changed if you choose to expose the full commit to me. And because it's a chain, you only need to communicate one commit ID for every external timestamp you care about, not for every commit you care about.
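A hash chain in that spirit takes only a few lines; publishing just the latest head hash to a third party lets them later verify that any earlier entry you choose to reveal was part of the chain at that time. This is a sketch of the idea only, not Git's or hCaptcha's actual format:

```python
import hashlib
import json

class HashChain:
    """Append-only chain where each entry commits to the previous one."""

    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # genesis value

    def append(self, payload: dict) -> str:
        record = {"prev": self.head, "payload": payload}
        self.head = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(record)
        return self.head

    def verify(self) -> bool:
        head = "0" * 64
        for record in self.entries:
            if record["prev"] != head:
                return False
            head = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
        return head == self.head

chain = HashChain()
chain.append({"commit": "abc123"})
published_head = chain.append({"commit": "def456"})  # hand only this hash to a witness
assert chain.verify()
```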
Yes, if you do not want to distribute your data to random people over the internet, you need a Merkle tree, not a stupid blockchain with all the downsides a blockchain has.
If you strip out the proof-of-work algorithm you're basically left with a chain of Merkle trees, and the payloads hashed by the Merkle trees. Calling it a blockchain is just a way to make it sound more familiar to potential investors.
It's not worth a rich person's time to solve captchas, while it is for a poor person. This has led to captcha-solving services, extension plugins, etc., all of which have high latency and none of which work over a fast, documented API. It would be 100 times easier if Cloudflare/Google let you directly buy credits, at the mid-point price of the current bid-ask spread, of say 50 cents per 1000 captchas, which would probably last you a few months to a year.
I've run into hCaptcha a couple of times recently and found it vague; I had to guess what they meant. Both times it asked me to identify the truck. Well, what do you mean by "truck"? Are you counting a semi as a truck? I ended up having to do it twice because I don't consider a semi a "truck", but they did.
Interesting. I know some people consider a semi a truck, but your pickup truck isn't really a truck according to others. So confusing with all the different definitions.
This is fantastic news for privacy on the web. Thank you Cloudflare!
I’ve been seeing hcaptcha in more and more places recently. It’s a bit rough around the edges still, but it works well and feels far less hostile than recaptcha.
The funny thing is that Google doesn't even use reCAPTCHA themselves and instead uses some awkward, hard-to-read piece of shit. After 4-5 guesses, and they are guesses, you might proceed.
It's funny that we need to ensure humans are the ones performing certain actions like making a purchase or accessing a service, but we let machines make decisions over very important matters in our lives (credit/financial decisions).
It's intriguing that they said Google will charge for reCaptcha; is there any more information on that? I can't imagine all the small business owners will have to start paying, but perhaps if they did they'd just remove it altogether (a net win!).