
I think the value of this is the extremely low false-positive rate, so it can act as a coarse sieve when there is a large number of inputs to test. What other Binoculars-style detectors have you experimented with where you're seeing a "ton of false positives"?
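The "sieve" framing can be made concrete with a quick back-of-envelope calculation. All numbers below are hypothetical (the batch size, human fraction, and detector rates are assumptions for illustration, not figures from any specific detector):

```python
# Illustrative sieve math: screening a large batch of documents with a
# detector that has a very low false positive rate. All numbers are
# hypothetical assumptions.

n_docs = 1_000_000        # documents to screen
human_fraction = 0.8      # assume 80% are genuinely human-written
fpr = 0.0001              # assumed 0.01% false positive rate
tpr = 0.90                # assumed 90% of AI text gets flagged

n_human = n_docs * human_fraction
n_ai = n_docs - n_human

false_flags = n_human * fpr   # human documents wrongly flagged
true_flags = n_ai * tpr       # AI documents correctly flagged

print(f"false flags: {false_flags:.0f}")  # 80
print(f"true flags:  {true_flags:.0f}")   # 180000
```

Under these assumptions, a million-document batch yields only ~80 wrongly flagged human documents, which is why a low FPR matters far more than a high detection rate when triaging at scale.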


I use https://originality.ai/ as the benchmark. I've tested all commercially available services, and Originality (at the time; it's been a few months) provided the lowest false-positive rate. As a testing sample, I've built a database of articles written by various text generators and compared them against articles that I scraped from the web from before 2017 (basically any text from before LLMs saw daylight).

I am sure that these algorithms have evolved, but given my past experiments, I sincerely doubt that we are at a point where they (a) cannot be easily bypassed if you are targeting them, and (b) do not create a lot of false positives.

As stated in another comment, I personally "gave up" on trying to bypass AI detection [it often negatively impacts output quality], at least for my use case, and focus on creating the highest-value content possible.

I know that services like Surfer SEO continue to actively invest in bypassing all detectors. But... as a human, I do not enjoy their content, and that's what matters the most.


Just for fun, I tested a few recently generated articles with https://huggingface.co/spaces/tomg-group-umd/Binoculars (someone linked it in this thread) and it ranked them as "Human-Generated" (which I assume means human-written). And... I am not even trying to evade AI detection in my generated content. I was wholeheartedly expecting to fail. Meanwhile, Originality detects my AI-generated content with 85% confidence, which is... fair enough.


If I'm reading this correctly, it's not making any particular claim about text labeled human-generated. What it's saying is that if it labels text as machine-generated, it's highly likely that the text actually is.


The article you're commenting on actually states in its abstract:

> Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
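Those two numbers can be combined via Bayes' rule to see what a positive ("machine-generated") label is actually worth. The prevalence values below are assumptions for illustration; only the 90% TPR and 0.01% FPR come from the quoted abstract:

```python
# How trustworthy is a "machine-generated" flag from a detector with
# 90% TPR and 0.01% FPR? Depends on how much of the corpus is actually
# AI-generated (the prevalence values here are hypothetical).

def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """P(actually AI | flagged as AI), by Bayes' rule."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

tpr, fpr = 0.90, 0.0001
for prev in (0.5, 0.1, 0.01):
    print(f"prevalence {prev:>5}: precision {precision(tpr, fpr, prev):.4f}")
```

Even if only 1% of the corpus is AI-generated, precision stays near 99% at that false-positive rate, which is consistent with the "positive labels are the strong claim" reading above.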



