One unit test should have prevented Google from categorizing the entire internet as malware (jawspeak.com)
15 points by rishi on Feb 2, 2009 | hide | past | favorite | 12 comments


That's easy to say after the fact. If Google didn't have a comprehensive testing infrastructure, they'd have already made many worse mistakes than this.


Agreed. I think this kind of attitude stems from thinking that unit tests are for TDD, instead of for regression testing.

This post seems more reasonable if re-stated as, "Here's the unit test that will keep that from ever happening again".


Yeah, the problem seems to be incorrect validation rules in the code (assuming this "/" input was the real issue).

Both "TDD would catch this during coding" and "tests would catch this in regression testing" only hold if there is actually a specification of functionality with the correct input validation rules in the first place.

It seems more likely to me this is the more mundane situation that no one was paranoid enough about inputs in the first place.


This seems unlikely given that it was a data problem, not a coding problem, that caused the issue. Do you routinely run unit tests after hitting the Upload Content button on the website? I'm guessing that is a no. (Now the act of writing a unit test for "/" as input might have caused a programmer to say "Hey, the specification says that if we upload "/" in this file, then the entire Internet gets turned off. The specification is insane. Can we fix it?" But ordinarily you probably wouldn't run unit tests after upload, just like you don't run a test suite after hitting Publish on your blog.)
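One way to square these two views is to validate each entry at upload time, before it reaches the classifier at all. A minimal sketch in Python, assuming a hypothetical validate_entry function and an invented list of overly broad patterns (none of this is Google's actual code):

```python
# Hypothetical sketch: reject blocklist entries so broad they would
# match every URL on the web. validate_entry and OVERLY_BROAD are
# invented for illustration.

OVERLY_BROAD = {"", "/", "*", "http://", "https://"}

def validate_entry(pattern: str) -> bool:
    """Return True if the blocklist pattern is specific enough to publish."""
    pattern = pattern.strip()
    if pattern in OVERLY_BROAD:
        return False
    # Require at least one alphanumeric character, i.e. some hostname
    # or path content rather than a bare separator.
    return any(c.isalnum() for c in pattern)

assert not validate_entry("/")                 # the entry at issue here
assert validate_entry("badsite.example.com/")  # a plausible real entry
```

Such a check could run automatically on upload, which sidesteps the "nobody runs the test suite after hitting Publish" objection.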


Google spends far more time manipulating data than code. You CAN unit test data.

This one slipped by. I guarantee you that a new test has been written and this one won't happen again. Save a different problem for a different day.
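A concrete way to "unit test data" is to check each candidate blocklist against canary URLs that must never be flagged, and fail the push if any match. This is a speculative sketch; matches, blocklist_is_safe, and the canary list are invented stand-ins for whatever Google's real matcher looks like:

```python
# Hypothetical sketch of testing the data rather than the code: before
# pushing a new blocklist, verify it against canary URLs that should
# never be flagged as malware.

CANARY_URLS = [
    "http://www.google.com/",
    "http://en.wikipedia.org/",
]

def matches(pattern: str, url: str) -> bool:
    # Simplistic substring match standing in for the real matcher.
    return pattern != "" and pattern in url

def blocklist_is_safe(blocklist: list[str]) -> bool:
    """Return True if no blocklist entry flags any canary URL."""
    return not any(matches(p, url)
                   for p in blocklist
                   for url in CANARY_URLS)

assert blocklist_is_safe(["evil.example.com/"])
assert not blocklist_is_safe(["/"])  # "/" matches every canary URL
```

The point is that the test runs against the data artifact on every push, not against the code on every commit.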


The article doesn't really say what the test case should be.

I would probably write:

    (is (not (malicious-site-p "http://www.google.com/")))
Since hopefully Google never blocks itself.


I think it does casually suggest that there should have been a test for the '/' literal, which seems simplistic.

In addition to testing for google.com, a more robust test might be to assert that >90% (or some such threshold) of results for a basket of random searches are not marked as malware, as a sanity check.

Of course it's possible that a large proportion of random search results suddenly become malicious. But in that situation I'd want some alert to trip anyhow and have someone figure out what was happening to the web!
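The threshold check described above could look roughly like this; classify, the sample URLs, and the 10% cutoff are all illustrative assumptions, not anything Google has described:

```python
# Hypothetical sketch of a release sanity check: fail if more than
# some threshold of a basket of sample results are flagged as malware.

def malware_fraction(results: list[str], classify) -> float:
    """Fraction of result URLs the classifier flags as malicious."""
    flagged = sum(1 for url in results if classify(url))
    return flagged / len(results)

def release_is_sane(results: list[str], classify,
                    threshold: float = 0.10) -> bool:
    return malware_fraction(results, classify) <= threshold

# A broken classifier that flags everything (as in this incident) fails:
assert not release_is_sane(["a.com", "b.com", "c.com"], lambda url: True)
# A classifier flagging nothing passes:
assert release_is_sane(["a.com", "b.com", "c.com"], lambda url: False)
```

When the check fails, a human gets paged instead of the release going out, which covers both the bug case and the "web actually caught fire" case.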


They did it once.

Google Admits To Cloaking; Bans Itself (March 2005)

http://blog.searchenginewatch.com/blog/050309-092708


Google pages are occasionally used for malicious purposes. Presumably Google would shut those down, however, rather than marking them as malicious.


I don't think Google codes in CL ;)


Their loss. :)

I know they like Python, Java, and C++; but I don't really know any of those languages.


It wouldn't be surprising if internal domains are treated differently in some ways.



