Naive Bayes classifier in 50 lines (umbc.edu)
64 points by weinzierl on Nov 27, 2011 | 10 comments



Good post, but it's a sad reflection of code's failure as a medium of expression that we need 50 lines of Python to express one line of math.


If you look at the code, it's more like "write an ARFF file parser in 40 lines" and "do a NB classifier with add-one smoothing in 10 lines".

If you use a better-adapted input format and code things more concisely, you'd probably end up with two functions of 3-4 lines each; conversely, if you wanted to do things properly, you'd separate out ARFF file parsing and the Naive Bayes functionality.

All in all, the blog post wouldn't make me want to recommend their group for prospective (undergrad or graduate) students.


Seriously. I think you could golf NB into 2 lines pretty sensibly once you've got the data in. It's really just compute two histograms, multiply, and maximize.
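To make "two histograms, multiply, and maximize" concrete, here is a minimal sketch, not the code from the linked post. It assumes the data is already loaded as tuples of categorical values, uses add-one smoothing, and sums logs instead of multiplying raw probabilities. The names `train`, `classify`, and the toy weather data are made up for illustration.

```python
from collections import Counter
import math


def train(rows, labels):
    """Build the two histograms: class counts and per-class feature-value counts."""
    class_counts = Counter(labels)
    feature_counts = Counter(
        (i, value, label)
        for row, label in zip(rows, labels)
        for i, value in enumerate(row)
    )
    # Number of distinct values seen per feature, needed for add-one smoothing.
    n_values = [len(set(column)) for column in zip(*rows)]
    return class_counts, feature_counts, n_values


def classify(model, row):
    """Return the class with the highest smoothed log-probability for `row`."""
    class_counts, feature_counts, n_values = model
    total = sum(class_counts.values())

    def log_score(label):
        # log P(class) + sum of log P(feature value | class), add-one smoothed.
        score = math.log(class_counts[label] / total)
        for i, value in enumerate(row):
            numerator = feature_counts[(i, value, label)] + 1
            denominator = class_counts[label] + n_values[i]
            score += math.log(numerator / denominator)
        return score

    return max(class_counts, key=log_score)


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"),
    ("sunny", "mild"),
    ("rainy", "mild"),
    ("rainy", "cool"),
    ("overcast", "hot"),
]
labels = ["no", "no", "yes", "yes", "yes"]

model = train(rows, labels)
print(classify(model, ("sunny", "hot")))
print(classify(model, ("rainy", "cool")))
```

Each function body could be golfed further; the log-sum is only there so that long feature vectors don't underflow.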


That's not one line of math. That's one line of math and ten years of textbooks. Math just has a bigger standard library.


And no fear of namespace collisions, or of introducing additional symbols.

Want to match Math in size when implementing algorithms? Use APL. Want to avoid adding additional symbols all the time? Use J/K/etc (APL descendants). Want to avoid namespace collisions? Welcome to Java.Sun.Com.Math.Oh.For.Effing.Sake.FactoryInterfaceBuilder


The actual translation of the formula to Python is about 4 lines: L30 - L34 of [1]. The rest is just IO plumbing.

[1] https://gist.github.com/731413/7ad1b4c04bc2d6b5033c5811efcb4...
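For reference, the thread never quotes the "one line of math" itself. The usual textbook form of the Naive Bayes decision rule, with the add-one smoothing mentioned above, looks roughly like this. The symbols are my own choice: $x_i$ is the value of feature $i$, $c$ a class, and $V_i$ the set of values feature $i$ can take.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
P(x_i \mid c) \approx \frac{\operatorname{count}(x_i, c) + 1}{\operatorname{count}(c) + |V_i|}
```

The first expression is the one-liner; the second is the estimate that the counting code has to supply.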


This is an interesting point - I suppose it shows how sophisticated the language of mathematics is.

I don't think it's the 50 lines that matter, though (most of them effectively just define what the mathematical symbols and grammar mean), but the one line, which tells you very effectively how everything relates together. To me, that is what the Python version misses.


Tell that to sir Alan Kay.

PS: I should add some data: http://tinlizzie.org/~awarth/


http://code.google.com/p/aima-python/source/browse/trunk/lea... is shorter (NaiveBayesLearner around line 200) though that assumes some infrastructure.


Upcoming:

- Raycaster in 1000 lines of Lisp

- Database management system in 20000 lines of C

- Python web framework in 2000 lines of Python

...



