It seems odd that the key feature of Naive Bayes (conditional independence of the individual features given the class) is only mentioned in passing, and it is never stated explicitly that the features are assumed conditionally independent.
This assumption is then used when transforming probabilities to log-probabilities in one go, without any mention of it, which could be particularly confusing for beginners.
I would recommend decomposing p(x|C) before applying the log transform for clarity.
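Roughly what I mean, as a sketch in my own words (Gaussian per-feature likelihoods are assumed here just for concreteness, and the function name is mine, not the post's): the naive assumption gives p(x|C) = p(x_1|C) * ... * p(x_d|C), so the log turns the product into a sum, log p(C) + sum_i log p(x_i|C).

    import numpy as np
    from scipy.stats import norm

    def log_posterior_unnormalized(x, prior, means, stds):
        # Conditional independence given the class:
        # p(x|C) = prod_i p(x_i|C)  ->  log p(x|C) = sum_i log p(x_i|C)
        per_feature_log_likelihoods = norm.logpdf(x, loc=means, scale=stds)
        return np.log(prior) + np.sum(per_feature_log_likelihoods)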
Thanks for pointing that out! This is a blog post for people who already know a bit about the theory and are interested in a possible implementation. But I see your point and will add it.
It's a probability (the conditional probability of the discrete random variable Y given the continuous X=x) multiplied by the density of the continuous random variable at X=x. It is the value of the joint density at X=x, Y=y.
p-values are unrelated to this but are computed by looking at the cumulative distribution function (the integral of the density).
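A tiny sketch of the distinction (arbitrary numbers and an arbitrary Gaussian, just to make the point):

    from scipy.stats import norm

    p_y = 0.4                            # P(Y=y): a genuine probability
    f_x_given_y = norm(0, 1).pdf(0.5)    # density of X at x=0.5 given Y=y: not a probability
    joint_density = p_y * f_x_given_y    # value of the mixed joint density at (x, y)

    # Probabilities come from the cdf, the integral of the density:
    prob_x_below = norm(0, 1).cdf(0.5)   # P(X <= 0.5 | Y=y)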
If you have a bag of [1.0 3.1 5.2 7.8 7.8 7.9 8.1 8.2 9.9 10.1], what is the probability of picking 8.0? It is definitely higher than the probability of picking 1.1, isn't it?
I hope this clarifies the pdf usage.
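Roughly what I have in mind, as a sketch (a kernel density estimate is my own choice here, just to get a smooth density out of the bag):

    import numpy as np
    from scipy.stats import gaussian_kde

    bag = np.array([1.0, 3.1, 5.2, 7.8, 7.8, 7.9, 8.1, 8.2, 9.9, 10.1])
    density = gaussian_kde(bag)
    print(density(8.0), density(1.1))  # the fitted density is much higher at 8.0 than at 1.1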
pdf = slope of the cdf. The value of the pdf at a given point is not a probability, it's the instantaneous rate of change of a probability. You need to integrate the pdf over a range to get a probability.
You could take the area under the pdf (i.e., integrate) for a window around a given x, or use the area under the tail of the pdf past x (i.e., a p-value).
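Concretely, something like this (a sketch with an arbitrary normal distribution and a window size picked purely for illustration):

    from scipy.stats import norm

    dist = norm(loc=7.5, scale=2.0)   # some fitted distribution (made up here)
    x, half_window = 8.0, 0.1

    density_value = dist.pdf(x)          # slope of the cdf at x, not a probability
    window_prob = dist.cdf(x + half_window) - dist.cdf(x - half_window)  # probability of landing near x
    tail_prob = 1.0 - dist.cdf(x)        # p-value-style tail area past x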
That gives a range of size 2.980232238769531250 * 10^-8 in which all numbers compare equal to 0.3639401 in IEEE 754 32-bit floating point. And since we're looking at a domain of [0, 1], that's also immediately our probability.
(I'm fully aware this isn't what you were asking, but I found it fun either way)
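If anyone wants to check that number, it's just the float32 spacing (ULP) at that value; a quick sketch with numpy (my own check, not from the post):

    import numpy as np

    x = np.float32(0.3639401)
    print(np.spacing(x))   # 2.9802322e-08, i.e. 2**-25 for values in [0.25, 0.5)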
It doesn't make sense, in my opinion (though I could be ignorant). For classification, wouldn't it be better to stick to discrete probability mass functions?
Might be worthwhile to add conditional risk to it, to generalize it to the minimum-risk classifier. That way you'd also distinguish it from, say, the scikit-learn implementation.
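Something along these lines, as a sketch (the loss matrix and names are mine, not scikit-learn's): the minimum-risk rule picks the action minimizing the conditional risk R(a_i|x) = sum_j L(a_i, C_j) p(C_j|x), which falls back to the usual argmax over the posteriors under 0-1 loss.

    import numpy as np

    def min_risk_class(posteriors, loss):
        # posteriors: shape (n_classes,), p(C_j | x)
        # loss: shape (n_actions, n_classes), loss[i, j] = cost of choosing i when the truth is j
        conditional_risk = loss @ posteriors   # R(a_i | x) for each action i
        return int(np.argmin(conditional_risk))

    # With 0-1 loss this coincides with argmax over the posteriors:
    posteriors = np.array([0.2, 0.7, 0.1])
    zero_one_loss = 1 - np.eye(3)
    assert min_risk_class(posteriors, zero_one_loss) == int(np.argmax(posteriors))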
I honestly couldn't find many implementations of Naive Bayes out there. The famous ones are too over-engineered to use as a learning tool. I think people appreciate an article like this for its step-by-step approach.
I went looking for something just like this a few weeks back. The examples I found were all either wildly over-mathematical or terribly unclear. This seems to strike a good balance. Thanks for the writeup!
It's hot right now, but a lot of people don't actually know anything about it. A lot of ML tutorials are either dumbed-down or already pretty technical. People appreciate clear demonstrations of fundamentals.