Naive Bayes classifier

Naive Bayes is a statistical classification technique based on Bayes' Theorem. It is one of the simplest supervised learning algorithms, yet it is fast, reliable, and often surprisingly accurate, and it scales well to large datasets.

The Naive Bayes classifier assumes that the effect of a particular feature on a class is independent of the other features. For example, whether a loan applicant is desirable may depend on his or her income, previous loan and transaction history, age, and location. Even if these features are interdependent, they are still considered independently. This assumption simplifies computation, and that is why the method is called naive. The assumption is known as class conditional independence.

Bayes' Theorem provides a way to calculate the posterior probability P(h|D) from P(h), P(D), and P(D|h):

P(h|D) = P(D|h) · P(h) / P(D)

·       P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.

·       P(D): the probability of the data (regardless of the hypothesis). This is known as the evidence, or the marginal likelihood of the data.

·       P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.

·       P(D|h): the probability of the data D given that hypothesis h is true. This is known as the likelihood.
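Once these four quantities are named, applying Bayes' Theorem is a single multiplication and division. The sketch below is a minimal illustration; the numbers are made up and do not come from any dataset in this article:

```python
def posterior(prior_h, likelihood, evidence):
    """Bayes' Theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood * prior_h / evidence

# Hypothetical numbers for illustration only
p_h = 0.3          # P(h): prior probability of the hypothesis
p_d_given_h = 0.8  # P(D|h): likelihood of the data under h
p_d = 0.6          # P(D): probability of the data (the evidence)

print(posterior(p_h, p_d_given_h, p_d))  # prints approximately 0.4
```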

Let’s understand the Naive Bayes Theorem through an example. We take a dataset of employees in a company; our aim is to build a model that predicts whether a person goes to the office by driving or walking, using the person's salary and age.



[Figure: scatter plot of the 30 data points (red = walking, green = driving)]

In the figure above, we can see 30 data points, in which the red points belong to those who walk and the green points to those who drive. Now let’s add a new data point. Our aim is to find the category that the new point belongs to.

Note that we have taken age on the X-axis and salary on the Y-axis. We use the Naive Bayes algorithm to find the category of the new data point. For this, we have to find the posterior probability of walking and of driving for this data point; the point is then assigned to the category with the higher probability.

[Figure: the same scatter plot with the new, unclassified data point added]

The posterior probability of walking for the new data point is:

P(Walks|X) = P(X|Walks) · P(Walks) / P(X)

and for driving it is:

P(Drives|X) = P(X|Drives) · P(Drives) / P(X)

Naive Bayes algorithm



Step 1: Find all the probabilities required by Bayes' theorem for the calculation of the posterior probability.

P(Walks) is simply the prior probability of walking, i.e., the fraction of all observations that are walkers:

P(Walks) = (number of walkers) / (total observations)

In order to find the marginal likelihood, P(X), we consider a circle of some chosen radius around the new data point; it will contain some red and some green points.
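This neighbourhood count is easy to express in code. The sketch below estimates P(X) as the fraction of points falling inside the circle; the (age, salary) coordinates are invented for illustration and are not the dataset from the figure:

```python
import math

def marginal_likelihood(points, new_point, radius):
    """Estimate P(X) as the fraction of all observations that fall
    inside a circle of the given radius around the new point."""
    inside = sum(1 for p in points if math.dist(p, new_point) <= radius)
    return inside / len(points)

# Hypothetical (age, salary) observations, not the real dataset
points = [(25, 30000), (27, 32000), (45, 80000), (50, 90000)]
print(marginal_likelihood(points, (26, 31000), radius=5000))  # 2 of 4 points fall inside: 0.5
```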

P(X) = (number of observations inside the circle) / (total observations)

P(X|Walks) can be found by:

P(X|Walks) = (number of red points inside the circle) / (total number of red points)

Now we can find the posterior probability using Bayes' theorem:

P(Walks|X) = P(X|Walks) · P(Walks) / P(X)

Step 2: Similarly, we can find the posterior probability of driving, P(Drives|X); in this example it is 0.25.

Step 3: Compare the two posterior probabilities. Since every point inside the circle is either red or green, the two posteriors sum to 1, so P(Walks|X) = 0.75. As P(Walks|X) is greater than P(Drives|X), the new point belongs to the walking category.

