Understanding Naive Bayes classifier

Naive Bayes classifiers are simple "probabilistic classifiers" in statistics that apply Bayes' theorem with strong (naive) independence assumptions between features.

In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the seemingly implausible success of naive Bayes classifiers. Even so, a comprehensive comparison with other classification algorithms in 2006 found that naive Bayes is outperformed by other methods, such as boosted trees and random forests.

What is Naive Bayes?

The term "naive" refers to the assumption that the presence of one feature is unrelated to the presence of any other. For instance, a fruit may be recognized as an apple because it is red, round, and tasty. A naive Bayes classifier treats each of these characteristics as contributing independently to the probability that the fruit is an apple, regardless of any correlations among them. The name "Bayes" comes from the fact that the method is based on Bayes' theorem.
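The apple example can be worked through numerically. The following is a minimal sketch in which every probability is invented purely for illustration (the names `prior`, `likelihood`, and `posterior` are hypothetical): each class score is the prior multiplied by one likelihood per observed feature, and the scores are then normalized.

```python
# Hypothetical probabilities, invented purely for illustration.
prior = {"apple": 0.3, "other": 0.7}
likelihood = {
    "apple": {"red": 0.7, "round": 0.9, "tasty": 0.8},
    "other": {"red": 0.2, "round": 0.4, "tasty": 0.5},
}


def posterior(observed):
    """Return P(class | observed features) under the naive independence assumption."""
    scores = {}
    for label in prior:
        score = prior[label]
        # Independence assumption: multiply one likelihood term per feature.
        for feature in observed:
            score *= likelihood[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


result = posterior(["red", "round", "tasty"])
print(result)
```

With these made-up numbers, the "apple" class receives most of the posterior probability because all three observed features are more likely under it.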

Scalable Naive Bayes classifiers

Naive Bayes classifiers are highly scalable: the number of parameters they require is linear in the number of variables (features/predictors) in the learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by the costly iterative approximation used for many other types of classifiers.
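To make the closed-form training concrete, here is a minimal sketch that assumes Gaussian class-conditional features (an assumption chosen for this illustration; other likelihood models are possible). The helper `fit_gaussian_nb` is hypothetical, not a library function. It estimates one prior per class plus one mean and one variance per class and feature, using plain averages with no iterative optimization.

```python
import numpy as np


def fit_gaussian_nb(x, y):
    """Closed-form maximum-likelihood estimates for a Gaussian naive Bayes model."""
    classes = np.unique(y)
    # Class priors: fraction of training samples in each class.
    priors = np.array([np.mean(y == c) for c in classes])
    # Per-class, per-feature mean and variance (ddof=0 is the ML estimate).
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    variances = np.array([x[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


# Toy data (made up): 100 samples, 5 features, 2 balanced classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
y = np.arange(100) % 2

classes, priors, means, variances = fit_gaussian_nb(x, y)

# Parameter count grows linearly with the number of features:
# n_classes priors + 2 * n_classes * n_features Gaussian parameters.
n_params = priors.size + means.size + variances.size
print(n_params)
```

Doubling the number of features in this sketch roughly doubles the number of Gaussian parameters, which is the linear growth described above.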

Naive Bayes models are also known in the statistics literature as simple Bayes and independence Bayes. All these names refer to the use of Bayes' theorem in the classifier's decision rule. However, naive Bayes is not necessarily a Bayesian method.

Classifier models

The naive Bayes method is a straightforward technique for building classifiers, that is, models that assign class labels to problem instances represented as vectors of feature values. All naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable, even though this assumption rarely holds exactly in practice.
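Written out, the model takes the following form. The notation is an assumption introduced here for illustration: $C_k$ denotes a class and $x_1, \dots, x_n$ the feature values of one instance.

```latex
% Bayes' theorem for class C_k given features x_1, ..., x_n
P(C_k \mid x_1, \dots, x_n)
  = \frac{P(C_k)\, P(x_1, \dots, x_n \mid C_k)}{P(x_1, \dots, x_n)}

% Naive conditional-independence assumption
P(x_1, \dots, x_n \mid C_k) = \prod_{i=1}^{n} P(x_i \mid C_k)

% Resulting classifier: the denominator does not depend on k, so it can be dropped
\hat{y} = \operatorname*{arg\,max}_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```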

One can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods, because parameter estimation for naive Bayes models often uses the maximum likelihood approach. Naive Bayes classifiers have performed well in many complex real-world situations despite their naive design and apparently oversimplified assumptions.

Algorithm

Naive Bayes has the benefit of needing minimal training data to estimate the relevant classification parameters.

The first step is to preprocess the data so that it is ready for the model: load the dataset, split it into training and test sets, and scale the features. Here is the corresponding code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values  # feature columns (positions 2 and 3)
y = dataset.iloc[:, 4].values       # class label column (position 4)

# Splitting the data into training (75%) and test (25%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling: fit on the training set only, then apply the same transform to the test set
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

Fitting Naive Bayes to the training set: Now that the data has been preprocessed, the Naive Bayes model can be fitted to the training data.

from sklearn.naive_bayes import GaussianNB  
classifier = GaussianNB()  
classifier.fit(x_train, y_train)
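Once the classifier is fitted, a natural next step is to predict labels for the held-out test set and measure how well they match. The sketch below is self-contained and therefore makes one loud assumption: since `user_data.csv` is not available here, a synthetic two-feature dataset from `make_classification` stands in for it, so the numbers it prints say nothing about the original data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for user_data.csv (two numeric features, binary label).
x, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Same preprocessing steps as in the article: split, then scale.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predict the test set and compare against the true labels.
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted class
acc = accuracy_score(y_test, y_pred)    # fraction of correct predictions

print(cm)
print(acc)
```

The diagonal of the confusion matrix counts correct predictions per class, and the off-diagonal entries count the two kinds of misclassification.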

Code source: javatpoint

Conclusion

Although its far-reaching independence assumptions are often inaccurate, the naive Bayes classifier has several properties that make it remarkably useful in practice. In particular, because the class-conditional feature distributions are decoupled, each one can be estimated independently as a one-dimensional distribution. This helps alleviate problems stemming from the curse of dimensionality, such as the need for data sets that grow exponentially with the number of features. Naive Bayes often fails to produce a good estimate of the correct class probabilities, but many applications do not require one.

For example, the naive Bayes classifier makes the correct classification under the MAP (maximum a posteriori) decision rule as long as it predicts the correct class to be more probable than any other class. This holds regardless of whether the probability estimate is slightly or even grossly inaccurate. In this way, the overall classifier can be robust enough to tolerate serious deficiencies in the naive probability model it is built on.
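This robustness can be illustrated with a small experiment. The sketch below uses made-up data and a deliberately extreme violation of independence: the same feature is copied three times. Because the classes are exactly balanced, the copies only scale the evidence for each class, so the predicted labels are expected to stay the same while the reported probabilities become overconfident.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Hypothetical one-feature data: two balanced classes with overlapping Gaussians.
n_per_class = 200
x = np.concatenate([rng.normal(0.0, 1.0, n_per_class),
                    rng.normal(1.5, 1.0, n_per_class)]).reshape(-1, 1)
y = np.repeat([0, 1], n_per_class)

# Copying the same feature three times grossly violates the independence assumption.
x_dup = np.hstack([x, x, x])

nb_single = GaussianNB().fit(x, y)
nb_dup = GaussianNB().fit(x_dup, y)

# MAP decisions: the class with the highest estimated posterior.
pred_single = nb_single.predict(x)
pred_dup = nb_dup.predict(x_dup)
same_decisions = bool(np.array_equal(pred_single, pred_dup))

# Average confidence in the predicted class under each model.
conf_single = nb_single.predict_proba(x).max(axis=1).mean()
conf_dup = nb_dup.predict_proba(x_dup).max(axis=1).mean()

print(same_decisions, conf_single, conf_dup)
```

The model trained on duplicated features counts the same evidence three times, which inflates its confidence, yet the ranking of the classes, and therefore the MAP decision, is unchanged in this balanced setting.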
