K-Nearest Neighbour (KNN) is a simple, non-parametric supervised learning method. It was invented by Evelyn Fix and Joseph Hodges in 1951 and later expanded by Thomas Cover.

It is widely applicable in real-world scenarios because it is non-parametric: it makes no underlying assumptions about the data distribution (unlike algorithms such as GMM, which assume a Gaussian distribution of the input data).

Lazy learner algorithm

The K-Nearest Neighbours algorithm remembers the entire training set and then uses that information to label a new data point. The KNN algorithm can be applied quickly to new data to classify it into a suitable category, and it can be used for both Regression and Classification.

K-Nearest Neighbours (KNN) is sometimes called a "lazy learner" algorithm since it does not use the training set immediately but instead stores the dataset and uses it for classification. During its training phase, the KNN algorithm saves the given dataset. When it receives new data, it uses the stored information to determine which category the new data most closely belongs to.
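The "lazy" behaviour described above can be sketched in a few lines of Python. The class name `LazyKNN` and the sample data here are illustrative, not from the article: the point is that `fit` does no computation at all, and all the work is deferred to prediction time.

```python
# A minimal sketch of why KNN is called a "lazy learner":
# "training" only stores the data; all work happens at prediction time.
class LazyKNN:
    def fit(self, X, y):
        # No model is built here - the dataset is simply memorized.
        self.X, self.y = X, y
        return self

    def predict_one(self, query, k=3):
        # Distances are computed only now, when a new point arrives.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, query)), label)
            for x, label in zip(self.X, self.y)
        )
        top_k = [label for _, label in dists[:k]]
        # Majority vote among the k nearest labels.
        return max(set(top_k), key=top_k.count)

model = LazyKNN().fit([(1, 1), (1, 2), (5, 5), (6, 5)], [0, 0, 1, 1])
print(model.predict_one((5.5, 5)))  # nearest neighbours are class 1
```

Because nothing is learned up front, adding new training points is trivial, but every prediction must scan the whole stored dataset.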

Structure of KNN

K-nearest neighbours (KNN) uses 'feature similarity' to predict the values of new data points. The following steps show how it works:

Step 1: In the first stage of KNN, we must load both the training and test data.

Step 2: Next, we must select the value of K, i.e. the number of nearest data points to consider. K can be any positive integer.

Step 3: Perform the following for each point in the test data:

  • Determine the distance between the test point and each row of training data using a distance metric such as Euclidean, Manhattan, or Hamming distance. Euclidean distance is the most commonly used.
  • Sort the training rows in ascending order of distance.
  • Select the top K rows from the sorted array.
  • Assign the test point the most common class among these K rows.

Step 4: Finish
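Step 3 mentions three distance metrics. The small functions below are illustrative implementations (assumed, not from the article) showing how each one is computed:

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions where the two sequences differ
    # (suited to categorical or binary features).
    return sum(x != y for x, y in zip(a, b))

p, q = (1, 2, 3), (4, 6, 3)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
print(hamming(p, q))    # 2
```

Euclidean and Manhattan suit continuous features, while Hamming is typically used when features are categorical.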

KNN algorithm Python code

Assume 0 and 1 are the two classes (groups). 
# Python program to find groups of unknown
# Points using K nearest neighbour algorithm.
import math
def classifyAPoint(points,p,k=3):
	'''
	This function finds the classification of p using
	the k nearest neighbour algorithm. It assumes only two
	groups and returns 0 if p belongs to group 0, else
	1 (belongs to group 1).
	Parameters -
		points: Dictionary of training points having two keys - 0 and 1.
				Each key has a list of training data points belonging to that group.
		p : A tuple, test data point of the form (x,y)
		k: number of nearest neighbour to consider, default is 3
	'''
	distance=[]
	for group in points:
		for feature in points[group]:
			#calculate the Euclidean distance of p from training points
			euclidean_distance = math.sqrt((feature[0]-p[0])**2 +(feature[1]-p[1])**2)
			# Add a tuple of the form (distance, group) in the distance list
			distance.append((euclidean_distance,group))
	# sort the distance list in ascending order
	# and select the first k distances
	distance = sorted(distance)[:k]
	freq1 = 0 #frequency of group 0
	freq2 = 0 #frequency of group 1
	for d in distance:
		if d[1] == 0:
			freq1 += 1
		elif d[1] == 1:
			freq2 += 1
	return 0 if freq1>freq2 else 1
# driver function
def main():
	# Dictionary of training points having two keys - 0 and 1
	# key 0 has points belonging to class 0
	# key one has points belonging to class 1
	points = {0:[(1,12),(2,5),(3,6),(3,10),(3.5,8),(2,11),(2,9),(1,7)],
			1:[(5,3),(3,2),(1.5,9),(7,2),(6,1),(3.8,1),(5.6,4),(4,2),(2,5)]}
	# testing point p(x,y)
	p = (2.5,7)
	# Number of neighbours
	k = 3
	print("The unknown point is classified as group {}".\
		format(classifyAPoint(points,p,k)))
if __name__ == '__main__':
	main()
	
# This code is contributed by Atul Kumar (www.fb.com/atul.kr.007)

code source: Geeksforgeeks

KNN Algorithm Benefits

  • It is easy to apply in practice.
  • It is robust to noisy training data.
  • It can perform better with larger amounts of training data.
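As noted earlier, KNN handles Regression as well as Classification. Regression follows the same steps, but instead of a majority vote it averages the target values of the K nearest neighbours. A minimal sketch with illustrative data (the function name `knn_regress` and the sample points are assumptions, not from the article):

```python
def knn_regress(train, query, k=3):
    # Predict a numeric value as the mean of the k nearest targets.
    # train: list of ((x, y), target) pairs; query: an (x, y) tuple.
    dists = sorted(
        ((x - query[0]) ** 2 + (y - query[1]) ** 2, target)
        for (x, y), target in train
    )
    nearest = [target for _, target in dists[:k]]
    return sum(nearest) / len(nearest)

train = [((1, 1), 10.0), ((2, 2), 12.0), ((3, 3), 14.0), ((8, 8), 40.0)]
print(knn_regress(train, (2, 1), k=3))  # mean of the 3 closest targets: 12.0
```

The distant point `((8, 8), 40.0)` is ignored because it is not among the three nearest neighbours, so the prediction reflects only the local neighbourhood of the query.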
