K-nearest neighbour in Machine Learning

K-Nearest Neighbors is one of the most basic yet essential classification algorithms in Machine Learning. 

It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining, and intrusion detection.

Basic k-nearest neighbor classification

Training method:
- Store the training examples.

At prediction time:
- Find the k training examples (x1, y1), ..., (xk, yk) that are closest to the test example x.
- Classification: predict the most frequent class among the yi's.
- Regression: predict the average of the yi's.
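Written directly from this description, a minimal NumPy sketch (the function name, the plain Euclidean distance, and the majority-vote rule are my own illustrative choices, not from the original post) looks like this:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k, task="classification"):
        # Distances from the test example x to every stored training example
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training examples
        nearest = np.argsort(dists)[:k]
        neighbor_labels = y_train[nearest]
        if task == "classification":
            # Predict the most frequent class among the k neighbors
            return Counter(neighbor_labels.tolist()).most_common(1)[0][0]
        # Regression: predict the average of the neighbors' target values
        return neighbor_labels.mean()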

Improvements:

- Weighting examples from the neighborhood
- Measuring "closeness"
- Finding "close" examples in a large training set quickly


Averaging over k points is more reliable when:

- there is noise in the attributes
- there is noise in the class labels
- classes partially overlap





Equal weight to all attributes:

- is appropriate only if the scales of the attributes (and the differences between values) are similar
- scale the attributes to equal range or equal variance
- assumes the classes are roughly spherical
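A common way to achieve this in practice is to standardize each attribute to zero mean and unit variance before computing distances. The helper below is a small NumPy sketch of that idea (my own illustration, not code from the post):

    import numpy as np

    def standardize(X_train, X_test):
        # Estimate per-attribute mean and spread from the training set only
        mu = X_train.mean(axis=0)
        sigma = X_train.std(axis=0)
        sigma[sigma == 0] = 1.0  # guard against constant attributes
        # After this, every attribute has roughly equal variance
        return (X_train - mu) / sigma, (X_test - mu) / sigma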



(Figure: an example dataset with three classes, shown in red, green, and black.)

Small k: captures the fine structure of the problem space better; it may be necessary with a small training set.

Large k: less sensitive to noise (particularly class-label noise) and gives better probability estimates for discrete classes; a larger training set allows the use of a larger k.
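One practical way to navigate this tradeoff is to pick k by cross-validation. The sketch below uses scikit-learn for brevity; the candidate values of k are placeholders of my own choosing:

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def choose_k(X, y, candidate_ks=(1, 3, 5, 9, 15, 25)):
        # Score each candidate k with 5-fold cross-validation and keep the best
        scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
                  for k in candidate_ks}
        best_k = max(scores, key=scores.get)
        return best_k, scores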





Weighted Euclidean Distance


large weight    =>  attribute is more important
small weight    =>  attribute is less important
zero weight     =>  attribute doesn't matter

- Weights allow KNN to be effective with axis-parallel elliptical classes
- Where do the weights come from?
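A weighted Euclidean distance simply scales each squared attribute difference by a per-attribute weight before summing. The helper below is a minimal sketch (how the weight vector w is chosen or learned is exactly the open question above):

    import numpy as np

    def weighted_euclidean(x, z, w):
        # w[i] = 0 ignores attribute i; a larger w[i] makes attribute i matter more
        return np.sqrt(np.sum(w * (x - z) ** 2))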

Distance-Weighted KNN

The tradeoff between small and large k can be difficult:
- use a large k, but put more emphasis on nearer neighbors? (see the sketch below)
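One common way to realize this is to let each of the k neighbors vote with weight 1 / (distance + eps), so that nearer neighbors dominate even when k is large. This sketch is my own illustration of that scheme, not code from the post:

    import numpy as np

    def distance_weighted_knn(X_train, y_train, x, k, eps=1e-8):
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = {}
        for i in nearest:
            # Nearer neighbors contribute larger weights to their class
            votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + eps)
        return max(votes, key=votes.get)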



Locally Weighted Averaging

Let k = the number of training points (i.e., use every training example).
Let the weight of each example fall off rapidly with its distance from the query.

The kernel width controls the size of the neighborhood that has a large effect on the predicted value (analogous to k).
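A typical choice for such a weight is a Gaussian kernel, w_i = exp(-d(x, x_i)^2 / kernel_width^2). The sketch below shows the resulting locally weighted average for regression (a common formulation, not code taken from the post):

    import numpy as np

    def locally_weighted_average(X_train, y_train, x, kernel_width=1.0):
        # Every training point contributes, but the weight decays rapidly with distance
        dists = np.linalg.norm(X_train - x, axis=1)
        weights = np.exp(-(dists ** 2) / (kernel_width ** 2))
        # Weighted average of the targets; kernel_width plays a role analogous to k
        return np.sum(weights * y_train) / np.sum(weights)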
