# Lecture 12

## Machine Learning

• Definition
• Observe set of examples: training data
• Infer something about process that generated that data
• Use inference to make predictions about previously unseen data: test data
• Procedures
• Representation of the features
• e.g., represent each person by a vector of features (man/woman, educated/not, etc.)
• Distance metric for feature vectors
• so that all features fall in a comparable range and can be compared fairly
• Objective function and constraints
• Optimization method for learning the model
• Evaluation method

### Supervised Learning

• Goal: find a model that predicts a value for a previously unseen feature vector
• Regression models predict a real number
• e.g., linear regression
• Classification models predict a label (chosen from a finite set of labels)

### Unsupervised Learning

• Goal: uncover some latent structure in the set of feature vectors
• Clustering is the most common technique
• Define some metric that captures how similar one feature vector is to another
• Group examples based on this metric

### Difference between Supervised and Unsupervised

• With labels, we can classify the data into two clusters by weight or by height, or into four clusters by weight and height: this is supervised learning
• Without labels, figuring out how to cluster the data is unsupervised learning

### Choose Feature Vectors

• Why should we be careful?
• Irrelevant features can greatly slow the learning process.
• How?
• signal-to-noise ratio (SNR)
• Think of it as the ratio of useful input to irrelevant input.
• The purpose of feature extraction is to separate those features in the available data that contribute to the signal from those that are merely noise.

### Distance Between Vectors

#### Minkowski Metric

• $dist(X1, X2, p)=(\displaystyle\sum_{k=1}^{len}|X1_{k}-X2_{k}|^{p})^{1/p}$

• p = 1: Manhattan Distance

• p = 2: Euclidean Distance

• For example:

• To compare the distance between star and circle and the distance between cross and circle
• Using Manhattan distance, they are 3 and 4
• Using Euclidean distance, they are 3 and ≈2.8 = $\sqrt{2^2+2^2}$
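The Minkowski metric translates directly into Python. The circle/star/cross coordinates below are my assumption, chosen only so the code reproduces the distances in the example:

```python
def minkowskiDist(v1, v2, p):
    """Minkowski distance between two equal-length feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1 / p)

# Assumed coordinates that reproduce the example's distances
circle, star, cross = (0, 0), (0, 3), (2, 2)
print(minkowskiDist(circle, star, 1))   # Manhattan: 3.0
print(minkowskiDist(circle, cross, 1))  # Manhattan: 4.0
print(minkowskiDist(circle, cross, 2))  # Euclidean: ~2.83
```

Note that for p = 1 and p = 2 this one function gives both Manhattan and Euclidean distance.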

##### Using a Distance Metric for Classification
• Procedures

• Simplest approach is probably nearest neighbor
• Remember training data
• When predicting the label of a new example
• Find the nearest example in the training data
• Predict the label associated with that example
• To predict the color of X

• The closest one is pink, so X should be pink
• K-nearest Neighbors

• Find K nearest neighbors, and choose the label associated with the majority of those neighbors.
• Usually we use an odd number of neighbors to avoid ties; in this example, k = 3

• Learning fast, no explicit training
• No theory required
• Easy to explain method and results
• Memory intensive and predictions can take a long time
• There are better algorithms than brute force
• No model to shed light on process that generated data
• For Example

• To predict whether a zebra, a python, and an alligator are reptiles.
• Calculating the distances, we get:
• The closest three animals to the alligator are the boa constrictor, the chicken, and the dart frog; two of them are not reptiles, so the alligator is classified as not a reptile.
• But we know an alligator is a reptile. So what went wrong?
• We notice that all of the features are 0 or 1 except the number of legs, which therefore gets disproportionate weight.
• So, instead of "number of legs," we use the binary feature "has legs," which is 1 for the alligator.
• The closest three animals to the alligator are now the boa constrictor, the chicken, and the cobra; two of them are reptiles, so the alligator is classified as a reptile.
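The k-nearest-neighbors procedure can be sketched as follows. The training set, the feature encoding, and the function name are made up for illustration, not taken from the lecture:

```python
from collections import Counter

def kNearestPredict(training, example, k=3):
    """Predict a label by majority vote among the k nearest training examples.
    training is a list of (feature_vector, label) pairs."""
    def euclidean(v1, v2):
        return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5
    # Brute force: sort all training examples by distance, keep the k nearest
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], example))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical binary features: (egg-laying, cold-blooded, has scales, has legs)
training = [((1, 1, 1, 0), 'reptile'),      # boa constrictor
            ((1, 1, 1, 1), 'reptile'),      # cobra
            ((1, 0, 0, 1), 'not reptile'),  # chicken
            ((1, 1, 0, 1), 'not reptile')]  # dart frog
print(kNearestPredict(training, (1, 1, 1, 1)))  # alligator-like vector -> reptile
```

Because every feature is 0 or 1, no single feature dominates the distance, which is the point of the "has legs" fix above.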
• A More General Approach: Scaling

• Z-scaling
• Each feature has a mean of 0 & a standard deviation of 1
• Interpolation
• Map minimum value to 0, maximum value to 1, and linearly interpolate
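Both scalings can be sketched in a few lines; the function names are mine:

```python
def zScale(vals):
    """Rescale so the result has mean 0 and standard deviation 1."""
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return [(v - mean) / sd for v in vals]

def interpolate(vals):
    """Map the minimum to 0, the maximum to 1, linear in between."""
    lo, hi = min(vals), max(vals)
    return [(v - lo) / (hi - lo) for v in vals]

print(zScale([0, 2, 4]))       # mean 0, standard deviation 1
print(interpolate([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

Either scaling prevents a feature with a large numeric range (like number of legs) from dominating the distance metric.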

### K-Means Clustering

```
randomly choose k examples as initial centroids
while true:
    create k clusters by assigning each example to closest centroid
    compute k new centroids by averaging examples in each cluster
    if centroids don't change:
        break
```

• Because the result depends on the randomly chosen initial centroids, run k-means several times and keep the clustering with the least dissimilarity:

```
best = kMeans(points)
for t in range(numTrials):
    C = kMeans(points)
    if dissimilarity(C) < dissimilarity(best):
        best = C
return best
```
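The pseudocode above can be fleshed out into a runnable sketch. The data layout (points as tuples of floats), function signatures, and the squared-distance dissimilarity measure are my assumptions; the sketch also assumes no cluster ever becomes empty:

```python
import random

def dist(v1, v2):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

def kMeans(points, k):
    """Partition points (tuples of floats) into k clusters."""
    # Randomly choose k examples as initial centroids
    centroids = random.sample(points, k)
    while True:
        # Create k clusters by assigning each example to the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[closest].append(p)
        # Compute k new centroids by averaging the examples in each cluster
        # (assumes every cluster is non-empty)
        newCentroids = [tuple(sum(dim) / len(c) for dim in zip(*c))
                        for c in clusters]
        if newCentroids == centroids:  # centroids don't change
            return clusters
        centroids = newCentroids

def dissimilarity(clusters):
    """Sum of squared distances from each point to its cluster's centroid."""
    total = 0.0
    for c in clusters:
        centroid = tuple(sum(dim) / len(c) for dim in zip(*c))
        total += sum(dist(p, centroid) ** 2 for p in c)
    return total

def tryKMeans(points, k, numTrials):
    """Run kMeans numTrials times; keep the least dissimilar clustering."""
    best = kMeans(points, k)
    for _ in range(numTrials - 1):
        C = kMeans(points, k)
        if dissimilarity(C) < dissimilarity(best):
            best = C
    return best

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(tryKMeans(pts, 2, 5))  # two clusters, one per tight group of points
```

On well-separated data like `pts`, any random initialization converges to the same two groups; the multiple trials matter on messier data, where bad initial centroids can leave k-means stuck in a poor local optimum.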


### Wrapping Up Machine Learning

• Use data to build statistical models that can be used to
• Shed light on system that produced data
• Make predictions about unseen data
• Supervised learning
• Unsupervised learning
• Feature engineering
• Goal was to expose you to some important ideas
• Not to get you to the point where you could apply them
• Much more detail, including implementations, in text