Supervised, Unsupervised and Reinforcement Learning

Supervised learning - the training data you feed to the algorithm includes the desired solutions, called labels

  • Classification is a typical supervised learning task
    • The model is trained with many examples along with their class, and it must learn how to classify new examples
  • Regression predicts a target numeric value given a set of features called predictors
    • Feature means an attribute plus its value
      • Ex: mileage = 15,000
  • List of the most important supervised learning algorithms (a minimal training sketch follows the list)
    • K-nearest neighbors
    • Linear regression
    • Logistic regression
    • Support vector machines (SVMs)
    • Decision Trees and Random Forests
    • Neural Networks
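
A minimal sketch of the supervised workflow, assuming scikit-learn and using the first algorithm in the list above; the toy car data (mileage, age) and the "good deal" labels are made up for illustration:

    from sklearn.neighbors import KNeighborsClassifier

    # Labeled training data: features (mileage, age) plus the desired
    # solution (label) for each example: 0 = bad deal, 1 = good deal
    X_train = [[15000, 2], [90000, 8], [30000, 3], [120000, 10]]
    y_train = [1, 0, 1, 0]

    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_train, y_train)          # learn from the labeled examples
    print(model.predict([[40000, 4]]))   # classify a new, unseen example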

Unsupervised Learning - training data is unlabeled and the system learns without a teacher

  • Most important unsupervised learning algorithms (a k-Means/PCA sketch follows these bullets)
    • Clustering
      • k-Means
      • Hierarchical Cluster Analysis (HCA)
      • Expectation Maximization
    • Visualization and Dimensionality reduction
      • Principal Component Analysis (PCA)
      • Locally Linear Embedding (LLE)
      • t-distributed Stochastic Neighbor Embedding (t-SNE)
    • Association rule learning
      • Apriori
      • Eclat
  • Dimensionality reduction - simplify the data without losing too much information
    • Often a good idea to reduce the dimensionality of your training data with a dimensionality reduction algorithm before feeding it to another Machine Learning algorithm
  • Anomaly detection - the system is trained with normal instances and it determines if a new instance is normal or an anomaly
  • Association rule learning - discover relationships between attributes
    • People who buy barbecue sauce and chips also buy steak
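
A minimal sketch of two of the techniques above, k-Means clustering and PCA, assuming scikit-learn; the two blobs of unlabeled points are synthetic:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    # Two blobs of unlabeled 3-D points
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])

    # Clustering: no labels are given; k-Means finds the 2 groups on its own
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

    # Dimensionality reduction: compress 3 features into 2 while keeping
    # as much of the variance as possible
    X_2d = PCA(n_components=2).fit_transform(X)
    print(labels[:10], X_2d.shape)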

Semisupervised learning - partially labeled training data: a little labeled data and a lot of unlabeled data

  • Google Photos recognizes that the same person appears in photos 1, 5, and 11; it just needs you to label that person once

Reinforcement Learning

  • The learning system is called an agent and can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
  • It learns by itself the best strategy, called a policy, to get the most reward over time
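
A toy sketch of the agent/reward/policy loop using tabular Q-learning (my choice of algorithm, not one named in these notes); the environment is a made-up 5-cell corridor where the agent is rewarded only for reaching the last cell:

    import random

    n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma = 0.1, 0.9              # learning rate and discount factor

    for episode in range(500):
        s = 0
        while s != 4:
            # Behave randomly to explore; Q-learning is off-policy,
            # so it still converges on the optimal policy
            a = random.randrange(n_actions)
            s2 = max(0, min(4, s + (1 if a else -1)))
            r = 1.0 if s2 == 4 else 0.0  # reward only at the goal cell
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    # The learned policy: best action per state (should print all 1s, "right")
    print([Q[s].index(max(Q[s])) for s in range(4)])
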
Batch vs Incremental Learning

Batch - the system is incapable of learning incrementally; it must be trained using all the available data

  • Also called offline learning

Incremental Learning - you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches

  • How fast should it adapt to changing data?
    • This is called the learning rate
    • A higher learning rate means the system rapidly adapts to new data but also quickly forgets old data
    • With a low learning rate the system has more inertia: it learns more slowly but is less sensitive to noise (see the partial_fit sketch below)
  • Big challenge - if bad data is fed to the system, its performance will gradually decline
    • Requires monitoring the system and switching learning off if a drop in performance is detected
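
A minimal sketch of incremental learning, assuming scikit-learn's SGDRegressor; the streaming mini-batches are synthetic:

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    # eta0 is the learning rate: higher adapts faster but forgets faster
    model = SGDRegressor(learning_rate="constant", eta0=0.01)

    rng = np.random.default_rng(0)
    for step in range(200):                  # pretend each pass is a fresh mini-batch
        X_batch = rng.uniform(0, 10, (32, 1))
        y_batch = 3 * X_batch.ravel() + rng.normal(0, 0.5, 32)
        model.partial_fit(X_batch, y_batch)  # update in place, no full retrain

    print(model.coef_, model.intercept_)     # approaches [3.] and 0
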
Instance-Based vs Model-Based Learning

Instance-Based - the system learns the examples by heart, then generalizes to new cases using a similarity measure

Model-Based - build a model from a set of examples, then use that model to make predictions (the sketch below contrasts the two)
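
A side-by-side sketch assuming scikit-learn and made-up data; k-NN keeps the training examples and compares new cases to them with a similarity measure, while linear regression fits parameters and then predicts from the model alone:

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.linear_model import LinearRegression

    X = np.arange(10, dtype=float).reshape(-1, 1)
    y = 2 * X.ravel() + 1

    instance_based = KNeighborsRegressor(n_neighbors=3).fit(X, y)  # memorizes the examples
    model_based = LinearRegression().fit(X, y)   # learns parameters (slope, intercept)

    print(instance_based.predict([[4.5]]))  # average of the 3 most similar stored targets
    print(model_based.predict([[4.5]]))     # model computes 2 * 4.5 + 1 = 10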

Data Sampling/Processing

Nonrepresentative Training Data

  • If the sample is too small, you will get sampling noise - nonrepresentative data as a result of chance
  • If the sampling method is flawed, you will get sampling bias

Poor-Quality Data

  • If some instances are clearly outliers, it may be good to discard them or fix the errors manually (incorrect data)
  • If some instances are missing a few features, you can ignore the attribute altogether, ignore those instances, impute the missing values (see the sketch below), or train one model with the feature and one without it
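
A minimal sketch of one of those options, imputing missing values, assuming scikit-learn's SimpleImputer; the (mileage, age) instances are made up:

    import numpy as np
    from sklearn.impute import SimpleImputer

    # NaN marks a missing feature value
    X = np.array([[15000.0, 2.0],
                  [np.nan, 8.0],
                  [30000.0, np.nan]])

    # Fill each gap with the median of that feature's observed values
    imputer = SimpleImputer(strategy="median")
    print(imputer.fit_transform(X))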

Irrelevant Features

  • Garbage in, garbage out
  • Feature Engineering
    • Feature selection - selecting the most useful features to train on among existing features
    • Feature extraction - combining existing features to produce a more useful one (see the sketch after this list)
    • Creating new features by gathering new data
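
A toy sketch of feature extraction with pandas; the attribute names are made up for illustration:

    import pandas as pd

    df = pd.DataFrame({"total_rooms": [880, 7099], "households": [126, 1138]})

    # Combine two raw attributes into a single, more informative feature
    df["rooms_per_household"] = df["total_rooms"] / df["households"]
    print(df)
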
Overfitting

Overfitting the Training Data

  • The model performs well on the training data but does not generalize well to new cases (it performs poorly on test/new data)
  • The model is detecting patterns in the noise
  • Happens when the model is too complex relative to the amount and noisiness of the training data
    • Solutions include:
      • Simplify the model by selecting one with fewer parameters
      • Gather more training data
      • Reduce the noise (fix data errors and remove outliers)

Regularization

  • Constraining a model to make it simpler and reduce the risk of overfitting
  • The amount of regularization to apply during learning is controlled by a hyperparameter
    • A hyperparameter is a parameter of a learning algorithm (not of the model)
    • Must be set prior to training and remains constant during training (see the Ridge sketch below)
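
A minimal sketch of regularization using Ridge regression (my choice of model); alpha is the regularization hyperparameter, fixed before training starts, and the data is synthetic:

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, (20, 5))
    y = X[:, 0] + rng.normal(0, 0.1, 20)   # only the first feature matters

    weak = Ridge(alpha=0.01).fit(X, y)     # little constraint on the weights
    strong = Ridge(alpha=100.0).fit(X, y)  # strong constraint: simpler model
    print(weak.coef_)
    print(strong.coef_)                    # shrunk toward zero
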
Underfitting the Training Data

  • Model is too simple to learn the underlying structure of the data
  • Reality is more complex than the model so predictions are inaccurate
  • Can be fixed by:
    • Selecting a more powerful model, with more parameters
    • Feeding better features to the learning algorithm (feature engineering)
    • Reducing the constraints on the model (reducing the regularization hyperparameter)

Testing and Validating

Split data into training set and test set

  • Error rate on new cases is called generalization error (or out-of-sample error)
  • If training error is low and out-of-sample error (or test error) is high, it's overfitting
  • A second holdout set, called the validation set, is used for model selection
    • Train multiple models with various hyperparameters using the training set
    • Select the model and hyperparameters that perform best on the validation set
    • Test against the test set to get an estimate of the generalization error
  • A common technique is cross-validation (see the sketch below)
    • The training set is split into complementary subsets; each model is trained on a different combination of these subsets and validated against the remaining parts
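
A minimal sketch of the whole split-validate-test workflow, assuming scikit-learn; the data and the candidate hyperparameter values are made up:

    import numpy as np
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(7)
    X = rng.uniform(0, 1, (200, 3))
    y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 200)

    # Hold out a test set that is touched only once, at the very end
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Cross-validation: score each hyperparameter candidate on held-out folds
    for alpha in (0.01, 1.0, 100.0):
        scores = cross_val_score(Ridge(alpha=alpha), X_train, y_train, cv=5)
        print(alpha, scores.mean())

    # Retrain the chosen model, then estimate the generalization error
    best = Ridge(alpha=0.01).fit(X_train, y_train)
    print(best.score(X_test, y_test))
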
Chapter 1 Summary

  • Machine learning is making machines get better at some task by learning from data instead of having to explicitly code rules
  • Many types of ML systems
  • Feed training set to a learning algorithm
    • If model-based, it tunes some parameters to fit the model to the training set, and then hopefully it will make good predictions on new cases
    • If instance-based, it learns the examples by heart and uses a similarity measure to generalize to new instances
  • System will not perform well if your training set is:
    • Too small
    • Not representative
    • Noisy
    • Polluted with irrelevant features
  • Model needs to be neither too simple nor too complex