Supervised, Unsupervised and Reinforcement Learning
Supervised learning - the training data you feed to the algorithm includes the desired solutions, called labels
- Classification is a typical supervised learning task
- The system is trained with many examples along with their class, and it must learn how to classify new examples
- Regression predicts a target numeric value given a set of features called predictors
- Feature means an attribute plus its value
- Ex: mileage = 15,000
- List of most important supervised learning algorithms
- K-nearest neighbors
- Linear regression
- Logistic regression
- Support vector machines (SVMs)
- Decision Trees and Random Forests
- Neural Networks
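A minimal sketch of supervised classification with k-nearest neighbors, using only the standard library. The function name `knn_predict`, the features (mileage in thousands, age in years), and the labels are all made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled examples."""
    # train is a list of (features, label) pairs; features are tuples of numbers
    by_distance = sorted(train, key=lambda example: math.dist(example[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: (mileage in thousands, age in years) -> class
train = [
    ((15, 1), "good"), ((20, 2), "good"), ((30, 3), "good"),
    ((120, 10), "worn"), ((150, 12), "worn"), ((180, 15), "worn"),
]

print(knn_predict(train, (25, 2)))
print(knn_predict(train, (140, 11)))
```

The labels in `train` are the "desired solutions"; the algorithm only has to carry them over to new examples that look similar.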
Unsupervised Learning - training data is unlabeled and the system learns without a teacher
- Most important unsupervised learning algorithms
- Clustering
- k-Means
- Hierarchical Cluster Analysis (HCA)
- Expectation Maximization
- Visualization and Dimensionality reduction
- Principal Component Analysis (PCA)
- Locally-Linear Embedding (LLE)
- t-distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning
- Apriori
- Eclat
- Clustering - detect groups of similar instances without being told what the groups are
- Dimensionality reduction - simplify the data without losing too much information
- Often a good idea to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm
- Anomaly detection - the system is trained with normal instances and it determines if a new instance is normal or an anomaly
- Association rule learning - discover relationships between attributes
- People who buy barbecue sauce and chips also buy steak
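A rough sketch of the two measures that association rule learning is usually built on, support and confidence, applied to the barbecue sauce example. This is not Apriori or Eclat itself; the baskets and the function names `support` and `confidence` are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up shopping baskets
transactions = [
    {"barbecue sauce", "chips", "steak"},
    {"barbecue sauce", "chips", "steak", "soda"},
    {"barbecue sauce", "chips"},
    {"chips", "soda"},
    {"bread", "milk"},
]

rule_conf = confidence(transactions, {"barbecue sauce", "chips"}, {"steak"})
print(f"{{barbecue sauce, chips}} -> {{steak}}: confidence {rule_conf:.2f}")
```

A high confidence on a rule with reasonable support is the kind of relationship between attributes these algorithms look for.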
Semisupervised learning - partially labeled training data, usually a little labeled data and a lot of unlabeled data
- Google Photos recognizes that the same person appears in photos 1, 5 and 11; it only needs you to label that person
Reinforcement Learning
- The learning system is called an agent and can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
- It learns by itself to determine the best strategy, called a policy
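A toy sketch of the agent / action / reward loop, assuming a very simple environment where each action always returns the same reward (one of them a penalty). The epsilon-greedy strategy and the name `run_agent` are illustrative choices, not something taken from these notes.

```python
import random

def run_agent(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent in a toy environment with one fixed reward per action."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    value = [0.0] * n_actions   # running estimate of each action's reward
    count = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                        # explore
        else:
            action = max(range(n_actions), key=lambda a: value[a])   # exploit (ties go to the first action)
        reward = rewards[action]                                     # the environment responds
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]    # incremental mean
    # Learned policy: always pick the action with the best estimated reward
    policy = max(range(n_actions), key=lambda a: value[a])
    return policy, value

# Action 0 is penalized (negative reward), action 1 is rewarded
policy, value = run_agent(rewards=[-1.0, 1.0])
print(policy, value)
```

Nobody tells the agent which action is correct; it works out the policy from the rewards and penalties it collects.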
Batch vs Incremental Learning
Batch Learning - the system cannot learn incrementally; it must be trained using all the available data
- Also called offline learning
Incremental Learning - you train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches
- How fast should it adapt to changing data?
- This is called the learning rate
- With a high learning rate, the system adapts rapidly to new data but quickly forgets old data
- With a low learning rate, the system has more inertia: it learns more slowly but is less sensitive to noise in new data
- Big challenge - if bad data is fed to the system the performance will gradually decline
- Requires monitoring the system and switching learning off if there is a drop in performance
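A small sketch of how the learning rate trades adaptation against inertia, assuming the simplest possible incremental learner: a running estimate nudged toward each new instance. The function name `online_estimate` and the data stream are hypothetical.

```python
def online_estimate(stream, learning_rate):
    """Update a running estimate one instance at a time."""
    estimate = stream[0]
    for x in stream[1:]:
        # Move a fraction of the way toward the new instance
        estimate += learning_rate * (x - estimate)
    return estimate

# Hypothetical stream: the value is 10 for a while, then the data shifts to 20
stream = [10.0] * 20 + [20.0] * 5

fast = online_estimate(stream, learning_rate=0.5)
slow = online_estimate(stream, learning_rate=0.05)
print(fast, slow)
```

After only five instances of the new regime, the high learning rate estimate is already close to 20 while the low learning rate estimate has barely moved away from 10. The same property means a burst of bad data would drag the fast learner off course sooner.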
Instance-Based vs Model-Based Learning
Instance-Based - the system learns the examples by heart, then generalizes to new cases using a similarity measure
Model-Based - build a model from the examples, then use that model to make predictions on new cases
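A minimal sketch of the model-based approach, assuming the model is a straight line fitted by least squares. The names `fit_line` and `predict` and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Use the fitted parameters, not the stored examples, to predict."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical training examples
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

model = fit_line(xs, ys)
print(model, predict(model, 5.0))
```

Once the two parameters are learned the training examples can be thrown away, which is the main contrast with an instance-based learner that keeps them all and compares new cases against them.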
Data Sampling/Processing
Nonrepresentative Training Data
- If the sample is too small, you will get sampling noise - nonrepresentative data as a result of chance
- If sampling method is flawed, you will get sampling bias
Poor-Quality Data
- If some instances are clearly outliers, it may help to discard them or fix the errors manually (incorrect data)
- If some instances are missing a few features, you can ignore that feature altogether, ignore those instances, impute the missing values, or train one model with the feature and one without it
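A short sketch of two of the options for missing values: dropping the affected instances, or imputing with the median. The helper names `drop_missing` and `impute_median` and the rows are hypothetical, and the median is just one common choice of fill value.

```python
from statistics import median

def drop_missing(rows, column):
    """Option: ignore the instances where `column` is missing."""
    return [row for row in rows if row[column] is not None]

def impute_median(rows, column):
    """Option: fill missing values in `column` with the median of the known ones."""
    fill = median(row[column] for row in rows if row[column] is not None)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

# Hypothetical instances; None marks a missing value
rows = [
    {"mileage": 15000, "age": 1},
    {"mileage": None, "age": 4},
    {"mileage": 60000, "age": 5},
    {"mileage": 90000, "age": 8},
]

print(len(drop_missing(rows, "mileage")))
print(impute_median(rows, "mileage")[1])
```

Dropping rows loses the other features of those instances, while imputing keeps them at the cost of inventing a plausible value.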
Irrelevant Features
- Garbage in, garbage out
- Feature Engineering
- Feature selection - selecting the most useful features to train on among existing features
- Feature extraction - combining existing features to produce a more useful one
- Creating new features by gathering new data
Overfitting
Overfitting the Training Data
- The model performs well on the training data but does not generalize well (it performs poorly on test/new data)
- The model detects patterns in the noise itself
- Happens when the model is too complex relative to the amount and noisiness of the training data
- Solutions include:
- Simplify the model by selecting one with fewer parameters
- Gather more training data
- Reduce the noise (fix data errors and remove outliers)
Regularization
- Constraining a model to make it simpler and reduce the risk of overfitting
- The amount of regularization to apply during learning is controlled by a hyperparameter
- A hyperparameter is a parameter of the learning algorithm (not of the model)
- Must be set prior to training and remains constant during training
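A sketch of regularization on a one-feature linear model, assuming an L2 (ridge-style) penalty on the slope only. The penalty strength `alpha` plays the role of the hyperparameter: it is chosen before training and never changed by the fit. The function name `fit_ridge_line` and the data are made up.

```python
def fit_ridge_line(xs, ys, alpha):
    """Fit y = intercept + slope * x while penalizing alpha * slope**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)            # alpha = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x    # the intercept is not penalized
    return intercept, slope

# Hypothetical training examples
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

for alpha in (0.0, 5.0, 50.0):
    intercept, slope = fit_ridge_line(xs, ys, alpha)
    print(f"alpha={alpha}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Larger values of `alpha` pull the slope toward zero, giving a flatter and simpler model; `slope` and `intercept` are model parameters learned from the data, while `alpha` is not.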
Underfitting the Training Data
- Model is too simple to learn the underlying structure of the data
- Reality is more complex than the model so predictions are inaccurate
- Can be fixed by:
- Selecting a more powerful model, with more parameters
- Feeding better features to the learning algorithm (feature engineering)
- Reducing the constraints on the model (reducing the regularization hyperparameter)
Testing and Validating
Split data into training set and test set
- Error rate on new cases is called generalization error (or out-of-sample error)
- If training error is low and out-of-sample error (or test error) is high, the model is overfitting
- Use a second holdout set, called the validation set, to compare models
- Train models with various hyperparameters using the training set
- Select the model and hyperparameters that perform best on the validation set
- Test against test set to get estimate of generalization error
- A common technique is cross-validation
- The training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts
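A minimal sketch of k-fold cross-validation, assuming contiguous folds with no shuffling (in practice the data is usually shuffled first). The helpers `k_fold_splits` and `cross_validate`, and the baseline "predict the mean" model, are hypothetical names used only for illustration.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k complementary folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(xs, ys, fit, predict, k=3):
    """Average squared validation error of a model over k folds."""
    fold_errors = []
    for train, validation in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in validation]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical baseline model: always predict the mean of the training targets
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(cross_validate(xs, ys, fit_mean, predict_mean, k=3))
```

Running `cross_validate` once per candidate model or hyperparameter setting and keeping the lowest score is the selection step; the test set stays untouched until the very end.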
Chapter 1 Summary
- Machine learning is making machines get better at some task by learning from data instead of having to explicitly code rules
- Many types of ML systems
- Feed training set to a learning algorithm
- If model-based, it tunes some parameters to fit the model to the training set, and then hopefully makes good predictions on new cases
- If instance-based, it learns the examples by heart and uses a similarity measure to generalize to new instances
- System will not perform well if your training set is:
- Too small
- Not representative
- Noisy
- Polluted with irrelevant features
- Model needs to be neither too simple nor too complex