What is ML?
Machine Learning is the science (and art) of programming computers so they can learn from data.
For example, your spam filter is a machine learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (non-spam, also called “ham”) emails.
Why ML?
Machine Learning is great for:
Problems for which existing solutions require a lot of hand-tuning or long lists of rules
Complex problems for which there is no good solution using a traditional approach
Fluctuating environments: an ML system can adapt to new data
Getting insights about complex problems and large amounts of data
Finally, ML can help humans learn: ML algorithms can be inspected to see what they have learned. In the case of a spam filter, for instance, it can be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam.
Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
Types of ML
Attribute vs Feature: In Machine Learning, an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”).
1. Supervised Learning
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
Applications: Classification & Regression (e.g., price prediction)
Algorithms: k-Nearest Neighbors, Linear Regression, Logistic Regression, Support Vector Machines (SVMs), Decision Trees and Random Forests, Neural Networks
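As a quick illustration (a minimal sketch of my own, not code from the book, assuming scikit-learn and its bundled iris dataset), here is supervised learning in a few lines: the training data passed to fit() includes the labels.

```python
# Minimal supervised-learning sketch: the training data includes the desired
# solutions (labels), and the model learns to predict them.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                      # features + labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)                # a classification algorithm
clf.fit(X_train, y_train)                              # learn from labeled examples
print(clf.score(X_test, y_test))                       # accuracy on held-out data
```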
2. Unsupervised Learning
As is evident from the name, the training data is unlabeled. The system tries to learn without a teacher.
Applications: Dimensionality Reduction & Clustering
Algorithms: Clustering (K-Means, DBSCAN, Hierarchical Cluster Analysis), Anomaly Detection (One-class SVM, Isolation Forest), Visualization & Dimensionality Reduction (PCA, Kernel PCA, LLE, t-SNE), Association Rule Learning (Apriori, Eclat)
Association Rule Learning: The goal is to dig into large amounts of data and discover interesting relations between attributes.
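A minimal clustering sketch (my own illustration, assuming scikit-learn; the toy points are made up): K-Means groups unlabeled instances without ever seeing a label.

```python
# Minimal unsupervised-learning sketch: K-Means discovers clusters in
# unlabeled data; no labels are passed to fit().
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])    # unlabeled instances

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)                                          # learns structure without a teacher
print(kmeans.labels_)                                  # cluster index per instance
print(kmeans.cluster_centers_)                         # learned cluster centers
```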
Some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., a 20% chance of being spam).
Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann machines. They can also be semi-supervised, as in deep belief networks and unsupervised pretraining.
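To illustrate the Logistic Regression point above, a tiny sketch of my own (toy numbers, assuming scikit-learn): predict_proba returns the estimated probability of each class.

```python
# Minimal sketch: Logistic Regression used as a classifier, outputting the
# probability of belonging to each class.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])   # toy "spamminess" scores
y = np.array([0, 0, 0, 1, 1, 1])                           # 0 = ham, 1 = spam

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[1.0]]))   # [P(ham), P(spam)] for a new instance
```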
3. Semi-supervised Learning
Some algorithms can work with partially labeled data: usually a lot of unlabeled data and a little bit of labeled data.
For example, Google Photos automatically recognizes that the same person appears across your photos; you then only need to label that person once, and the system can name them in all of their photos.
Most semi-supervised learning algorithms are combinations of supervised and unsupervised learning. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
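A minimal semi-supervised sketch (my own illustration, assuming scikit-learn's LabelPropagation rather than a DBN): most labels are hidden, and the algorithm infers them from the few that remain.

```python
# Minimal semi-supervised sketch: unlabeled instances are marked with -1, and
# label propagation spreads the few known labels to the unlabeled data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(42)
y_partial = np.copy(y)
y_partial[rng.rand(len(y)) < 0.9] = -1        # pretend ~90% of the labels are missing

model = LabelPropagation()
model.fit(X, y_partial)                       # learns from labeled + unlabeled data
print((model.transduction_ == y).mean())      # fraction of inferred labels that are correct
```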
4. Reinforcement Learning
The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.
A policy defines what action the agent should choose in a given situation.
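A minimal reinforcement-learning sketch (my own toy example in plain NumPy, not from the book): tabular Q-learning on a five-state corridor where the only reward is reaching the rightmost state; the learned policy is the per-state action with the highest Q-value.

```python
# Minimal reinforcement-learning sketch: tabular Q-learning on a 5-state corridor.
# The agent observes its state, picks an action, gets a reward, and improves its policy.
import numpy as np

n_states, n_actions = 5, 2                    # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2         # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:                  # episode ends at the rightmost state
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)  # explore
        else:
            a = int(Q[s].argmax())            # exploit current policy
        s_next = max(0, s - 1) if a == 0 else s + 1
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))                       # learned policy: mostly "move right"
```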
Another criterion used to classify ML systems is whether or not the system can learn incrementally from a stream of incoming data.
5. Batch or Offline Learning
The system is incapable of learning incrementally; it must be trained using all the available data.
If we want such a system to learn about new data, we have to train a new version of the system from scratch on the full dataset (both the old data and the new data).
6. Online Learning
You train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly as it arrives.
It is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or autonomously.
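A minimal online-learning sketch (my own illustration, assuming scikit-learn's SGDClassifier and a made-up data stream): partial_fit updates the model one mini-batch at a time, whereas a batch learner would call fit() on all the data at once.

```python
# Minimal online-learning sketch: the model is updated incrementally with
# partial_fit as mini-batches arrive, instead of retraining from scratch.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(42)
clf = SGDClassifier()
classes = np.array([0, 1])                    # all classes must be declared up front

for step in range(100):                       # pretend data arrives as a stream
    X_batch = rng.randn(32, 4)                # mini-batch: 32 instances, 4 features
    y_batch = (X_batch.sum(axis=1) > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)   # fast, cheap learning step

print(clf.predict(rng.randn(3, 4)))           # the model keeps learning on the fly
```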
Performing well on the training data is good, but insufficient; the true goal is to perform well on new instances. There are two main approaches to generalization: instance-based learning and model-based learning.
7. Instance-Based Learning
The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them) using a similarity measure.
A (very basic) similarity measure between two emails could be to count the number of words they have in common.
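A minimal instance-based sketch (my own toy example): the "model" is just the stored examples plus the word-overlap similarity measure described above.

```python
# Minimal instance-based sketch: classify a new email by comparing it to stored
# examples using a (very basic) similarity measure: the number of shared words.
def similarity(email_a: str, email_b: str) -> int:
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

# Examples "learned by heart" (stored as-is, with their labels)
examples = [("win a free prize now", "spam"),
            ("meeting agenda for tomorrow", "ham"),
            ("free prize claim your reward", "spam")]

new_email = "claim your free prize"
best_label = max(examples, key=lambda ex: similarity(new_email, ex[0]))[1]
print(best_label)   # -> "spam", the label of the most similar stored example
```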
8. Model-Based Learning
Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions.
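A minimal model-based sketch (my own toy numbers, assuming scikit-learn): fit a simple linear model to a handful of points, then use the trained model, not the stored examples, to predict a new case.

```python
# Minimal model-based sketch: build a model of the examples (here, a line),
# then use that model to make predictions.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])    # toy feature values
y = np.array([1.9, 4.1, 6.0, 8.1])            # toy targets (roughly y = 2x)

model = LinearRegression()
model.fit(X, y)                               # pick the parameters that best fit the data
print(model.predict([[5.0]]))                 # predict a brand-new case from the model
```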
Typical ML Project Components
Study the data.
Select a model.
Train it on the training data (i.e., the learning algorithm searches for the model parameter values that minimize a cost function).
Apply the model to make predictions on new cases (this is called inference).
Main Challenges of ML Algorithms
Insufficient quantity of training data
Nonrepresentative training data
Poor-quality data
Irrelevant features
Overfitting the training data
Underfitting the training data
Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.
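A minimal regularization sketch (my own illustration, assuming scikit-learn; Ridge is one example of a regularized linear model): the L2 penalty constrains the weights, making the model simpler.

```python
# Minimal regularization sketch: Ridge adds an L2 penalty that constrains the
# model's weights, reducing the risk of overfitting.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(42)
X = rng.randn(20, 10)                         # few instances, many features: easy to overfit
y = X[:, 0] + 0.1 * rng.randn(20)             # only the first feature actually matters

unregularized = LinearRegression().fit(X, y)
regularized = Ridge(alpha=10.0).fit(X, y)     # alpha controls how strongly the model is constrained

print(np.abs(unregularized.coef_).sum())      # larger total weight magnitude
print(np.abs(regularized.coef_).sum())        # smaller (constrained) weight magnitude
```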
Testing and Validating
The only way to know how well a model will generalize to new cases is to actually try it out on new cases.
It is common to use 80% of the data for training and hold out 20% for testing. However, this depends on the size of the dataset: if it contains 10 million instances, then holding out 1% means your test set will contain 100,000 instances; that’s probably more than enough to get a good estimate of the generalization error.
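A minimal train/test sketch (my own illustration, assuming scikit-learn and its iris dataset): the 80/20 split below holds out a test set to estimate the generalization error.

```python
# Minimal testing sketch: hold out 20% of the data and evaluate on it to
# estimate how well the model generalizes to new cases.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)     # 80% train / 20% test

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(model.score(X_train, y_train))          # performance on data the model has seen
print(model.score(X_test, y_test))            # estimate of the generalization error
```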
No Free Lunch Theorem
In a famous 1996 paper, David Wolpert demonstrated that if you make absolutely no assumption about the data, then there is no reason to prefer one model over any other. This is called the No Free Lunch (NFL) theorem. For some datasets the best model is a linear model, while for other datasets it is a neural network. There is no model that is a priori guaranteed to work better (hence the name of the theorem).
Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd ed. O'Reilly Media; 2019.