What is ML?

Machine Learning is the science (and art) of programming computers so they can learn from data.

For example, your spam filter is a machine learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (non-spam, also called “ham”) emails.

Why ML?

  • Problems for which existing solutions require a lot of hand-tuning or long lists of rules: One machine learning algorithm can often simplify code and perform better.
  • Fluctuating environments: a machine learning system can adapt to new data. For example, if spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead. A hand-written, rule-based filter would then need to be updated manually, whereas an ML-based filter can pick up the new pattern on its own.
  • Complex problems for which there is no good solution at all using a traditional approach: the best machine learning techniques can find a solution. For example, consider speech recognition. Say you want to start simple and write a program capable of distinguishing the words “one” and “two,” perhaps by hardcoding a rule such as “two” starting with a higher-pitched sound. Obviously, this technique will not scale to thousands of words spoken by millions of very different people in noisy environments and in dozens of languages. Here, we require an algorithm that learns by itself.
  • Getting insights about complex problems and large amounts of data

Finally, ML can help humans learn. ML algorithms can be inspected to see what they have learned. For instance, once a spam filter has been trained on enough examples, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam.

Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.

Attribute vs Feature: In Machine Learning, an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”).

Types of ML

1. Supervised Learning

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.

Applications: Classification & Regression (e.g., price prediction)

Algorithms:

  • k-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees and Random Forests
  • Neural networks
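
As a rough sketch of the idea (not taken from the source book), the snippet below trains a Logistic Regression classifier on a tiny, made-up spam dataset; the feature values and labels are invented purely for illustration:

```python
# A minimal supervised-learning sketch: the features and labels below are
# made-up toy data, not from the article.
from sklearn.linear_model import LogisticRegression

# Each row is one email described by two features:
# [number of suspicious words, number of exclamation marks]
X_train = [[8, 5], [6, 4], [7, 6], [0, 1], [1, 0], [2, 1]]
y_train = [1, 1, 1, 0, 0, 0]  # labels: 1 = spam, 0 = ham

model = LogisticRegression()
model.fit(X_train, y_train)          # learn from labeled examples

print(model.predict([[5, 3]]))       # predicted class for a new email
print(model.predict_proba([[5, 3]])) # estimated class probabilities
```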

2. Unsupervised Learning

As is evident from the name, the training data is unlabeled. The system tries to learn without a teacher.

Applications: Dimensionality Reduction & Clustering

Algorithms:

  • Clustering
      • K-Means
      • DBSCAN
      • Hierarchical Cluster Analysis (HCA)
  • Anomaly Detection and Novelty Detection
      • One-class SVM
      • Isolation Forest
  • Visualization and Dimensionality Reduction
      • Principal Component Analysis (PCA)
      • Kernel PCA
      • Locally Linear Embedding (LLE)
      • t-distributed Stochastic Neighbor Embedding (t-SNE)
  • Association Rule Learning
      • Apriori
      • Eclat
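
To make clustering concrete, here is a minimal sketch using K-Means on made-up, unlabeled points (the values are invented for illustration):

```python
# A minimal unsupervised-learning sketch: K-Means groups unlabeled points.
# The points are toy values invented for illustration.
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural group
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]   # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)                  # no labels are provided

print(kmeans.labels_)          # cluster assignment for each point
print(kmeans.cluster_centers_) # the two group centers it discovered
```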

Association Rule Learning: The goal is to dig into large amounts of data and discover interesting relations between attributes.

Some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., a 20% chance of being spam).

Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann machines. They can also be semi-supervised, as in deep belief networks and unsupervised pretraining.

3. Semi-supervised Learning

Some algorithms can work with partially labeled data.

For example, Google Photos automatically recognizes that the same person appears across your photos (the unsupervised part); once you add a single label telling it who that person is, it can name that person in every photo (the supervised part).

Most semi-supervised learning algorithms are combinations of supervised and unsupervised learning. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
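
As a simpler illustration of the idea (not the DBN/RBM approach described above), scikit-learn's LabelPropagation can spread a handful of known labels to unlabeled instances, which are conventionally marked with -1; the data values below are made up:

```python
# A simple semi-supervised sketch: LabelPropagation spreads the few known
# labels to the unlabeled points (marked with -1). Toy data for illustration.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.1], [0.9], [8.0], [8.1], [7.9]])
y = np.array([0, -1, -1, 1, -1, -1])  # only two instances are labeled

model = LabelPropagation()
model.fit(X, y)

print(model.transduction_)  # inferred labels for all six instances
```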

4. Reinforcement Learning

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.

A policy defines what action the agent should choose in a given situation.
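
To make these terms concrete, here is a toy tabular Q-learning sketch (a common RL algorithm, not discussed in the article); the corridor environment and all constants are invented for illustration:

```python
# Toy reinforcement-learning sketch (tabular Q-learning): the agent walks a
# corridor of 5 cells and receives a reward of +1 only when it reaches the
# rightmost cell. The learned policy says which action to take in each state.
import random

n_states, actions = 5, [-1, +1]            # move left or move right
Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action index]
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != n_states - 1:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: Q[state][i])
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move Q toward reward + discounted best future value
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

policy = ["left" if Q[s][0] > Q[s][1] else "right" for s in range(n_states)]
print(policy)  # the learned policy should prefer "right" everywhere
```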

Another criterion used to classify ML systems is whether or not the system can learn incrementally from a stream of incoming data.

5. Batch or Offline Learning

The system is incapable of learning incrementally; it must be trained using all the available data.

If we want a batch learning system to learn about new data, we have to retrain it from scratch on the full dataset (old data plus new data).

6. Online Learning

You train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly as it arrives.

It is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or autonomously.
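
A minimal sketch of online learning with scikit-learn, using SGDRegressor's partial_fit on synthetic mini-batches (the data stream is made up):

```python
# An online-learning sketch: SGDRegressor is updated incrementally with
# partial_fit as new mini-batches arrive. The streaming data is synthetic.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(random_state=42)
rng = np.random.default_rng(42)

for step in range(100):                       # pretend each loop is a new mini-batch
    X_batch = rng.uniform(0, 10, size=(32, 1))
    y_batch = 3.0 * X_batch.ravel() + rng.normal(0, 0.5, size=32)
    model.partial_fit(X_batch, y_batch)       # fast, cheap incremental update

print(model.coef_, model.intercept_)          # the slope should end up close to 3.0
```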

Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances. There are two main approaches to generalization: instance-based learning and model-based learning.

7. Instance-Based Learning

The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them) using a similarity measure.

A (very basic) similarity measure between two emails could be to count the number of words they have in common.
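
A toy sketch of this idea, using shared-word counts as the similarity measure and a 1-nearest-neighbor rule (the example emails are made up):

```python
# Instance-based sketch: classify a new email by comparing it to stored
# examples with a (very basic) similarity measure, the number of shared words.
def similarity(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

training_emails = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch on friday with the team", "ham"),
]

new_email = "claim your free prize"
# Pick the label of the most similar stored example (1-nearest neighbor).
best_text, best_label = max(training_emails,
                            key=lambda pair: similarity(pair[0], new_email))
print(best_label)  # -> "spam"
```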

8. Model-Based Learning

Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions.
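
For example, here is a minimal model-based sketch with a linear model; the numbers are made up, and at prediction time only the model's learned parameters are used, not the stored examples:

```python
# A model-based sketch: fit a simple linear model to a handful of made-up
# (x, y) points, then use the model itself to predict new cases.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]   # e.g., years of experience (toy data)
y = [30, 35, 41, 44, 50]        # e.g., salary in $1000s (toy data)

model = LinearRegression()
model.fit(X, y)                 # choose the parameters that best fit the data

print(model.predict([[6]]))     # prediction for an unseen case
```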

Typical ML Project Components

  • Study the data.
  • Select the model.
  • Train it on the training data.
  • Finally, apply the model to make predictions on new cases (this is called inference), hoping that the model will generalize well.

Main Challenges of ML Algorithms

In short, the two things that can go wrong are a “bad algorithm” and “bad data.”

Bad Data:

  • Insufficient Quantity of Training Data: Most ML algorithms need a lot of examples to work properly, even for fairly simple problems.
  • Non-Representative Data: It is crucial to use a training set that is representative of the cases you want to generalize to. This is often harder than it sounds: if the sample is too small, you will have sampling noise (i.e., non-representative data as a result of chance), but even very large samples can be non-representative if the sampling method is flawed. This is called sampling bias.
  • Poor Quality Data: If the data is full of errors, outliers, and noise, it will make it harder for the system to detect the underlying patterns. This can be addressed by cleaning the data: 1. Remove the clear outliers. 2. If some instances are missing values, either ignore those instances, fill in the missing values, or train one model with the attribute and one without it.
  • Irrelevant Features: The system will only learn well if the training data contains enough relevant features and not too many irrelevant ones, so a critical part of any ML project is coming up with a good set of features to train on (feature engineering).

Bad Algorithm:

  • Overfitting the Training Data: Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is something that we humans do all too often, and unfortunately, machines can fall into the same trap if we are not careful. In machine learning, this is called overfitting: the model performs well on the training data but does not generalize well.

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.

  • Underfitting the Training Data: It occurs when your model is too simple to learn the underlying structure of the data. For example, a linear model of life satisfaction is prone to underfitting.
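
A rough sketch of overfitting and of regularization, using a high-degree polynomial model with and without a Ridge penalty on made-up noisy data (the degree and alpha values are arbitrary illustrative choices):

```python
# Overfitting vs. regularization sketch on made-up noisy data: a high-degree
# polynomial fits the training points almost perfectly but tends to generalize
# poorly; adding a Ridge penalty (regularization) constrains the model.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    X = rng.uniform(0, 1, size=(n, 1))
    y = 2 * X.ravel() + rng.normal(0, 0.2, size=n)   # roughly linear plus noise
    return X, y

X_train, y_train = make_data(20)
X_test, y_test = make_data(200)

models = {
    "high-degree polynomial (overfits)": make_pipeline(
        PolynomialFeatures(degree=15), LinearRegression()),
    "same model + Ridge regularization": make_pipeline(
        PolynomialFeatures(degree=15), Ridge(alpha=1.0)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name,
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 3),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 3))
# Typically the unregularized model shows a much lower train error than test
# error (overfitting), while the regularized model's two errors stay closer.
```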

Testing and Validating

The only way to know how well a model will generalize to new cases is to actually try it out on new cases.

It is common to use 80% of the data for training and hold out 20% for testing. However, this depends on the size of the dataset: if it contains 10 million instances, then holding out 1% means your test set will contain 100,000 instances; that’s probably more than enough to get a good estimate of the generalization error.
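
A minimal sketch of this train/test split with scikit-learn (the features and labels are synthetic stand-ins for whatever data you have):

```python
# Hold out a test set to estimate how well the model generalizes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))             # made-up features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # made-up labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80% train, 20% test

model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))  # generalization estimate
```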

No Free Lunch Theorem

In a famous 1996 paper, David Wolpert demonstrated that if you make absolutely no assumption about the data, then there is no reason to prefer one model over any other. This is called the No Free Lunch (NFL) theorem. For some datasets the best model is a linear model, while for other datasets it is a neural network. There is no model that is a priori guaranteed to work better (hence the name of the theorem).

Sources of Article

Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd ed. O’Reilly Media; 2019.
