Machine learning is one of the most promising fields under the wing of Artificial Intelligence. Following a 2015 conversation with the CEO of Box, Google's chairman Eric Schmidt stated, "Google's self-driving cars and robots get a lot of press, but the company's real future is in machine learning, the technology that enables computers to get smarter and more personal."
It is growing day by day, and its possibilities are yet to be fully discovered. This article covers the following topics:
Machine Learning (ML)
Machine Learning Algorithms (MLA)
Common uses of ML
Here is a quick reminder of what ML is before diving into its algorithms. An ML system receives percepts from its surroundings, but it is not limited to performing tasks based only on those prior percepts.
It grows beyond its given code and performs freely, imitating the cognitive process of learning in the human brain. Its ability to analyze data, make predictions using statistical ML algorithms, and even go beyond its initial programming makes it a self-evolving system.
In simple words, machine learning algorithms are advanced, self-improving algorithms. They improve automatically with experience.
It is a purely math- and logic-based system that learns and adjusts as it gets exposed to more data. This exposure to varied data helps it build its parameters and get smarter without any explicit help, making ML a self-sufficient system!
ML algorithms are applied in two phases: a training phase and a testing (or prediction) phase.
A common way to explain the training phase is with oranges. First, all the features of the oranges need to be recorded. Features may include their shape, size, color, country of origin, etc.
Along with the features, output variables need to be recorded. These may include the sweetness, juiciness, ripeness, etc. of the oranges. All these data help the MLA (through classification or regression) to learn and create a link between an average orange's physical characteristics and its quality.
Now, a customer only has to choose the oranges (test data), and the machine will predict whether they are sweet or not according to the prior data fed in the training phase. This is how easy life is getting with the help of MLA. (Upasana, 2020)
The perks of MLA do not end here. One can continually feed in new data and improve the accuracy of the output. With more training and data exposure, such systems become increasingly reliable over time. The example above was about oranges, but the approach extends to any fruit, and to many more variables as well.
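The training-and-test flow above can be sketched with a tiny hand-rolled nearest-neighbour classifier. All feature values and labels below are hypothetical, invented just for illustration:

```python
# Training phase: record features [diameter_cm, weight_g] and their labels.
X_train = [[7.0, 150], [8.5, 210], [6.0, 120], [9.0, 230]]
y_train = ["not sweet", "sweet", "not sweet", "sweet"]

def predict(sample, X_train, y_train):
    """Test phase: label the new orange after its closest training example."""
    dists = [sum((a - b) ** 2 for a, b in zip(sample, x)) for x in X_train]
    return y_train[dists.index(min(dists))]

# A customer picks a large, heavy orange (the test data).
print(predict([8.8, 220], X_train, y_train))  # -> sweet
```

A real system would use a proper library model instead of this toy function, but the two phases are the same: learn from labeled examples, then predict for new ones.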
MLAs mainly use three types of learning processes:
In simple words, supervised learning is a process where the system learns from previous results. Labeled data (i.e., data with known outcomes) is fed to the system together with the correct outcomes, and the MLA is trained on it to develop a mapping from inputs to outputs.
It uses methods such as classification, logistic regression, the naive Bayes model, and decision trees to predict values for unlabeled data. Supervised learning is commonly used for forecasting the weather, detecting credit card fraud, and so on.
In unsupervised learning, the system is not trained with labeled data and correct outcomes. The system has to find a pattern within the dataset and segregate similar data from the dataset to form different groups. This method is popularly used in market segmentation, feature selection, finding outliers in a dataset, and so on.
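As a sketch of that idea, the snippet below clusters hypothetical customer records with scikit-learn's KMeans (assuming scikit-learn is installed). No labels are supplied; the algorithm has to discover the two market segments on its own:

```python
from sklearn.cluster import KMeans

# Hypothetical customer data: [annual_spend, visits_per_month].
customers = [[200, 2], [220, 3], [1500, 12], [1600, 14], [210, 2], [1550, 13]]

# No correct outcomes are given; KMeans groups similar customers together.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(km.labels_)  # low spenders land in one cluster, high spenders in the other
```

The cluster numbers themselves are arbitrary; what matters is which records end up grouped together.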
This type of learning follows a trial-and-error process. The system predicts an outcome, and the agent (the decision-maker) decides whether it is satisfactory. The system accepts an outcome only when the agent chooses to; until then, it keeps looking for possible outcomes that may yield the best reward. Reinforcement learning is used in robotics, recommendation systems, and so on.
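The trial-and-error loop can be illustrated with a minimal epsilon-greedy "bandit" agent. The two reward probabilities below are made up for the example; the agent does not know them and has to discover the better option by experimenting:

```python
import random

random.seed(0)
true_win_prob = [0.3, 0.8]   # hidden from the agent: arm 1 pays off more often
counts, values = [0, 0], [0.0, 0.0]

for step in range(1000):
    # Trial and error: explore a random arm 10% of the time,
    # otherwise exploit the arm with the best observed reward so far.
    arm = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
    reward = 1 if random.random() < true_win_prob[arm] else 0
    counts[arm] += 1
    # Keep a running average of the reward each arm has yielded.
    values[arm] += (reward - values[arm]) / counts[arm]

print(values.index(max(values)))  # the agent settles on the better arm
```

Real reinforcement-learning systems use far richer state and reward structures, but the explore-then-exploit pattern is the same.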
Although many different ML algorithms are used for serving different purposes, here are the common ones:
The idea is to estimate real values (such as the cost of cars, the number of messages, purchase orders, etc.) based on continuous variables. The goal is to create a link between the dependent and independent variables by fitting the best-fit line. This line is called the regression line, and it represents the linear equation Y = aX + b, where a is the slope and b is the intercept.
The following scenario is an easy way to explain linear regression. Suppose we place five random bottles of water in front of a child. Now, we give him/her a piece of paper and ask him/her to rank the bottles in increasing order of weight. The catch is that he/she cannot touch the bottles.
So, what he/she does is look closely and analyze the size of each bottle and how much water it holds. This is real-life linear regression: the child created a link between the size and water content of the bottles and measured them visually.
Similarly, in an MLA, a dataset is fed to the system, and the system establishes a correlation between the variables in the dataset. Programmers often use R code to fit such models.
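The best-fit line Y = aX + b can be computed directly with the classic least-squares formulas. The bottle measurements below are invented for illustration:

```python
# Hypothetical data: x = bottle height in cm, y = weight in grams.
x = [10, 15, 20, 25, 30]
y = [210, 300, 410, 500, 590]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope: covariance of x and y over variance of x.
a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
b = mean_y - a * mean_x   # intercept: the line passes through the means

print(a, b)                # -> 19.2 18.0
print(round(a * 22 + b))   # predicted weight of an unseen 22 cm bottle -> 440
```

Fitting in R (`lm(y ~ x)`) or with a library like scikit-learn computes exactly this line; the formulas are just hidden behind the API.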
Although the name may suggest that logistic regression has something to do with logic, ironically, it does not. It actually works as a classifier: it takes an input and puts it into one of two baskets, "category" or "not a category." It is essentially a binomial classifier and predicts discrete values.
Here's a scenario to better explain logistic regression. Suppose we ask a college student to open a jar of pickles. There are only two possible outcomes: he/she will either be able to open it or not. But what if we give him/her a test consisting of a series of 10th-grade mathematics questions? Suppose it is 75% likely that he/she will be able to solve them.
On the other hand, if the test is based on 5th-grade history questions, then he/she may be only 25% likely to answer them. This is what logistic regression does: it provides the likelihood of outcomes.
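Here is a minimal sketch of this idea using scikit-learn's LogisticRegression (assuming scikit-learn is installed). The study-hours data and pass/fail labels are made up:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours of study -> passed the exam (1) or not (0).
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = LogisticRegression().fit(X, y)

# predict_proba returns [P(fail), P(pass)] -- a likelihood, not just a class.
print(clf.predict_proba([[6]])[0][1])  # probability that a 6-hour student passes
print(clf.predict([[2]]))              # the discrete class for a 2-hour student
```

Note that the model outputs a probability first and only then thresholds it into one of the two "baskets," which is exactly the binomial-classifier behaviour described above.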
The decision tree is a supervised learning algorithm. Programmers popularly use it to solve classification problems. It can work with both discrete and continuous dependent variables. So how does it work? It separates the dataset into two or more subsets according to their distinguishing features or similarities.
Ultimately, it creates different branches just like a tree and holds them in separation from each other.
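A short sketch with scikit-learn's DecisionTreeClassifier (assuming scikit-learn is installed) shows the splitting in action; the fruit measurements below are invented:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical fruit data: [diameter_cm, weight_g] -> fruit type.
X = [[7.0, 150], [7.5, 160], [3.0, 40], [2.5, 35], [8.0, 170], [3.2, 45]]
y = ["orange", "orange", "plum", "plum", "orange", "plum"]

tree = DecisionTreeClassifier().fit(X, y)

# export_text prints the learned branches: one feature threshold per split.
print(export_text(tree, feature_names=["diameter", "weight"]))
print(tree.predict([[6.8, 140]]))  # a new, fairly large fruit
```

The printed output makes the tree metaphor concrete: each internal node tests one feature, and each leaf holds one of the separated subsets.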
Interestingly, the naive Bayes classifier treats every feature as an independent feature with no relation to the others in the series or class. This may seem like a naive approach, and incidentally, that is why it is named naive Bayes. It takes every property into account individually to predict a scenario.
A common example is predicting whether players will play or not, depending on the weather. Assume there are 14 records in total. In 9 of them the players play, and in 5 they do not. Of the 9 records where they play, 3 are sunny days, and of all 14 records, 5 are sunny days. Now, the question is: what is the probability that the players will play in sunny weather? Naive Bayes can easily handle this scenario.
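The numbers above plug straight into Bayes' theorem, which is all the classifier does for a single feature:

```python
# P(play | sunny) = P(sunny | play) * P(play) / P(sunny)
p_play = 9 / 14             # 9 of the 14 records are "play"
p_sunny = 5 / 14            # 5 of the 14 records are sunny days
p_sunny_given_play = 3 / 9  # 3 of the 9 "play" records were sunny

p_play_given_sunny = p_sunny_given_play * p_play / p_sunny
print(round(p_play_given_sunny, 2))  # -> 0.6
```

So there is a 60% chance the players will play on a sunny day. With several features, naive Bayes simply multiplies one such conditional probability per feature, thanks to the independence assumption.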