When solving a machine learning problem, it is crucial to answer the following questions:
- What kind of data do I have?
- What is my business problem?
- Which type of solution/approach will work here?
To answer the third question efficiently, you need an in-depth understanding of the differences among the various machine learning approaches.
The broad categories of machine learning are:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
If you are new to machine learning, I would suggest you go through the blog – Introduction to Machine Learning, as it will help build a base for understanding the differences better. This blog primarily focuses on Supervised vs Unsupervised Learning, so if you want to read more about each type, refer to the blogs – Supervised Learning, Unsupervised Learning.
Supervised Learning – The system is supervised by providing both input and output data, so it learns the relationship between the two.
Unsupervised Learning – The system works with unlabeled data and tries to find hidden patterns and features in it.
Reinforcement Learning – The system (the agent, in ML lingo) has an environment and a goal to achieve. It is rewarded or penalized for every action it performs in pursuit of that goal, and it tries to maximize its total reward along the way.
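To make the reward-penalty loop concrete, here is a minimal sketch of tabular Q-learning. Everything in it is an assumption made for illustration: the tiny five-state "corridor" environment, the `step` function, the reward values, and the hyperparameters are all hypothetical.

```python
import random

N_STATES = 5        # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, 1)   # action index 0 moves left, action index 1 moves right

def step(state, action_index):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.01, False     # small penalty for every other move

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)                        # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Best action index for each non-goal state."""
    return [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

q_table = train()
print(greedy_policy(q_table))
```

With these settings the agent should learn to prefer moving right (action index 1) in every state, since that is the shortest path to the reward.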
Now, on to the differences between these approaches. If you are in a hurry, I have summarized the differences between supervised, unsupervised, and reinforcement learning in the table below. But I would highly recommend going through the rest of the blog to get a proper understanding of the differences.
Supervised vs Unsupervised Learning
| In terms of | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Application | Model a relation between input and output variables. | Model patterns that might be hidden, or learn more about the data and its underlying structure. | The system (agent) performs actions to achieve a predefined goal and learns through a reward-penalty feedback system. |
| Data | Input data and the corresponding output data are given. | Only input data is given. | The agent is given an environment and a goal to achieve. |
| Data Preparation | Considerable manual effort goes into labeling the data. | No manual labeling effort is required. | Environment preparation is needed; no external data is provided. |
| Computational Complexity | Algorithms range from simple to very computationally complex. | Typically less complex. | Depending on the goal, reward functions can get very complex. |
| Accuracy | With well-chosen features and correct labels, results can be very accurate and reliable. | Results are moderately accurate most of the time and can sometimes be inaccurate. | The reward function must be designed carefully, as the system can learn to exploit it and earn higher rewards without fulfilling the goal. |
| Commonly Used Algorithms | Decision Trees, Support Vector Machines, Logistic Regression, Random Forests, K-Nearest Neighbors etc. | K-means Clustering, Principal Component Analysis, Autoencoders etc. | Monte-Carlo, Q-Learning, SARSA, A3C etc. |
Let’s discuss each of these factors in detail…
Supervised Learning – It is mostly used for prediction tasks, where we need to learn a mapping from input data to output data.
It is further divided into Classification and Regression problems, where the inputs are mapped to discrete outputs (classes) or continuous outputs (numbers) respectively.
Some common algorithms include Linear Regression, Logistic Regression, Support Vector Machines, Random Forest, Decision Trees, etc.
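As a rough illustration of the two problem types, the sketch below fits a straight line by least squares (regression) and classifies a point by copying the label of its nearest labeled neighbor (classification). The helper names and all the numbers are hypothetical and chosen only to keep the example small; in practice you would normally reach for a library rather than hand-rolled code.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def nearest_neighbor_label(train_points, train_labels, query):
    """1-nearest-neighbor classification: copy the closest point's label."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    best = min(range(len(train_points)), key=lambda i: sq_dist(train_points[i]))
    return train_labels[best]

# Regression: hypothetical (area, price) pairs with a continuous output.
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept

# Classification: hypothetical 2-D feature vectors with discrete labels.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["A", "A", "B", "B"]
predicted_label = nearest_neighbor_label(points, labels, (5.5, 5.0))

print(predicted_price, predicted_label)
```

In both cases the model is only possible because every training input comes with a known output, which is exactly what makes the approach supervised.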
Unsupervised Learning – It is used to explore hidden patterns in the data (exploratory analysis) or to reduce the dimensionality of the data.
Depending on the data, this typically means clustering or association analysis.
Some common unsupervised algorithms include k-means clustering, autoencoders, and principal component analysis. Note that K-Nearest Neighbors, despite the similar name to k-means, is a supervised algorithm, since it relies on labeled examples.
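For a feel of how clustering finds structure without any labels, here is a minimal sketch of k-means (Lloyd's algorithm). The sample points are made up, and using the first `k` points as initial centroids is a simplifying assumption; real implementations usually initialize randomly or with k-means++.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm) on tuples of floats."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical unlabeled data: two loose groups of 2-D points.
data = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
cluster_labels, cluster_centers = kmeans(data, k=2)
print(cluster_labels)
print(cluster_centers)
```

Notice that the function never sees a target column: the cluster numbers it returns are arbitrary group ids, not predefined classes.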
Data Input and Preparation
Supervised Learning – Data is provided with both the inputs and their output labels. This can sometimes cause issues, as the training depends entirely on the labeled data.
If there is any manual error or issue with the labels, it will certainly be reflected in the performance of the model. Sometimes the model learns relationships that do not hold true in the real world, which hurts the quality of the predictions.
Also, with ever-growing data, labeling becomes difficult. Although there are paid services like Amazon Mechanical Turk, not everyone invests in labeling their data.
Unsupervised Learning – Only input data is provided; no labels are given explicitly.
This removes the risk of incorrectly labeled data and, for that matter, any need to label the data at all.
Supervised Learning – Computational complexity is one of the factors a data scientist needs to assess carefully while building a supervised learning model. The right level of model complexity depends heavily on the nature of the data.
Usually, low-complexity models suit small datasets, because high-complexity models tend to overfit them. Using a high-complexity model every time therefore makes it hard for the model to generalize.
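The overfitting point can be sketched on a tiny hand-made dataset. Below, a very flexible model that simply memorizes the training points is compared with a straight-line fit. The data, the `memorizer` model, and the helper names are all hypothetical, and the memorizer is only a stand-in for "high complexity", not a model anyone would deploy.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical small training set: roughly y = 2x plus noise.
train_x = [1, 2, 3, 4, 5]
train_y = [2.5, 3.4, 6.6, 7.5, 10.4]
# Hypothetical unseen points lying on the underlying trend y = 2x.
test_x = [1.4, 2.6, 3.4, 4.6]
test_y = [2.8, 5.2, 6.8, 9.2]

slope, intercept = fit_line(train_x, train_y)

def line_model(x):
    """Low-complexity model: two parameters."""
    return slope * x + intercept

def memorizer(x):
    """High-flexibility model: return the target of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in (("line", line_model), ("memorizer", memorizer)):
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizer reproduces the training data perfectly, noise included, yet does noticeably worse than the simple line on the unseen points. That gap between training and test error is the usual symptom of overfitting.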
Some algorithms, like neural networks, can get very complex as more data is put into the model. Many machine learning models are also black boxes, i.e. it’s hard to explain their internal workings, and answering “Why did the model predict this?” can be very daunting. Explainable AI, or machine learning interpretability, helps in answering such questions.
Unsupervised Learning – It is mostly used to analyze and reduce the data, so the models tend to be less complex, although some (autoencoders, for example) can be as complex as any neural network.
💡 The trouble with not having a goal is that you can spend your life running up and down the field and never score. -Bill Copeland
Supervised Learning – Because the classes are defined and the training data is labeled, the system learns to map the input variables to the labeled class. As a result, the predictions of supervised learning algorithms are deemed more trustworthy, and they also tend to be more accurate.
Unsupervised Learning – Compared to supervised learning, unsupervised learning algorithms tend to produce less accurate results, and those results are harder to assess, because there are no defined labels to compare the predictions against.
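The difference in how results are assessed can be sketched as follows. With labels we can compute accuracy directly; without them we have to fall back on an internal measure, such as how tightly the points sit around their cluster means. The function names and the toy data are hypothetical, and the within-cluster sum of squares is just one of several such measures.

```python
def accuracy(y_true, y_pred):
    """Supervised check: fraction of predictions that match the known labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def within_cluster_ss(points, labels):
    """Unsupervised check: sum of squared distances to each cluster's mean.

    Lower means tighter clusters, but it does not say the clusters are 'right'.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members
        )
    return total

# Supervised: known labels make the comparison straightforward.
acc = accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"])

# Unsupervised: compare two candidate groupings of the same unlabeled points.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
tight = within_cluster_ss(pts, [0, 0, 1, 1])
loose = within_cluster_ss(pts, [0, 1, 0, 1])

print(acc, tight, loose)
```

The accuracy figure has a direct meaning, whereas the cluster score only lets us say that one grouping is tighter than another.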
Which is better? – Supervised vs Unsupervised learning
I have found people debating this question numerous times – “Which is better – Supervised or Unsupervised Learning?”
This is a very problem-specific question: we can’t say that supervised learning is always better than unsupervised learning, or vice versa.
After considering the problem statement and the factors we discussed above, we can say that in some cases it makes sense to use supervised algorithms, while in others unsupervised algorithms are the better choice.
For instance, a supervised learning approach may work better if we want to predict real estate prices, whereas an unsupervised learning approach may work better if we want to cluster properties according to customers’ needs.
It is equally important to test your understanding before implementing things, and quizzes are a fun way to do it. So, do give this blog’s quiz a try.
Supervised vs Unsupervised Learning Quiz
Amazing! I hope next time you see any data, you will be able to decide efficiently on the machine learning algorithm required to solve your problem. In any case, do comment if you have any doubts or any additional points which I might have missed.
Understanding the difference is the first step, but you must also understand the different algorithms (like Linear Regression, K-Means, etc.) in these categories to become the best in the industry. I would recommend reading the Linear Regression blog, where I code a Linear Regression model from scratch in Python.
🙏🏼 Until next time! Keep Learning and Keep Hustling!