### What are the most important supervised and unsupervised algorithms?

- Naveen

**Supervised Learning algorithms**:

- K-nearest neighbors
- Linear regression
- Naïve Bayes
- Support vector machines
- Logistic regression
- Decision trees and random forests

**K-nearest neighbors**: K-nearest neighbors (KNN) is a Machine Learning technique that comes under supervised learning and can be used for classification or regression problems. In supervised learning, we train on data whose target values are known; for a new input, KNN finds the training points that are most similar to it. These similar data points are called the neighbors of our data point, and they tend to have similar output values, so KNN predicts by taking a majority vote (classification) or an average (regression) over those neighbors.
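As a minimal sketch (the toy dataset and the choice of k below are made up for illustration), KNN classification with scikit-learn might look like this:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D dataset: small values belong to class 0, large values to class 1.
X = [[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]]
y = [0, 0, 0, 1, 1, 1]

# k = 3: each prediction is the majority vote of the 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[1.2], [8.7]]))  # each query lands near one of the two groups
```

The query 1.2 is surrounded by class-0 neighbors and 8.7 by class-1 neighbors, so the votes are unanimous.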

**Linear Regression**: Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method used for predictive analysis when the relationship between the input variables and a continuous target variable is roughly linear. The algorithm predicts the target variable as a linear combination of the input variables and their respective weights. Its goal is to find the linear equation that best describes how the target variable changes with each change in an input variable, using an intercept and slopes (the weights) to model that relationship.
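To make the intercept-and-slope idea concrete, here is a small sketch on made-up data that follows y = 2x + 1 exactly, so the fitted weights are easy to check:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]
y = [3, 5, 7, 9]  # exactly y = 2x + 1

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope ~2, intercept ~1
print(model.predict([[5]]))              # extrapolates to ~11
```

With noisy real-world data the fit minimizes squared error instead of passing through every point, but the mechanics are the same.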

**Naïve Bayes**: The Naïve Bayes algorithm is an approach used for solving classification problems. It is based on Bayes' theorem and is known for its efficiency in supervised learning. It is primarily used in text classification with high-dimensional training datasets. The Naïve Bayes classifier is one of the simpler and more effective classification algorithms and helps in quickly building machine learning models. It is a probabilistic classifier, which means it predicts on the basis of probability, and it is commonly applied to spam elimination, sentiment analysis, and article classification.

**Support Vector Machine:** Support Vector Machine or SVM is one of the most popular supervised learning algorithms. It’s used primarily for classification problems but can be used for regression as well.

The goal of an SVM is to find the best decision boundary separating two or more categories so we can easily put new data points in the correct category. This best decision boundary is called a hyperplane.

Support Vector Machines choose a hyperplane that is defined by the extreme data points closest to the boundary. These points are called support vectors because they alone determine (support) the position of the hyperplane. The resulting decision boundary classifies data based on the input variables. Examples of classification systems include aircraft flight classification, credit score classification, and product classification.
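A minimal sketch of a linear SVM on made-up 2-D data, printing the support vectors (the extreme points that define the hyperplane) and classifying two new points:

```python
from sklearn.svm import SVC

# Two well-separated toy classes in 2-D.
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel="linear").fit(X, y)

print(svm.support_vectors_)              # only the boundary-defining points
print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))
```

Note that only the points nearest the gap appear in `support_vectors_`; moving any other training point (without crossing the margin) would not change the learned boundary.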

**Logistic Regression:** Logistic Regression is one of the most popular Machine Learning algorithms. It comes under the Supervised Learning technique and is used for predicting a categorical dependent variable from a given set of independent variables.

Logistic regression predicts a categorical or discrete outcome, which is helpful if you're looking to forecast future outcomes and make informed decisions before they happen. The outcome can be Yes or No, 0 or 1, true or false, etc., and the predicted probabilistic values lie between 0 and 1.

Logistic regression is similar to linear regression in terms of how it is used, but the problems differ: linear regression handles regression problems (predicting continuous values), while logistic regression handles classification problems (predicting categories).
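A short sketch of the 0-or-1 behavior on an invented dataset (e.g. hours studied vs. pass/fail, purely hypothetical):

```python
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [7], [8], [9]]  # hypothetical: hours studied
y = [0, 0, 0, 1, 1, 1]              # 0 = fail, 1 = pass

clf = LogisticRegression().fit(X, y)

print(clf.predict([[2], [8]]))   # discrete 0/1 outcomes
print(clf.predict_proba([[8]]))  # class probabilities, each in [0, 1]
```

`predict_proba` exposes the underlying probabilities mentioned above; `predict` simply thresholds them to produce the categorical answer.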

**Decision Tree:** Decision Tree is a Supervised Learning technique typically used for classification problems. It is a tree-structured classifier: internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.

In a Decision Tree, there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches coming out of them, while leaf nodes are the outputs of those decisions and do not contain any further branches.
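The decision-node/leaf-node structure can be seen directly by printing a fitted tree (toy data invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy 1-D dataset: one threshold separates the classes.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

print(export_text(tree))        # decision rules (internal nodes) and leaves
print(tree.predict([[1], [11]]))
```

`export_text` renders each internal node as a threshold rule (a branch) and each leaf as a predicted class, matching the description above.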

**Random Forest:** Random Forest is showing up more and more in ML. It typically achieves high accuracy and can be used for both classification and regression problems. It uses ensemble learning, which is a process of combining multiple classifiers to solve complex problems and improve the performance of the model.

As the name suggests, a Random Forest is a classifier that contains a number of decision trees trained on various subsets of the given dataset, and it averages out (or takes a majority vote over) their predictions to achieve improved accuracy. Because the individual trees make different errors on different cases, combining them cancels out many of the mistakes a single tree would make.
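A minimal ensemble sketch (toy data and tree count chosen arbitrarily for illustration); each of the 50 trees sees a different bootstrap subset, and the forest votes:

```python
from sklearn.ensemble import RandomForestClassifier

# Two well-separated toy classes.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

# 50 trees, each fit on a random bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

print(forest.predict([[0.5, 0.5], [5.5, 5.5]]))  # majority vote of the trees
```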

**Unsupervised Learning algorithms:**

- **Clustering** – K-means, hierarchical cluster analysis
- **Association rule learning** – Apriori
- **Visualization and dimensionality reduction** – PCA, kernel PCA, t-SNE (t-distributed stochastic neighbor embedding)

As an example, suppose you have a lot of data about your site's visitors and run one of these algorithms to detect groups of similar visitors. It may find that 65% of your visitors are males who love watching movies in the evening, while 30% watch plays in the evening; a clustering algorithm can also divide each of these groups into smaller sub-groups.

There are also some vital algorithms, like visualization algorithms; these are unsupervised learning algorithms. You feed them a lot of unlabeled data as input, and you get a 2D or 3D visualization as output.

The goal here is to make the output as simple as possible without losing information. To handle this problem, the algorithm combines several related features into one feature: for instance, it may combine a car's make with its model. This is called feature extraction.
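As an illustrative sketch of dimensionality reduction with PCA (the two correlated features below are invented; real feature extraction would work on actual measurements), the explained-variance ratio shows how little information is lost when two features collapse into one:

```python
from sklearn.decomposition import PCA

# Two nearly redundant (highly correlated) features.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.1]]

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # 2 features -> 1 combined feature

print(pca.explained_variance_ratio_)  # close to 1.0: almost nothing lost
```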

**K-Means Clustering:** K-Means Clustering is an unsupervised learning algorithm that creates clusters from a set of data. It is typically used in data science and machine learning to find groups of similar objects; for example, finding groups of similar customers or groups of similar documents, or discovering structure among data points that are not explicitly labeled with predefined categories, such as grouping similar emails together.
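A minimal sketch on invented 2-D points: K-Means is given only the data and the number of clusters, and it assigns each point a cluster label on its own (no target labels are supplied):

```python
from sklearn.cluster import KMeans

# Two visually obvious groups of points; no labels are provided.
X = [[1, 1], [1.5, 2], [2, 1], [8, 8], [8.5, 9], [9, 8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)           # cluster id assigned to each point
print(km.cluster_centers_)  # one centroid per cluster
```

Which group gets id 0 and which gets id 1 is arbitrary; what matters is that the three left points share one label and the three right points share the other.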

**Hierarchical Clustering:** Hierarchical clustering is a machine-learning algorithm that groups unlabelled datasets into clusters. It is also known as hierarchical cluster analysis (HCA). Processes that work by clustering unlabeled data into groups are called clustering algorithms.

In this algorithm, we develop the hierarchy of clusters using a tree-like structure. This is known as a dendrogram.
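A short sketch using scikit-learn's agglomerative (bottom-up hierarchical) variant on made-up points; for the full dendrogram visualization mentioned above, SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` are the usual tools:

```python
from sklearn.cluster import AgglomerativeClustering

# Two tight, well-separated pairs of points; no labels supplied.
X = [[1.0, 1.0], [1.2, 1.1], [8.0, 8.0], [8.1, 8.2]]

# Bottom-up: each point starts as its own cluster, and the closest
# clusters are merged until only n_clusters remain.
agg = AgglomerativeClustering(n_clusters=2).fit(X)

print(agg.labels_)
```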

**Apriori algorithm:** The Apriori algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. The algorithm uses a breadth-first search and a hash tree to count candidate itemsets quickly and efficiently. For Apriori to work, you need a dataset of transactions in which some items appear together frequently.

The algorithm was proposed by R. Agrawal and R. Srikant in 1994 and is primarily used for market basket analysis, i.e. finding products that are often purchased together. It can also be used in the medical field, for example to find adverse drug reactions in patients.
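As a rough illustration of the frequent-itemset idea only (a naive enumeration over a hypothetical transaction database, not the full level-wise Apriori with candidate pruning and hash-tree counting):

```python
from itertools import combinations

# Hypothetical shopping baskets (the transaction database).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def frequent_itemsets(transactions, min_support=0.5):
    """Return itemsets whose support (fraction of transactions containing
    them) is at least min_support. Naive sketch: checks sizes 1 and 2 only."""
    items = set().union(*transactions)
    n = len(transactions)
    result = {}
    for k in (1, 2):
        for cand in combinations(sorted(items), k):
            support = sum(set(cand) <= t for t in transactions) / n
            if support >= min_support:
                result[cand] = support
    return result

print(frequent_itemsets(transactions))
```

Here `("bread", "milk")` has support 0.5 (it appears in 2 of 4 baskets), so "customers who buy bread also tend to buy milk" would be a candidate association rule.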