### What are GMM and Agglomerative clustering?

A Gaussian mixture model (GMM) is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of multivariate Gaussian distributions with unknown parameters. These parameters (the mixture weights, means, and covariances) are estimated from the data. A GMM can be thought of as a generalization of K-means: both locate the centers of the latent clusters, but unlike K-means, a GMM also takes the covariance structure of the distributions into account. The model is fitted with the expectation-maximization (EM) algorithm, which iteratively updates the distribution parameters to increase the log-likelihood, a measure of how well the model explains the data. The key steps performed in this model are:

1. Initialize k Gaussian distributions.
2. For each point, compute the probability that it was generated by each distribution (the E-step).
3. Recalculate the distribution parameters, weighting each point by those probabilities (the M-step).
4. Repeat steps 2 and 3 until the log-likelihood stops improving.
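The four steps above can be sketched for the simplest case, one-dimensional data, using only the Python standard library. This is a minimal illustration, not a production implementation: the function name `fit_gmm_1d`, the quantile-based initialization, the variance floor, and the convergence tolerance are all assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, max_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(data)
    ordered = sorted(data)

    # Step 1: initialize k Gaussians. Assumption for this sketch: means at
    # evenly spaced quantiles, a shared variance, and equal mixture weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    shared_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    variances = [shared_var] * k
    weights = [1.0 / k] * k

    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # Step 2 (E-step): probability that each point belongs to each Gaussian.
        resp = []
        ll = 0.0
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
            ll += math.log(total)

        # Step 4: stop once the log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # Step 3 (M-step): re-estimate parameters from the weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed component

    return weights, means, variances, ll


# Usage on synthetic data drawn from two well-separated Gaussians.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])
weights, means, variances, log_lik = fit_gmm_1d(data, k=2)
```

On well-separated data like this, the estimated means should land near the two generating means. A multivariate version follows the same loop but replaces each variance with a covariance matrix.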

There are 4 options for calculating covariances in GMM:

1. Full: Each distribution has its own general covariance matrix.
2. Tied: All distributions share a single general covariance matrix.
3. Diag: Each distribution has its own diagonal covariance matrix.
4. Spherical: Each distribution has its own single variance.
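One way to see the practical difference between these options is to count how many free covariance parameters each one requires for k distributions in d dimensions. The helper below is a hypothetical function written for this illustration; the option names follow the list above.

```python
def covariance_param_count(covariance_type, n_components, n_features):
    """Number of free covariance parameters for each covariance option."""
    k, d = n_components, n_features
    counts = {
        # One symmetric d x d matrix per distribution: d(d+1)/2 entries each.
        "full": k * d * (d + 1) // 2,
        # A single symmetric d x d matrix shared by all distributions.
        "tied": d * (d + 1) // 2,
        # One variance per feature, per distribution.
        "diag": k * d,
        # One variance per distribution.
        "spherical": k,
    }
    return counts[covariance_type]


# Example: 3 distributions over 4 features.
for name in ("full", "tied", "diag", "spherical"):
    print(name, covariance_param_count(name, n_components=3, n_features=4))
```

The more parameters a covariance option has, the more flexible the cluster shapes it can describe, and the more data it needs to be estimated reliably.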

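We have to choose both the covariance type and the number of clusters in the model. The BIC score, Silhouette score, Calinski-Harabasz score, and Davies-Bouldin score are used to select both parameters with a grid search. Lower values are better for BIC and Davies-Bouldin, while higher values are better for Silhouette and Calinski-Harabasz.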
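As a rough sketch of how BIC drives this selection: BIC penalizes the fitted log-likelihood by the number of free parameters, and the grid search keeps the configuration with the lowest score. The functions `bic` and `select_by_bic` are hypothetical helpers for this illustration, and the candidate numbers in the usage lines are made up purely to show the mechanics; they are not results from any real fit.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_by_bic(candidates, n_samples):
    """Pick the candidate with the lowest BIC.

    `candidates` maps a label, e.g. (covariance_type, n_clusters), to a pair
    (total log-likelihood, number of free parameters) from a fitted model.
    """
    scores = {label: bic(ll, p, n_samples) for label, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative, made-up fit summaries (not real results).
candidates = {
    ("spherical", 2): (-1250.0, 7),
    ("full", 2): (-1180.0, 11),
    ("full", 3): (-1175.0, 17),
}
best, scores = select_by_bic(candidates, n_samples=400)
```

In this toy grid the three-cluster model has a slightly higher log-likelihood, but the extra parameters are penalized, so a simpler configuration can still win.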

Agglomerative clustering is a family of hierarchical clustering algorithms that build nested clusters by successively merging them, starting from the individual points. This hierarchy of clusters can be represented as a tree diagram known as a dendrogram. The top of the tree is a single cluster containing all data points, while the bottom consists of the individual points. The linkage criterion determines which pair of clusters is merged at each step:

• Single linkage: Merges the two clusters with the smallest distance between their closest pair of observations.
• Complete or Maximum linkage: Merges the two clusters with the smallest maximum distance between their observations, that is, the distance between their farthest pair.
• Average linkage: Merges the two clusters with the smallest average distance over all pairs of observations, one taken from each cluster.
• Ward: Merges the two clusters that give the smallest increase in the sum of squared differences within all clusters. This is a variance-minimizing approach, so its objective is conceptually similar to that of K-means, but it is reached through agglomerative hierarchical merging.
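The four criteria above can be compared directly by computing each one for a single pair of small clusters. The sketch below uses Euclidean distance and hypothetical function names chosen for this illustration; at each step, an agglomerative algorithm merges the pair of clusters with the smallest value of the chosen criterion.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distances between every point in A and every point in B."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of observations."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of observations."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average distance over all cross-cluster pairs."""
    dists = pairwise_distances(cluster_a, cluster_b)
    return sum(dists) / len(dists)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def ward_increase(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if A and B were merged."""
    na, nb = len(cluster_a), len(cluster_b)
    gap_sq = math.dist(centroid(cluster_a), centroid(cluster_b)) ** 2
    return na * nb / (na + nb) * gap_sq


# Two small 2-D clusters used only for illustration.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (3.0, 1.0)]
for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("ward", ward_increase)]:
    print(name, fn(a, b))
```

Note that the Ward value is a change in squared error rather than a distance, so it is not on the same scale as the other three.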