Machine Learning

Naveen Pandey
December 16, 2022 (updated April 21, 2025)

Important Machine Learning Concepts Part – 2

Ensemble Learning: Training multiple models with different parameters to solve the same problem. A/B Testing: A statistical way of comparing two or more techniques to determine which performs better and whether the difference is statistically significant. Baseline Model: A simple model or heuristic used as a reference point for judging how well a model is performing. Bias: Prejudice or favouritism towards…

Naveen Pandey
December 13, 2022 (updated April 21, 2025)

Important Machine Learning Concepts Part – 1

Features: Input data/variables used by the ML model. Feature Engineering: Transforming input features to be more useful for the models, e.g., mapping categories to buckets, normalizing between -1 and 1, removing nulls. Train/Eval/Test: Training data is used to optimize the model, evaluation data is used to assess the model on new data during training, test is…

Naveen Pandey
December 10, 2022 (updated April 21, 2025)

What is selection Bias?

Selection bias is a kind of error that occurs when the researcher decides who is going to be studied. It is usually associated with research where the selection of participants isn’t random. It is sometimes referred to as the selection effect. It is the distortion of statistical analysis, resulting from the method of collecting samples.…

Naveen Pandey
December 10, 2022 (updated December 12, 2024)

What is a confusion matrix?

The confusion matrix is a 2×2 table that contains the 4 outcomes produced by a binary classifier. Various measures, such as error rate, accuracy, specificity, sensitivity, precision and recall, are derived from it. A dataset used for performance evaluation is called a test dataset. It should contain the correct labels and the predicted labels. The…
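As a rough illustration of the excerpt above, the four cells of the table can be counted directly from the correct and predicted labels. This is only a minimal sketch: the function name `confusion_counts`, the 1/0 label encoding, and the toy labels are assumptions made for the example, not something taken from the post.

```python
def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix.

    Assumes labels are encoded as 1 (positive) and 0 (negative).
    Returns (tp, fp, tn, fn).
    """
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # positive sample, predicted positive
        elif actual == 0 and predicted == 1:
            fp += 1  # negative sample, predicted positive
        elif actual == 0 and predicted == 0:
            tn += 1  # negative sample, predicted negative
        else:
            fn += 1  # positive sample, predicted negative
    return tp, fp, tn, fn


# Hypothetical test-set labels, made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
print(tp, fp, tn, fn, accuracy, precision)
```

Measures such as accuracy and precision then follow from simple arithmetic on the four counts.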

Naveen Pandey
December 9, 2022 (updated December 12, 2024)

What is the ROC curve?

The ROC curve is a plot of the false positive rate on the x-axis against the true positive rate on the y-axis. The true positive rate is the ratio of true positives to the total number of positive samples. The false positive rate is the ratio of false positives to the total number of negative samples. The…

Naveen Pandey
December 9, 2022 (updated December 12, 2024)

What do you understand by true positive rate and false-positive rate?

True Positive Rate (TPR) is the ratio of True Positives to the sum of True Positives and False Negatives. It is the probability that an actual positive will test as positive. TPR = TP / (TP + FN). The False Positive Rate (FPR) is the ratio of the False Positives to all the actual negatives (False Positives and True…
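The two rates can be sketched in a few lines using the standard definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function names and the example counts below are hypothetical and chosen only to make the arithmetic visible.

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN): share of actual positives predicted positive."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): share of actual negatives predicted positive."""
    return fp / (fp + tn)


# Hypothetical counts: 40 actual positives, 60 actual negatives.
tpr = true_positive_rate(tp=30, fn=10)
fpr = false_positive_rate(fp=6, tn=54)
print(tpr, fpr)
```

Note that each denominator is the total count of one actual class, so the rates do not depend on how many positives and negatives the dataset happens to contain.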

Naveen Pandey
December 6, 2022 (updated December 12, 2024)

3 Concepts Every Data Scientist Must Know Part – 3

1. What is the significance of sampling? Name some sampling techniques. For large datasets, we cannot analyze the whole volume of data at once. We need to take samples from the data that can represent the whole population. While making a sample out of the complete data, we should take the…

Naveen Pandey
November 27, 2022 (updated December 12, 2024)

3 Concepts Every Data Scientist Must Know Part – 2

1. Bagging and Boosting: Bagging and boosting are two different ways of combining base estimators for ensemble learning (like a random forest combining decision trees). Bagging means aggregating the predictions of several weak learners. We can think of it as using the weak learners in parallel. The average of the predictions of several weak learners…

Naveen Pandey
November 20, 2022 (updated December 12, 2024)

3 Concepts Every Data Scientist Must Know Part – 1

Central Limit Theorem: We first need to introduce the normal (Gaussian) distribution for the central limit theorem to make sense. The normal distribution is a probability distribution that looks like a bell. The x-axis represents the values and the y-axis represents the probability density of observing these values. The sigma values represent standard deviations. The normal distribution is used to represent…
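A small simulation can make the central limit theorem concrete: means of repeated samples drawn from a non-normal distribution cluster around the population mean in a roughly bell-shaped way. The sketch below assumes a uniform distribution on [0, 1) as the population, and the sample size of 30 and the 2000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics


def sample_means(num_samples, sample_size, seed=0):
    """Draw `num_samples` samples from Uniform[0, 1) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(num_samples)
    ]


means = sample_means(num_samples=2000, sample_size=30)

# Population mean of Uniform[0, 1) is 0.5; its standard deviation is sqrt(1/12).
mean_of_means = statistics.fmean(means)
spread_of_means = statistics.stdev(means)
expected_spread = (1 / 12) ** 0.5 / 30 ** 0.5  # sigma / sqrt(n)

print(mean_of_means, spread_of_means, expected_spread)
```

The spread of the sample means should come out close to sigma / sqrt(n), which is the standard deviation the theorem predicts for the sampling distribution of the mean.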

Naveen Pandey
November 20, 2022 (updated December 12, 2024)

Most Common Feature Scaling methods in Machine Learning

Definition: Feature scaling is the process of normalizing the range of features in a dataset. Real-world datasets often contain features that vary in magnitude, range and units. Therefore, for machine learning models to interpret these features on the same scale, we need to perform scaling. Feature scaling makes the model…
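Two widely used ways of putting a feature on a common scale are min-max scaling and standardization (z-scores). The sketch below is an assumption about which methods are relevant here, since the excerpt does not name them, and the function names and the sample values are made up for the example.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature column with a large raw range.
feature = [10, 20, 30, 40, 50]
scaled = min_max_scale(feature)
z_scores = standardize(feature)
print(scaled)
print(z_scores)
```

Min-max scaling keeps the shape of the distribution but is sensitive to extreme values, whereas standardization centers the feature and expresses each value in units of standard deviation.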

Naveen Pandey
November 11, 2022 (updated April 21, 2025)

Stress Detection Project using Machine Learning

Stress, tension, and misery are undermining the psychological well-being of individuals. Everyone has their own reasons for an unpleasant life. People often share their thoughts on social media platforms, such as Instagram in the form of posts and stories, and Reddit by asking for suggestions about their lives on subreddits. In the past couple of years, many…

Naveen Pandey
November 6, 2022 (updated December 12, 2024)

Outlier Detection methods in Machine Learning

Objective: An outlier is an individual data point that is distant from other points in the dataset. It is an anomaly in the dataset that may be caused by a range of errors in capturing, processing or manipulating data. Outliers in the data may cause problems during model fitting as they may inflate the…