What is the ROC curve?

The ROC curve is a graph with the false positive rate on the x-axis and the true positive rate on the y-axis. The true positive rate is the ratio of true positives to the total number of positive samples. The false positive rate is the ratio of false positives to the total number of negative samples. The…
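
As a quick illustration, here is a minimal sketch of plotting a ROC curve with scikit-learn's roc_curve function; the labels and scores are made-up data, not taken from the post.

# A minimal sketch of plotting a ROC curve with scikit-learn;
# y_true and y_scores are made-up illustrative data.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                      # actual labels
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)     # FPR on x, TPR on y
plt.plot(fpr, tpr, marker="o")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC curve")
plt.show()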

What do you understand by true positive rate and false-positive rate?

The true positive rate (TPR) is the ratio of true positives to the sum of true positives and false negatives. It is the probability that an actual positive will test as positive: TPR = TP / (TP + FN). The false positive rate (FPR) is the ratio of false positives to all the negatives (false positives and true…
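
A minimal sketch of computing both rates from a confusion matrix with scikit-learn; the labels below are made up for illustration.

# A minimal sketch computing TPR and FPR from a confusion matrix;
# y_true and y_pred are made-up labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # true positive rate
fpr = fp / (fp + tn)   # false positive rate
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")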

3 Concepts Every Data Scientist Must Know Part – 3

1. What is the significance of sampling? Name some techniques for sampling. For large datasets, we cannot analyze the whole volume of data at once. We need to take samples from the data that can represent the whole population. While making a sample out of the complete data, we should take the…
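
As an illustration, here is a minimal sketch of simple random and stratified sampling with pandas; the dataframe and its label column are hypothetical.

# A minimal sketch of simple random and stratified sampling with pandas;
# the dataframe df and the "label" column are hypothetical.
import pandas as pd

df = pd.DataFrame({"value": range(100), "label": [0, 1] * 50})

simple = df.sample(n=10, random_state=42)                  # simple random sample
stratified = df.groupby("label", group_keys=False).apply(
    lambda g: g.sample(frac=0.1, random_state=42)          # 10% from each class
)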

3 Important Neural Network Architectures Explained

1. Perceptron The perceptron is the most basic of all neural networks and a fundamental building block of more complex neural networks. It simply connects an input cell and an output cell. 2. Feed-Forward Network The feed-forward network is a collection of perceptrons, in which there are three fundamental types of layers – input layers,…
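
A minimal sketch of a single perceptron in NumPy, with made-up inputs and weights; this is an illustrative toy, not code from the post.

# A minimal sketch of a perceptron: one weighted sum of inputs
# passed through a step activation function.
import numpy as np

def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0   # step activation

x = np.array([1.0, 0.5])     # input values (made up)
w = np.array([0.7, -0.2])    # weights (made up)
print(perceptron(x, w, b=0.1))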

3 Concepts Every Data Scientist Must Know Part – 2

1. Bagging and Boosting Bagging and boosting are two different ways of combining base estimators for ensemble learning (like a random forest combining decision trees). Bagging means aggregating the predictions of several weak learners; we can think of it as combining weak learners in parallel. The average of the predictions of several weak learners…
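
A minimal sketch contrasting the two with scikit-learn's BaggingClassifier and AdaBoostClassifier on a toy dataset; the data and parameters are illustrative only.

# A minimal sketch of bagging (parallel) vs. boosting (sequential);
# the toy dataset comes from make_classification, purely for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)   # parallel
boosting = AdaBoostClassifier(n_estimators=50)                           # sequential
print(bagging.fit(X, y).score(X, y), boosting.fit(X, y).score(X, y))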

What is the Normal Distribution?

A probability distribution is a function that shows the probabilities of the outcomes of an event or experiment. Consider a feature (i.e., column) in a dataframe. This feature is a variable, and its probability distribution function shows the likelihood of the values it can take. Probability distribution functions are quite useful in predictive analytics or machine…
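
A minimal sketch of a probability distribution function using SciPy's standard normal; the mean and standard deviation are arbitrary example values.

# A minimal sketch plotting a normal probability density function;
# loc (mean) and scale (standard deviation) are arbitrary choices.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-4, 4, 200)
plt.plot(x, norm.pdf(x, loc=0, scale=1))   # standard normal: mean 0, std 1
plt.xlabel("value")
plt.ylabel("probability density")
plt.show()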

Important Deep Learning Concepts Explained Part – 2

Converge: An algorithm that converges will eventually reach an optimal answer, even if very slowly. An algorithm that doesn't converge may never reach an optimal answer. Learning Rate: The rate at which optimizers change weights and biases. A high learning rate generally trains faster but risks not converging, whereas a lower rate trains slower. Numerical instability: Issues with…
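
A minimal sketch of the learning-rate trade-off using plain gradient descent on the toy objective f(x) = x^2; this example is not from the post.

# A minimal sketch of how learning rate affects convergence,
# using gradient descent on f(x) = x**2 (gradient is 2x).
def gradient_descent(lr, steps=50, x=5.0):
    for _ in range(steps):
        x -= lr * 2 * x    # take a step against the gradient
    return x

print(gradient_descent(lr=0.1))   # converges toward the optimum at 0
print(gradient_descent(lr=1.1))   # too high: diverges instead of converging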

Important Deep Learning Concepts Explained Part – 1

Neuron: A node in a NN, typically taking in multiple input values and generating one output value by applying an activation function (a nonlinear transformation) to a weighted sum of the input values. Weights: Edges in a NN; the goal of training is to determine the optimal weight for each feature. If a weight is 0, the corresponding feature does not…
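
A minimal sketch of one neuron's forward pass in NumPy, using a sigmoid as the activation function; the inputs, weights, and bias are made up.

# A minimal sketch of a single neuron: weighted sum of inputs,
# then a sigmoid activation (a nonlinear transformation).
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias   # weighted sum of input values
    return 1 / (1 + np.exp(-z))          # sigmoid activation

inputs = np.array([0.5, -1.0, 2.0])      # made-up feature values
weights = np.array([0.4, 0.0, 0.9])      # a zero weight ignores that feature
print(neuron(inputs, weights, bias=0.1))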

Russia-Ukraine War Data Analysis Project using Python

In this article, I will take you through the task of analyzing the Russia-Ukraine war dataset using Python. The dataset that I am using for this analysis is downloaded from Kaggle. You can download the russia-ukraine equipment dataset from here and the russia-ukraine personnel losses dataset from here. Now let's import…
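
A minimal sketch of loading the two CSV files with pandas; the file names below are hypothetical placeholders for whatever the Kaggle download provides.

# A minimal sketch of loading the two datasets with pandas;
# both file names are hypothetical placeholders.
import pandas as pd

equipment = pd.read_csv("russia_losses_equipment.csv")   # hypothetical name
personnel = pd.read_csv("russia_losses_personnel.csv")   # hypothetical name
print(equipment.head())
print(personnel.head())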

3 Concepts Every Data Scientist Must Know Part – 1

Central Limit Theorem We first need to introduce the normal (Gaussian) distribution for the central limit theorem to make sense. The normal distribution is a probability distribution that looks like a bell. The x-axis represents the values and the y-axis represents the probability of observing these values. The sigma values represent the standard deviation; the normal distribution is used to represent…
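
A minimal sketch of the theorem in NumPy: means of samples drawn from a skewed exponential distribution pile up in a bell shape; the parameters are arbitrary.

# A minimal sketch of the central limit theorem: the histogram of
# sample means approximates a normal curve even for skewed data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample_means = [rng.exponential(scale=2.0, size=50).mean() for _ in range(1000)]

plt.hist(sample_means, bins=30)   # bell-shaped despite skewed source data
plt.xlabel("sample mean")
plt.ylabel("frequency")
plt.show()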

Most Common Feature Scaling Methods in Machine Learning

Definition Feature scaling is the process of normalizing the range of features in a dataset. Real-world datasets often contain features that vary in magnitude, range, and units. Therefore, in order for machine learning models to interpret these features on the same scale, we need to perform scaling. Feature scaling makes the model…
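
A minimal sketch of two common scaling methods using scikit-learn's StandardScaler and MinMaxScaler on a made-up array.

# A minimal sketch of standardization and min-max normalization;
# the small array X is made-up data for illustration.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

standardized = StandardScaler().fit_transform(X)   # mean 0, std 1 per feature
normalized = MinMaxScaler().fit_transform(X)       # rescaled to [0, 1] per feature
print(standardized, normalized, sep="\n")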

Top 8 Deep Learning Algorithms

Convolutional Neural Networks CNNs, popularly known as ConvNets, consist of several layers and are specifically used for image processing and object detection. The architecture was developed in 1998 by Yann LeCun. CNNs are widely used in identifying satellite images, medical image processing, time-series forecasting, and anomaly detection. CNNs process the data…
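
A minimal sketch of a small CNN in Keras, assuming TensorFlow is installed; the layer sizes and input shape are arbitrary illustrative choices.

# A minimal sketch of a convolutional network for 28x28 grayscale images;
# all layer sizes here are arbitrary examples.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                 # e.g. grayscale images
    layers.Conv2D(16, (3, 3), activation="relu"),    # learn local image filters
    layers.MaxPooling2D((2, 2)),                     # downsample feature maps
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),          # class probabilities
])
model.summary()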
