Missing Value Treatment Methods in Machine Learning

Delete Rows or Columns with Missing Values

  • Missing values can be handled by deleting the rows or columns that contain null values.
  • If more than half of a column's values are null, the entire column can be dropped.
  • Rows that have one or more null values can also be dropped.
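A minimal pandas sketch of both deletion rules follows. It assumes pandas and NumPy are installed. The DataFrame and its column names are hypothetical example data, and the 0.5 threshold mirrors the "more than half" rule above.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with some missing entries.
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 31, np.nan],
    "income": [50_000, 62_000, np.nan, 58_000, 61_000],
    "notes":  [np.nan, np.nan, np.nan, "vip", np.nan],  # 4 of 5 values missing
})

# Rule 1: drop columns where more than half of the rows are null,
# i.e. keep a column only if its fraction of missing values is <= 0.5.
missing_fraction = df.isna().mean()
df_cols = df.loc[:, missing_fraction <= 0.5]

# Rule 2: drop any remaining row that still has at least one null value.
df_clean = df_cols.dropna(axis=0, how="any")

print(missing_fraction.round(2).to_dict())
print(df_clean)
```

`DataFrame.dropna(axis=1, thresh=k)` is an alternative for the column rule. Here `k` is the minimum number of non-null values a column needs in order to be kept.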

Pros:

  • A model trained after removing all missing values learns only from fully observed records, with no estimated values that could add noise.
  • Easy to implement.

Cons:

  • Can discard a lot of information.
  • Works poorly when the percentage of missing values is high, because too much data is thrown away.

Impute Missing Values

  • For numerical columns, missing values can be replaced by the mean, median, or mode of the column's remaining values.
  • For categorical columns, missing values can be replaced by the most frequent category in the column.
  • A regression or classification model can be trained on the remaining data and then used to predict (impute) the missing values.
  • Methods such as KNN imputation and MICE (Multivariate Imputation by Chained Equations) can also be used to impute missing values.
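The imputation options above can be sketched with pandas and scikit-learn, assuming both are installed. The toy data is hypothetical.

* `SimpleImputer` covers the mean/median/mode and most-frequent cases.
* A `LinearRegression` fitted on the complete rows illustrates model-based imputation.
* `KNNImputer` fills gaps from the nearest rows.
* `IterativeImputer` is scikit-learn's MICE-inspired imputer. By default it returns a single imputation, rather than the multiple imputations of classic MICE.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required before importing IterativeImputer
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: two numerical columns and one categorical column.
df = pd.DataFrame({
    "age":    [25.0, np.nan, 47.0, 31.0, 52.0, 38.0],
    "income": [50.0, 62.0, np.nan, 58.0, 80.0, 66.0],
    "city":   ["Pune", "Delhi", np.nan, "Pune", "Pune", "Delhi"],
})
num_cols = ["age", "income"]

# 1. Statistical imputation: median for numerical columns (strategy could also be
#    "mean" or "most_frequent"), most frequent category for the categorical column.
simple = df.copy()
simple[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
simple[["city"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["city"]])

# 2. Regression imputation: fit on rows where "income" and "age" are both known,
#    then predict "income" for rows where it is missing but "age" is present.
reg = df.copy()
known = reg["income"].notna() & reg["age"].notna()
missing = reg["income"].isna() & reg["age"].notna()
model = LinearRegression().fit(reg.loc[known, ["age"]], reg.loc[known, "income"])
reg.loc[missing, "income"] = model.predict(reg.loc[missing, ["age"]])

# 3. KNN imputation: each gap is filled with the mean of that feature over the
#    2 nearest rows (distance computed on the features both rows have).
knn = df.copy()
knn[num_cols] = KNNImputer(n_neighbors=2).fit_transform(df[num_cols])

# 4. MICE-style imputation: each feature with gaps is modelled from the others,
#    round-robin, for up to max_iter rounds.
mice = df.copy()
mice[num_cols] = IterativeImputer(max_iter=10, random_state=0).fit_transform(df[num_cols])

print(simple, reg, knn, mice, sep="\n\n")
```

`IterativeImputer` is still marked experimental, which is why the `enable_iterative_imputer` import is needed. For `KNNImputer`, features on very different scales can dominate the distance, so scaling them first is often worthwhile.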

Pros:

  • Preserves all cases by replacing missing data with an estimated value based on the available information.

Cons:

  • Does not work well when the percentage of missing values is high.
  • Each technique has its own disadvantages, so the choice of technique needs care.

Use Algorithms that Support Missing Values

  • Some ML algorithms, or specific implementations of them, can handle missing values in the dataset directly, for example KNN, Random Forest and XGBoost.

Pros:

  • No need to handle missing values in each column yourself, since the algorithm deals with them during training.

Cons:

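  • Support varies by library and version. In scikit-learn, only some estimators accept missing values directly, such as HistGradientBoostingClassifier/Regressor and, in recent releases, decision trees and random forests. Most others, including KNeighborsClassifier, still require imputation first.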


Author

  • Naveen Pandey, Data Scientist and Machine Learning Engineer

    Naveen Pandey has more than 2 years of experience in data science and machine learning. He is an experienced Machine Learning Engineer with a strong background in data analysis, natural language processing, and machine learning. Holding a Bachelor of Science in Information Technology from Sikkim Manipal University, he excels in leveraging cutting-edge technologies such as Large Language Models (LLMs), TensorFlow, PyTorch, and Hugging Face to develop innovative solutions.
