What is L1 and L2 regularization in Deep Learning?
L1 and L2 regularization are two of the most common ways to reduce overfitting in deep neural networks. Both work by adding a penalty term to the loss function. L1 regularization adds a penalty proportional to the sum of the absolute values of the weights, while L2 regularization adds a penalty proportional to the sum of the squared weights.
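The two penalty terms described above can be sketched directly in NumPy. This is a minimal illustration, not tied to any particular framework; the weight values and the regularization strength `lam` are arbitrary choices for the example.

```python
import numpy as np

# Example weight vector of a model and a regularization strength (both arbitrary)
w = np.array([0.5, -1.2, 0.0, 3.0])
lam = 0.01

# L1 penalty: lambda * sum of absolute values of the weights
l1_penalty = lam * np.sum(np.abs(w))

# L2 penalty: lambda * sum of squared weights
l2_penalty = lam * np.sum(w ** 2)

# In training, either term would be added to the data loss:
# total_loss = data_loss + l1_penalty   (or + l2_penalty)
```

In deep-learning frameworks this same idea usually appears as an extra term added to the loss (L1) or as a `weight_decay` option on the optimizer (L2).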
L1 regularization penalizes the absolute magnitude of individual parameters in a model. Because the absolute-value penalty can drive unimportant weights exactly to zero, it produces sparse models: parameters that do not contribute to classification accuracy are effectively pruned away, which makes the model simpler and can help prevent overfitting. The L1 penalty is widely used in statistical learning and can be combined with standard gradient-based optimizers. (Note that the term "weight decay" conventionally refers to L2 regularization, not L1.)
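The sparsity-inducing behavior of L1 can be seen in its proximal update, often called soft-thresholding, which is the step used by coordinate-descent and proximal-gradient solvers for the lasso. A small sketch (the weight values and threshold here are illustrative assumptions):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrinks each weight toward zero
    and sets weights whose magnitude is below the threshold t exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.8, -0.05, 0.02, -1.5])
w_new = soft_threshold(w, 0.1)
# Small weights (-0.05 and 0.02) are zeroed out; large ones are shrunk by 0.1
```

This is why L1-regularized models end up with many weights that are exactly zero, whereas L2 only shrinks weights smoothly.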
L2 regularization is an alternative technique that penalizes the sum of squares of all parameters in a model. Consider a linear regression problem with $p$ features, where the noise terms are independent and identically distributed normal variables with mean zero and standard deviation one. L2 regularization (ridge regression) adds the squared-norm penalty to the squared-error loss:

$$L(w) = \sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2 + \lambda \sum_{j=1}^{p} w_j^2,$$

where $\lambda > 0$ controls the strength of the penalty. In practice the features are centered (and often standardized) before fitting, so that the penalty treats all coefficients on a comparable scale and the intercept can be left unpenalized. (Figure 2: Linear Regression with L2 Regularization.) Because the penalty grows quadratically, L2 regularization shrinks all weights toward zero but rarely sets any of them exactly to zero, in contrast to L1.
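The ridge objective above has a closed-form solution, $w = (X^\top X + \lambda I)^{-1} X^\top y$, which makes the shrinkage effect easy to demonstrate. A sketch on synthetic data (the coefficients, noise level, and $\lambda$ are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 samples, 3 centered features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

lam = 1.0
p = X.shape[1]

# Ridge solution: w = (X^T X + lambda I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ordinary least squares for comparison (lambda = 0)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The ridge coefficients have a smaller norm than the OLS coefficients:
# the quadratic penalty shrinks every weight toward zero.
```

Increasing `lam` shrinks the coefficients further; as `lam` approaches zero the ridge solution approaches the ordinary least-squares fit.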