What is the difference between R-squared and Adjusted R-squared?
Linear Regression is a widely used machine learning algorithm, and evaluating its performance is essential. Two important metrics for doing so are R-squared and Adjusted R-squared. They indicate how well the model fits the data and how much of the variance in the target variable is explained by the independent variables.
To understand R-squared and Adjusted R-squared, we first need the Residual Sum of Squares (RSS). RSS is the sum of the squared differences between the actual and predicted values, RSS = Σ(yᵢ − ŷᵢ)², and it measures how far the model's predictions fall from the observed data. Ordinary least squares chooses the regression line that minimizes RSS, which is why that line is considered the best fit. However, RSS is a scale-dependent statistic: if the target variable is measured in different units, or on a different scale, RSS changes even when the quality of the fit does not. This makes raw RSS values hard to compare across models with differently scaled target variables.
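As a small illustration, the sketch below computes RSS with NumPy for a few hypothetical actual and predicted values, then rescales both (as if converting units) to show that RSS changes even though the fit itself does not. The helper name `rss` and the data are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def rss(y_true, y_pred):
    """Residual Sum of Squares: sum of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_true - y_pred) ** 2))

# Hypothetical toy data: actual and predicted target values
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted = np.array([2.5, 5.5, 6.5, 9.5])

# Every residual is +/-0.5, so RSS = 4 * 0.25
print(rss(y_actual, y_predicted))              # 1.0

# Rescaling the target by 100 (e.g. metres -> centimetres) multiplies RSS by 100**2,
# even though the predictions are exactly as good as before.
print(rss(y_actual * 100, y_predicted * 100))  # 10000.0
```

This scale dependence is what R-squared removes by dividing RSS by the total variation of the target.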
R-squared (R²) is a metric that represents the proportion of variance in the dependent variable that is explained by the independent variables in the model. It is computed as R² = 1 − RSS/TSS, where the Total Sum of Squares TSS = Σ(yᵢ − ȳ)² is the sum of the squared differences between the actual values and the mean of the dependent variable. Because both sums share the same units, the ratio is scale-free. The problem with R-squared is that, on the data the model was fitted to, it never decreases when an independent variable is added, even if that variable contributes nothing to the model's predictive power. Relying on R-squared alone to choose between models can therefore favour overly complex models that overfit: they fit the training data closely but perform poorly on new data.
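The sketch below, assuming NumPy is available, computes R² directly from the formula above and fits ordinary least squares with `np.linalg.lstsq`. The synthetic data, the random seed, and the helper names `r_squared` and `fit_ols` are hypothetical choices for illustration. The target depends only on `x`; the script then adds five columns of pure noise and compares in-sample R² before and after.

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)          # variation left unexplained by the model
    tss = np.sum((y - np.mean(y)) ** 2)     # total variation around the mean
    return float(1.0 - rss / tss)

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # coefficients minimising RSS
    return X1 @ beta

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)      # y truly depends only on x

noise = rng.normal(size=(n, 5))       # 5 columns of pure noise, unrelated to y

r2_small = r_squared(y, fit_ols(x.reshape(-1, 1), y))
r2_big = r_squared(y, fit_ols(np.column_stack([x, noise]), y))

print(f"R² with x only:       {r2_small:.4f}")
print(f"R² with x + 5 noise:  {r2_big:.4f}")   # never lower in-sample
```

The larger model can always reproduce the smaller one by setting the noise coefficients to zero, so its RSS cannot be larger and its R² cannot be smaller; the exact values printed depend on the random draw.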
To guard against this, Adjusted R-squared (R²_adj) is often used instead of R-squared when comparing models. Adjusted R-squared penalizes the addition of irrelevant variables: it increases only when a newly added variable improves the fit by more than would be expected by chance, and decreases otherwise. It is calculated as R²_adj = 1 − (1 − R²) × (n − 1)/(n − p − 1), where n is the number of observations and p is the number of independent variables in the model. Since (n − 1)/(n − p − 1) ≥ 1, Adjusted R-squared is always less than or equal to R-squared, and it is a better metric for comparing models with different numbers of independent variables.
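A minimal sketch of the Adjusted R-squared formula in plain Python follows. The helper name `adjusted_r_squared` and the R² values are hypothetical numbers chosen for illustration, not results from a real model: a 6-predictor model with R² = 0.81 is compared with a 1-predictor model with R² = 0.80, both on n = 50 observations.

```python
def adjusted_r_squared(r2, n, p):
    """R²_adj = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison on n = 50 observations
simple = adjusted_r_squared(0.80, n=50, p=1)    # 1 predictor
complex_ = adjusted_r_squared(0.81, n=50, p=6)  # 6 predictors, slightly higher R²

print(f"{simple:.4f}")    # 0.7958
print(f"{complex_:.4f}")  # 0.7835
```

Although the 6-predictor model has the higher R², its Adjusted R-squared is lower, because the small gain in fit does not justify five extra variables.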
In summary, understanding R-squared and Adjusted R-squared is essential for evaluating the performance of Linear Regression. R-squared measures the proportion of variance in the dependent variable that is explained by the independent variables, while Adjusted R-squared also penalizes the addition of variables that do not improve the model. Adjusted R-squared is therefore the better metric for comparing models with different numbers of independent variables.