Linear Regression for Machine Learning
Linear regression is a statistical technique for modeling the relationship between two or more variables. We can use it to predict the value of a response (target) variable from the values of one or more predictors (independent variables).
Simple linear regression has only one predictor, while multiple linear regression has more than one.
Equation in Linear Regression
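The equation of a line is y = mx + c, where:
m is the slope, which determines the direction and steepness of the line.
c is the intercept, the value of y when x is zero.
For multiple linear regression, the equation is:
$$y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_n x_n$$
where B0 is the intercept and B1, …, Bn are the coefficients of the independent variables x1, …, xn.
To find the best-fit line, we need the best values for B0, B1, …, Bn. The standard method for estimating them is Ordinary Least Squares (OLS), which chooses the coefficients that minimize the sum of squared residuals, i.e. the squared differences between the observed and predicted values of y.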
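As a minimal sketch of OLS, the snippet below estimates B0, B1 and B2 with NumPy on synthetic data generated from known coefficients (the data and coefficient values are assumptions for illustration), so we can check that OLS recovers them. It uses np.linalg.lstsq instead of explicitly inverting XᵀX in the normal equations B = (XᵀX)⁻¹Xᵀy; both give the same solution, but lstsq is numerically more stable.

```python
import numpy as np

# Synthetic data (hypothetical): y = 4 + 2*x1 - 3*x2 + noise
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))
y = 4 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Prepend a column of ones so that B0 (the intercept) is estimated too
A = np.column_stack([np.ones(n), X])

# OLS: choose B to minimize the sum of squared residuals ||y - A @ B||^2
B, *_ = np.linalg.lstsq(A, y, rcond=None)
print("B0, B1, B2 =", B.round(2))  # close to 4, 2, -3

residuals = y - A @ B
print("Sum of squared residuals:", round(float(residuals @ residuals), 2))
```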
To perform linear regression, we must be aware of its assumptions.
Assumptions are conditions that should be met before building a linear regression model.
1 – Linearity – There should be a linear relationship between the dependent and independent variables. We can check linearity using correlation coefficients, a correlation plot, or scatter plots.
2 – Multicollinearity – Multicollinearity refers to correlation among the independent variables, and it should not be present. We can check for it by computing the variance inflation factor for each independent variable, VIF = 1 / (1 − R²), where R² comes from regressing that variable on all the other independent variables. VIF values above roughly 5 to 10 are commonly taken as a warning sign.
3 – Homoscedasticity – The variance of the errors (residuals) should be constant: the spread of the residuals should not change as the values of the independent variables change. The Goldfeld-Quandt test and the Breusch-Pagan test can be used to check for homoscedasticity.
4 – Normality of residuals – The residuals should follow a normal distribution. This assumption can be checked with a histogram or a Q-Q plot.
5 – No autocorrelation – Autocorrelation occurs when the residuals are not independent of each other. We can use the Durbin-Watson test to check for autocorrelation; a statistic close to 2 suggests no autocorrelation.
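A minimal NumPy sketch of two of these checks: the VIF formula from point 2 and the Durbin-Watson statistic from point 5. The data are synthetic and hypothetical (x2 is built as a near-copy of x1 so that multicollinearity shows up), and the helper names r_squared, vif and durbin_watson are our own, not a library API.

```python
import numpy as np

def r_squared(X, y):
    """R² of an OLS fit of y on the columns of X (an intercept is added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def vif(X):
    """VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j on the others."""
    return np.array([
        1.0 / (1.0 - r_squared(np.delete(X, j, axis=1), X[:, j]))
        for j in range(X.shape[1])
    ])

def durbin_watson(resid):
    """Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation."""
    diff = np.diff(resid)
    return (diff @ diff) / (resid @ resid)

# Synthetic predictors (hypothetical): x2 is almost a copy of x1, x3 is independent
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1 + 2 * x1 + 3 * x3 + rng.normal(size=n)

print("VIF:", vif(X).round(1))  # large for x1 and x2, close to 1 for x3

# Residuals of the full model, then the Durbin-Watson statistic
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
print("Durbin-Watson:", round(durbin_watson(resid), 2))
```

In practice, statsmodels offers ready-made versions: variance_inflation_factor (statsmodels.stats.outliers_influence), het_breuschpagan and het_goldfeldquandt (statsmodels.stats.diagnostic), and durbin_watson (statsmodels.stats.stattools). For the normality check, scipy.stats.probplot produces the data for a Q-Q plot.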
Let’s implement this in Python
We will implement linear regression in the Python programming language.
Step 1 – We will start by importing the necessary Python libraries:
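The original code for this step is not shown. A typical set of imports for this kind of walkthrough is below; using scikit-learn for the train/test split and the model is an assumption, not necessarily the author's setup.

```python
import numpy as np                      # numerical arrays
import pandas as pd                     # tabular data loading
import matplotlib.pyplot as plt         # plotting
from sklearn.model_selection import train_test_split  # train/test split
from sklearn.linear_model import LinearRegression     # OLS linear regression
```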
Step 2 – We will load the data
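The dataset used here is not specified. As a stand-in, this sketch builds a small synthetic DataFrame with one predictor `x` and a target `y` generated from y = 3x + 4 plus noise; the column names, values and the CSV path in the comment are hypothetical. With a real file you would call pd.read_csv on its path instead.

```python
import numpy as np
import pandas as pd

# With a real dataset: df = pd.read_csv("your_data.csv")  # hypothetical path
# Synthetic stand-in data: y = 3x + 4 + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 3 * x + 4 + rng.normal(scale=2.0, size=100)})

print(df.head())
print(df.shape)  # (100, 2)
```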
Step 3 – Training Linear Regression with Python
Before training the model, we first split the dataset into an 80% training set and a 20% test set:
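A sketch of the 80/20 split, assuming scikit-learn's train_test_split and hypothetical synthetic data (y = 3x + 4 plus noise) so the snippet runs on its own; the random_state value is an arbitrary choice that makes the split reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): y = 3x + 4 + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 3 * x + 4 + rng.normal(scale=2.0, size=100)})

X = df[["x"]]   # predictors as a 2-D table (n_samples, n_features)
y = df["y"]     # response as a 1-D series

# 80% training, 20% test; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 80 20
```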
Step 4 – Now let’s train our model
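A sketch of the training step, assuming scikit-learn's LinearRegression (which fits by ordinary least squares) and the same kind of hypothetical synthetic data, regenerated here so the snippet is self-contained. Because the data come from y = 3x + 4, the learned slope and intercept should land near m = 3 and c = 4.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): y = 3x + 4 + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 3 * x + 4 + rng.normal(scale=2.0, size=100)})
X_train, X_test, y_train, y_test = train_test_split(
    df[["x"]], df["y"], test_size=0.2, random_state=0
)

# Fit by ordinary least squares
model = LinearRegression()
model.fit(X_train, y_train)

print("Slope (m):", model.coef_[0])        # should be near 3
print("Intercept (c):", model.intercept_)  # should be near 4
print("R² on test set:", model.score(X_test, y_test))
```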
Step 5 – Let’s plot the trained model using Matplotlib.
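A sketch of the plot, assuming the hypothetical synthetic data and a scikit-learn model (rebuilt here so the snippet runs on its own): the test points are drawn as a scatter plot and the fitted line y = mx + c on top of them. The colors, labels and title are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and fitted model (hypothetical)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 3 * x + 4 + rng.normal(scale=2.0, size=100)})
X_train, X_test, y_train, y_test = train_test_split(
    df[["x"]], df["y"], test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Test points as a scatter plot, fitted line y = mx + c on top
fig, ax = plt.subplots()
ax.scatter(X_test["x"], y_test, label="Test data")
line_x = pd.DataFrame({"x": np.linspace(0, 10, 100)})
ax.plot(line_x["x"], model.predict(line_x), color="red", label="Fitted line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Linear regression fit")
ax.legend()
plt.show()
```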
Linear regression models the relationship between a response variable and one or more predictors as a linear equation, which can then be used to predict the response for new data.