Gradient Descent for Linear Regression
Gradient descent is one of the most commonly used iterative optimization algorithms in machine learning, used to train machine learning and deep learning models. It works by finding a local minimum of a function.
Gradient descent can locate a local minimum or local maximum of a function as follows:
- If we move in the direction of the negative gradient (away from the gradient) at the current point, we approach a local minimum of the function.
- If we move in the direction of the positive gradient (towards the gradient) at the current point, we approach a local maximum of the function.
The first of these procedures is gradient descent, also known as steepest descent. The main objective of the gradient descent algorithm is to minimize the cost function through iteration.
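The iterative idea can be sketched on a simple one-variable function. This is a minimal illustration, not part of the original text: the function f(x) = (x - 3)^2 and its gradient 2(x - 3) are assumed for demonstration.

```python
# Minimal sketch of gradient descent on f(x) = (x - 3)^2,
# whose gradient is 2(x - 3). The minimum is at x = 3.
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move against the gradient
    return x

minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

After 100 steps, `minimum` is very close to 3, the point where the gradient vanishes.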
How Gradient Descent Works
Before walking through gradient descent, we need a basic concept: the slope of a line from linear regression. The equation for simple linear regression is:
y = mx + c
Where ‘m’ represents the slope of the line and ‘c’ represents the intercept on the y-axis.
- At the starting point, we compute the first derivative (the slope) of the cost function, using the tangent line at that point to measure its steepness. This slope then informs the updates to the parameters (weights and bias).
- The slope is steep at the starting (arbitrary) point; as new parameters are generated, the steepness gradually decreases, approaching zero at the lowest point, which is called the point of convergence.
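The steps above can be sketched for the line y = mx + c fitted with mean squared error. The data, learning rate, and epoch count below are illustrative assumptions; the gradient formulas follow from differentiating the mean squared error with respect to m and c.

```python
# Sketch: gradient descent fitting y = mx + c under mean squared error.
def fit_line(xs, ys, learning_rate=0.01, epochs=1000):
    m, c = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(epochs):
        preds = [m * x + c for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dJ/dm
        grad_c = (2 / n) * sum(errors)                             # dJ/dc
        m -= learning_rate * grad_m  # step against the gradient
        c -= learning_rate * grad_c
    return m, c

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated by y = 2x + 1
m, c = fit_line(xs, ys)
```

With each epoch the gradients shrink as the parameters approach the point of convergence, and `m` and `c` settle near 2 and 1.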
Role of Learning Rate
The main objective of gradient descent is to minimize the cost function, i.e., the error between predicted and actual values. In this minimization, the learning rate plays a crucial role.
The learning rate is the step size taken toward the minimum, or lowest point. It is typically a small value, chosen and adjusted based on the behavior of the cost function.
A high learning rate results in larger steps but risks overshooting the minimum. Conversely, a low learning rate takes small steps, which compromises overall efficiency (more iterations are needed) but gives the advantage of greater precision.
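The trade-off can be seen numerically. This is an assumed toy setup: minimizing f(x) = x^2 (gradient 2x) from the same starting point with a small versus an oversized learning rate.

```python
# Compare step sizes when minimizing f(x) = x^2, whose gradient is 2x.
def step_many(learning_rate, start=1.0, steps=20):
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

small = step_many(0.1)  # small steps: steadily approaches the minimum at 0
large = step_many(1.1)  # oversized steps: each update overshoots and diverges
```

With a learning rate of 0.1 the iterate shrinks toward 0, while at 1.1 each step jumps past the minimum and the iterate grows without bound.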