1. Gradient Descent: My First Implementation 🚀

2 min read

Exploring Derivatives in Practice

Context 📚

After completing a Linear Algebra course, I transitioned to studying Calculus. Unexpectedly, at work, I found myself contemplating derivatives and integrals. This sparked an idea—why not apply derivatives in practice? 🤔

I had heard that we can compute the gradient (essentially the derivative of the loss function with respect to the parameters) and follow it downhill, stepping in the negative gradient direction, to reach the minimum. So, about a week ago, I decided to implement a simple Linear Regression model using Gradient Descent.
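
To make the idea concrete, here is a minimal sketch of such an update loop (an illustration only, not the code from my notebook; the toy data, learning rate, and variable names are assumptions):

```python
import numpy as np

# Hypothetical toy data: y = 2x + 1 plus zero-mean Gaussian noise (std 1).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=200)

k, b = 0.0, 0.0   # slope and intercept, initialized at zero
lr = 0.05         # learning rate (assumed value)
epochs = 450

for _ in range(epochs):
    error = k * x + b - y              # residuals of the current fit
    grad_k = 2.0 * np.mean(error * x)  # gradient of the MSE w.r.t. k
    grad_b = 2.0 * np.mean(error)      # gradient of the MSE w.r.t. b
    k -= lr * grad_k                   # step against the gradient
    b -= lr * grad_b

loss = np.mean((k * x + b - y) ** 2)
print(f"k={k:.3f}, b={b:.3f}, loss={loss:.4f}")
```

On data like this, the loop should recover a slope near 2 and an intercept near 1, with the final loss settling around the noise variance.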

Main 🔧

I successfully implemented Gradient Descent for a Linear Regression model! 🏆 The algorithm works, but it still needs optimization. There’s room for improvement, and I’ll be refining it as my understanding of the topic grows. Stay tuned for updates! 🔄

Challenges ⚠️

One of the unexpected challenges I encountered was the differing convergence rates of the parameters: the slope k was converging much faster than the intercept b. It took considerable time to understand the underlying cause and devise a solution.

The issue arises because k has a stronger pull on the loss: its gradient is scaled by x, so even a small change in k significantly affects the loss function. As k is updated, the loss drops quickly, and with it the size of subsequent updates. This is a problem for b, because k drives the loss down before b has had a chance to adjust substantially, so the effective step size for b shrinks rapidly.
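
For a mean-squared-error loss (written here for the model y ≈ kx + b; the notation is mine and may differ from the notebook), the gradients make this explicit:

$$
L(k, b) = \frac{1}{n}\sum_{i=1}^{n}\left(k x_i + b - y_i\right)^2, \qquad
\frac{\partial L}{\partial k} = \frac{2}{n}\sum_{i=1}^{n} x_i\left(k x_i + b - y_i\right), \qquad
\frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\left(k x_i + b - y_i\right).
$$

The k-gradient carries an extra factor of x_i, so when the inputs are typically larger than 1 in magnitude the updates to k dominate; and once the residuals shrink, both gradients shrink with them, leaving b with very small steps.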

To address this, possible solutions include:

  • implementing an optimizer with per-parameter adaptive step sizes (e.g. momentum or Adam)

  • increasing the number of epochs

  • setting separate, appropriately scaled learning rates for k and b (see the sketch below)
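
As a sketch of that last idea, separate learning rates for the two parameters might look like this (the function name, values, and structure are mine, not taken from the notebook):

```python
import numpy as np

def fit_line(x, y, lr_k=0.01, lr_b=0.1, epochs=450):
    """Gradient descent with separate learning rates for slope k and intercept b.

    lr_k is kept smaller because the k-gradient is scaled by x and tends to be
    larger; lr_b is larger so the intercept keeps moving even after the slope
    has nearly converged. Both values are illustrative, not tuned settings.
    """
    k, b = 0.0, 0.0
    for _ in range(epochs):
        error = k * x + b - y                   # residuals of the current fit
        k -= lr_k * 2.0 * np.mean(error * x)    # gradient of MSE w.r.t. k
        b -= lr_b * 2.0 * np.mean(error)        # gradient of MSE w.r.t. b
    return k, b
```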

Results 📉

Here’s the link to my Jupyter Notebook on GitHub: GitHub Link

Below, you can see the results and relevant graphs from the training:

Observations 👀:

  • The model was trained over 450 epochs.

  • The final loss is 0.8027, which is expected: the dataset includes zero-mean noise with a standard deviation of 1, and (assuming a mean-squared-error loss) the loss cannot be driven much below the noise variance, so a value near 1 is close to the best achievable.

  • The model successfully follows the gradient and reduces the loss over time.

More improvements coming soon! 🔥

