Understanding Regularization in Machine Learning
Machine learning algorithms let us fit models to data and predict target values, but sometimes a model performs very poorly because of a problem known as overfitting. In this blog, I will try to help you understand a technique called Regularization, which we can use to avoid this problem and make more accurate predictions.
Underfitting
Underfitting occurs when a model is too simplistic to capture the underlying patterns in the data. It results in poor performance on both the training and test datasets. Essentially, an underfit model cannot learn the complexities present in the data, and its predictions lack accuracy.
Overfitting
Overfitting, as mentioned earlier, happens when a model becomes overly complex, fitting the training data perfectly but failing to generalize to new data. This leads to a model that performs exceptionally well on the training dataset but poorly on unseen data.
To avoid overfitting, we have various methods:
Collect more (and more relevant) training data.
Select and use only a subset of the features (feature selection).
Reduce the size of the parameters using regularization.
In this blog, we're going to talk about Regularization.
Regularization
Regularization is a technique used to avoid overfitting in machine learning models. By adding a penalty term to the loss function, regularization techniques aim to reduce the complexity of the model and improve its ability to generalize to unseen data. In simple terms, regularization lets you keep all the features of the dataset, but it prevents less relevant features from having an outsized effect on the overall prediction.
Cost Function of Regularization
The primary purpose of the cost function is to quantify how well or poorly a machine learning model is performing on a given dataset. It measures the difference between the model's predictions and the actual target values (ground truth) in the training data. The goal is to minimize this error.
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(y_i - h(x_i)\bigr)^2 + \lambda \sum_{j=1}^{n}\lvert \theta_j \rvert$$
In this equation:
J(θ) is the cost function.
The first sum runs over all training examples (i = 1 to m) and the penalty sum runs over all features (j = 1 to n).
yi is the actual output for the ith training example.
h(xi) is the predicted output for the ith training example using the model.
θj represents the model's coefficients (weights).
λ is the regularization parameter, which controls the strength of the regularization. A larger λ leads to stronger regularization.
As written, the penalty term uses the absolute values of the weights (the L1 form); replacing it with the sum of squared weights gives the L2 form. Both are discussed below.
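To make the formula concrete, here is a minimal sketch (not from the original post) of how this cost could be computed with NumPy for a linear model; the names regularized_cost, theta, X, y, and lam are illustrative assumptions.

import numpy as np

def regularized_cost(theta, X, y, lam):
    # Mean squared error term: (1/2m) * Σ (yi - h(xi))^2, with h(xi) = X @ theta
    m = len(y)
    squared_error = np.sum((y - X @ theta) ** 2) / (2 * m)
    # L1 penalty term: λ * Σ |θj|, conventionally skipping the bias term θ0
    penalty = lam * np.sum(np.abs(theta[1:]))
    return squared_error + penalty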
Types of Regularizations
The main regularization techniques are:
L2 regularization (Ridge Regression): Adds a penalty term that is the sum of the squares of the weights. This has the effect of shrinking the weights towards zero but not setting any weights exactly to zero.
L1 regularization (Lasso Regression): Adds a penalty term that is the sum of the absolute values of the weights. This tends to set some weights exactly to zero, effectively performing feature selection.
Elastic Net: Combines L1 and L2 regularization by adding both penalty terms to the loss function. It has two hyperparameters: α, which controls the amount of regularization, and l1_ratio, which balances the L1 and L2 terms (see the sketch below).
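As a quick illustration of the Lasso and Elastic Net variants described above, here is a minimal scikit-learn sketch; the alpha and l1_ratio values are arbitrary examples, and X_train, y_train are assumed to be an existing training split.

from sklearn.linear_model import Lasso, ElasticNet

# L1 regularization: some coefficients may be driven exactly to zero
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Elastic Net: mixes L1 and L2 penalties; l1_ratio balances the two terms
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_train, y_train)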
We can implement regularization techniques in Python using scikit-learn. Here is an example of Ridge Regression:
# Ridge Regression applies an L2 penalty; X_train and y_train are the training features and targets
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.5)
ridge.fit(X_train, y_train)
Here, alpha controls the strength of regularization. A higher alpha value means more regularization and less overfitting (though setting it too high can lead to underfitting).
We can compare the performance of a regularized model versus an unregularized one:
Linear Regression-Training score: 0.95
Linear Regression-Test score: 0.61
Ridge Regression-Training score: 0.90
Ridge Regression-Test score: 0.76
We see that Ridge Regression has a lower training score but a significantly higher test score, indicating it is less overfit.
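For reference, here is a minimal sketch of how such a comparison might be produced; the exact scores depend on the dataset, and X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split.

from sklearn.linear_model import LinearRegression, Ridge

# Unregularized baseline
lr = LinearRegression().fit(X_train, y_train)
print("Linear Regression-Training score:", lr.score(X_train, y_train))
print("Linear Regression-Test score:", lr.score(X_test, y_test))

# L2-regularized model
ridge = Ridge(alpha=0.5).fit(X_train, y_train)
print("Ridge Regression-Training score:", ridge.score(X_train, y_train))
print("Ridge Regression-Test score:", ridge.score(X_test, y_test))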
In summary, regularization techniques help machine learning models generalize better by reducing their complexity. L2 regularization is a good choice in most cases, while L1 regularization can also perform feature selection. Elastic Net combines the benefits of both.
Conclusion
In the world of machine learning, regularization stands as a crucial tool in our arsenal. It helps us strike the right balance between model complexity and generalization, preventing overfitting. By introducing penalty terms into our models, regularization encourages simplicity, making our models more robust and reliable in real-world scenarios.