Regularisation

L1 and L2 Regularisation in Machine Learning

Summary: L1 and L2 regularisation prevent overfitting in machine learning models. L1 regularisation helps feature selection by setting coefficients to zero, while L2 regularisation offers robust, non-sparse solutions.

Introduction

Machine Learning gives computers the ability to learn without being explicitly programmed. It is one of the most exciting technologies because it allows computers to improve from experience, much as humans do.

One of the most effective strategies against overfitting in Machine Learning is Regularisation. Overfitting occurs when a model fits the training data too closely and becomes overly complex, so it fails to perform adequately on new data. Regularisation counters this by adding a penalty term to the model's loss function.

There are two main types of Regularisation in Machine Learning, L1 and L2, which will be discussed in this blog post.


What is Regularisation in Machine Learning?

Regularisation is an approach in Machine Learning that prevents overfitting by including a penalty term in the model's loss function. The two main objectives of Regularisation are:

  • To reduce the complexity of a model.
  • To improve the ability of the model to generalise to new inputs.

Numerous regularisation methods add different penalty terms, including L1 and L2 regularisation. L1 regularisation adds a penalty based on the absolute values of the model's parameters, while L2 regularisation adds a penalty based on their squares.

Regularisation thus reduces the chance of overfitting and keeps the model's parameters under control, helping the model perform better on unseen data.
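To make the idea concrete, here is a minimal Python sketch of how a penalty term can be added to a loss function. The mean-squared-error loss and all names are illustrative rather than taken from any particular library:

```python
import numpy as np

def regularised_loss(y_true, y_pred, weights, lam, kind="l2"):
    """Mean-squared-error loss plus an L1 or L2 penalty on the weights."""
    mse = np.mean((y_true - y_pred) ** 2)        # unregularised loss
    if kind == "l1":
        penalty = lam * np.sum(np.abs(weights))  # L1: absolute values
    else:
        penalty = lam * np.sum(weights ** 2)     # L2: squared values
    return mse + penalty
```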

What is L1 Regularisation?

L1 Regularisation, or Lasso Regularisation, is a Machine Learning strategy that inhibits overfitting by introducing a penalty term in the model’s loss function. The penalty term is based on the absolute values of the model’s parameters. 

L1 Regularisation tends to shrink some of the model's parameters exactly to zero, lowering the number of non-zero parameters in the model.


L1 Regularisation is useful when working with high-dimensional data, as it enables you to choose a subset of the most essential attributes. This reduces the risk of overfitting and makes the model easier to understand. The hyperparameter lambda regulates the strength of L1 regularisation by controlling the size of the penalty term: as lambda rises, the regularisation grows stronger and more parameters are driven to zero.

The L1 Regularisation (Lasso) formula takes the standard form:

Cost = Loss + λ × Σᵢ |wᵢ|

where Loss is the unregularised loss function, the wᵢ are the model's parameters, and λ controls the strength of the regularisation.
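As an illustration, the sketch below fits scikit-learn's Lasso on made-up synthetic data in which only three of ten features are informative; with a penalty of this strength, several coefficients are typically driven exactly to zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 samples, 10 features, only 3 genuinely informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha plays the role of lambda
lasso.fit(X, y)

print("Coefficients:", np.round(lasso.coef_, 2))
print("Zeroed out:", int(np.sum(lasso.coef_ == 0)), "of", X.shape[1])
```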

What is L2 Regularisation?

L2 Regularisation, also known as Ridge Regularisation, is an approach in Machine Learning that avoids overfitting by adding a penalty term to the model's loss function based on the squares of the model's parameters. The primary goal of L2 Regularisation is to keep the model's parameters small and prevent any of them from growing excessively large.


For L2 Regularisation, a term proportional to the squares of the model's parameters is added to the loss function. This limits the size of the parameters and prevents them from growing out of control. The hyperparameter lambda again controls the intensity of the Regularisation and the size of the penalty term: the greater the lambda, the smaller the parameters and the stronger the Regularisation.

The L2 Regularisation (Ridge) formula takes the standard form:

Cost = Loss + λ × Σᵢ wᵢ²

where, as before, Loss is the unregularised loss function, the wᵢ are the model's parameters, and λ controls the strength of the regularisation.
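For comparison, a similar sketch with scikit-learn's Ridge (again on made-up synthetic data) illustrates how raising the penalty strength shrinks the coefficients without zeroing them out:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for alpha in (0.1, 10.0, 1000.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    # Larger alpha => stronger penalty => smaller (but non-zero) coefficients
    print(f"alpha={alpha:7.1f}  max |coef| = {np.max(np.abs(ridge.coef_)):.2f}")
```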


Differences Between L1 and L2 Regularisation

While L1 and L2 regularisation both aim to mitigate overfitting by adding a penalty to the model's parameters, they do so differently, leading to distinct effects on the model's performance and structure. Understanding these differences is crucial for selecting the appropriate regularisation technique based on the specific requirements of your machine learning task.

L1 Regularisation

L1 regularisation, also known as Lasso (Least Absolute Shrinkage and Selection Operator), is characterised by adding the absolute values of the model parameters as a penalty term. This approach has several distinctive features and implications:

  • Penalty Term: Based on the absolute values of the model parameters.
  • Sparse Solutions: Some parameters are reduced to zero, producing sparse solutions.
  • Sensitivity to Outliers: More sensitive to outliers compared to L2 regularisation.
  • Feature Selection: Selects a subset of the most crucial features, effectively performing feature selection.
  • Optimisation: The absolute-value penalty is convex but not differentiable at zero, which makes optimisation more challenging than for L2.
  • Correlated Features: With groups of correlated features, L1 tends to keep one and zero out the rest, which can make the selection unstable.
  • High-Dimensional Data: Useful when dealing with high-dimensional data where only a subset of the features matters.
  • Also Known As: Commonly referred to as Lasso regularisation.

L2 Regularisation

L2 regularisation, also known as Ridge regularisation, involves adding the squares of the model parameters as a penalty term. This technique offers a different set of advantages and characteristics:

  • Penalty Term: Based on the squares of the model parameters.
  • Non-sparse Solutions: Uses all the parameters, producing non-sparse solutions.
  • Robustness to Outliers: More robust to outliers than L1 regularisation.
  • Feature Utilisation: All features are helpful for the model, contributing to the final predictions.
  • Convex Optimisation: Involves smooth, convex optimisation, which is easier to solve than the non-smooth L1 problem.
  • Correlated Features: Handles correlated features gracefully, spreading weight across them rather than arbitrarily keeping one.
  • High-Dimensional Data: Useful for high-dimensional data and when the goal is to have a less complex model.
  • Also Known As: Commonly referred to as Ridge regularisation.

Tabular Representation of Key Differences between L1 and L2

Understanding the critical differences between L1 and L2 regularisation helps in choosing the proper method for a specific machine learning problem. For your clear understanding, the table below summarises those differences:

| Aspect | L1 (Lasso) | L2 (Ridge) |
|---|---|---|
| Penalty term | Absolute values of the parameters | Squares of the parameters |
| Solutions | Sparse; some coefficients become exactly zero | Non-sparse; all coefficients are kept |
| Outliers | More sensitive | More robust |
| Feature selection | Performed automatically | Not performed |
| Optimisation | Convex but non-smooth | Smooth and convex |
| Correlated features | Tends to keep one and drop the rest | Spreads weight across them |
| Typical use case | High-dimensional data with irrelevant or redundant features | All features relevant; simpler, more stable model |
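To see these differences in running code, the following sketch (synthetic data and an illustrative penalty strength) fits both models on the same data and counts how many coefficients each keeps:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso typically zeroes many coefficients; Ridge keeps all of them, just smaller
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)), "of 20")
print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)), "of 20")
```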

Practical Applications of L1 and L2 Regularisation

The choice between L1 and L2 regularisation hinges on the specific characteristics and requirements of the machine learning problem. Each method has unique advantages and is suited to different scenarios. Here are some practical considerations to guide the selection process:

When to Use L1 Regularisation

L1 regularisation is ideal for scenarios where feature selection and sparsity are essential. Here are some specific cases where L1 regularisation excels:

  • Feature Selection: L1 regularisation is highly effective for identifying and retaining essential features. It reduces the number of features by setting some coefficients to zero, effectively performing feature selection.
  • High-Dimensional Data: When dealing with datasets with many features, L1 regularisation can help manage dimensionality. By producing sparse solutions, it simplifies the model and makes it easier to interpret.
  • Irrelevant or Redundant Features: If you suspect that a subset of the features in your dataset is irrelevant or redundant, L1 regularisation can help. It tends to shrink the coefficients of less important features to zero, removing them from the model and improving performance (a short sketch follows this list).
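As a sketch of L1-based feature selection, the chosen subset can be read straight off the non-zero Lasso coefficients; the feature names below are hypothetical, made up for the example:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=150, n_features=8, n_informative=3,
                       noise=5.0, random_state=1)
feature_names = [f"feature_{i}" for i in range(8)]  # hypothetical names

lasso = Lasso(alpha=2.0).fit(X, y)
selected = [name for name, coef in zip(feature_names, lasso.coef_)
            if coef != 0]
print("Selected features:", selected)
```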

When to Use L2 Regularisation

L2 regularisation is suitable for scenarios where the use of all features and robustness to outliers are crucial. Here are some specific cases where L2 regularisation is beneficial:

  • All Features Relevant: When you believe all features in your dataset contribute to the outcome, L2 regularisation is appropriate. Unlike L1, it does not shrink any coefficients to zero, ensuring that all features are included in the model.
  • Robustness to Outliers: L2 regularisation is more robust to outliers than L1. If your dataset contains outliers that could significantly influence the model, L2 regularisation can help mitigate their impact, leading to more stable and reliable predictions.
  • High-Dimensional Data and Less Complex Models: When dealing with high-dimensional data where the goal is to reduce model complexity, L2 regularisation is useful. It smooths the coefficients, resulting in a simpler and more generalisable model; choosing the penalty strength by cross-validation is sketched after this list.
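In practice, the L2 penalty strength is usually tuned rather than fixed by hand; a minimal sketch using scikit-learn's RidgeCV on synthetic data might look like this:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=15, noise=10.0, random_state=3)

# Try a grid of candidate penalty strengths; keep the best by cross-validation
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("Chosen alpha:", ridge_cv.alpha_)
```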

After carefully considering your machine learning problem’s specific needs, you can choose between L1 and L2 regularisation to achieve optimal performance and model interpretability.


Frequently Asked Questions

What is the difference between L1 and L2 regularisation?

L1 regularisation, known as Lasso, adds a penalty equal to the absolute value of the coefficients, resulting in sparse solutions by setting some parameters to zero. L2 regularisation, known as Ridge, penalises the square of the coefficients, keeping all parameters but shrinking them, offering robustness to outliers.

Why is regularisation important in machine learning?

Regularisation is crucial in machine learning. It prevents overfitting by adding a penalty term to the model’s loss function. This reduces model complexity, ensuring it performs well on new, unseen data. Regularisation enhances the model’s generalisation ability and predictive accuracy by controlling parameter growth.

When should I use L1 regularisation?

Use L1 regularisation when feature selection and sparsity are essential, especially with high-dimensional data. L1 reduces some coefficients to zero, simplifying the model and highlighting critical features. It effectively eliminates irrelevant or redundant features, thus improving model interpretability and performance.

Conclusion

L1 and L2 Regularisation are two approaches in Machine Learning that prevent overfitting in ML models. As the post above makes clear, both methods reduce a model's complexity and improve its ability to generalise to new inputs.

Indeed, L1 and L2 Regularisation make it possible to work with high-dimensional data while keeping the model from becoming overly complex. As the differences show, L1 Regularisation is more sensitive to outliers, whereas L2 Regularisation is comparatively robust.

Authors

  • Asmita Kar


I am a Senior Content Writer working with Pickl.AI. I am a passionate writer, an ardent learner and a dedicated individual. With around 3 years of experience in writing, I have developed the knack of using words with a creative flow. Writing motivates me to conduct research and inspires me to intertwine words that are able to lure my audience in reading my work. My biggest motivation in life is my mother who constantly pushes me to do better in life. Apart from writing, Indian Mythology is my area of passion about which I am constantly on the path of learning more.