Overfitting Vs. Underfitting

Overfitting and underfitting are well-known in the Data Science community as common pitfalls developers face when building Machine Learning solutions. Knowing your enemy before rushing into the building process is essential, so on this page, we will:

Define the overfitting and underfitting terms;
Explore the bias-variance tradeoff in Machine Learning and understand the basics of the poor model behavior;
And find out how to prevent and overcome underfitting and overfitting.

This is a derivative piece from the general Overfitting and Underfitting pages. Please check them out to get a more comprehensive grasp of the topics.

What is Overfitting?

Overfitting is such a Machine Learning model behavior when the model is very successful in training but fails to generalize predictions to the new, unseen data. In other words, the model shows a high ML metric during training, but in production, the metric is significantly lower.

Overfitting is not a desirable model behavior as an overfitted model is not robust or trustworthy in a real-world setting, undermining the whole training point.

What is Underfitting?

To define the term, underfitting is such a Machine Learning model behavior when the model is too simple to grasp the general patterns in the training data, resulting in poor training and validation performance. In other words, you can think of an underfitted model as "too naive" to understand the complexities and connections of the data.

Underfitting is not desirable model behavior, as an underfitted model is useless and cannot be used anywhere other than serving as a case in point, undermining the whole training point.

Bias-Variance tradeoff

Let’s take a look at underfitting and overfitting on a deeper level.

The key to understanding overfitting and underfitting lies in the bias-variance tradeoff concept. As you might know, when training an ML algorithm, developers minimize its loss, which can be decomposed into three parts: noise (sigma), bias, and variance.

Formular bias-variance tradeoff decomposition
Source

Let’s get through them one by one:

The first component describes the noise in the data and is equal to the error of the ideal algorithm. There will always be noise in the data because of the shift from the training samples to real-world data. Therefore, it is impossible to construct an algorithm with less error;
The second component is the bias of the model. Bias is the deviation of the average output of the trained algorithm from the prediction of the ideal algorithm;
The third component is the variance of the model. Variance is the scatter of the predictions of the trained algorithm relative to the average prediction.

The bias shows how well you can approximate the ideal model using the current algorithm. The bias is generally low for complex models like trees, whereas the bias is significant for simple models like linear classifiers. The variance indicates the degree of prediction fluctuation the trained algorithm might have depending on the data it was trained on. In other words, the variance characterizes the sensitivity of an algorithm to changes in the data. As a rule, simple models have a low variance,
and complex algorithms - a high one.

The bias-variance tradeoff visualized
Source

The picture above shows models with different biases and variances. A blue dot represents each model, so one dot corresponds to one model trained on one of the possible training sets. Each circle characterizes the quality of the model - the closer to the center, the fewer the model's error on the test set.

As you can see, having a high bias means that the model's predictions will be far from the center, which is logical given the bias definition. With variance, it is trickier as a model can fall both relatively close to the center as well as in an area with large error.

Bias and variance have an inverse relation: when bias is high, variance is low, and vice versa. This is well reflected in the image below.

Overfitting Vs. Underfitting visualized
Source

Thus, underfitting is such a scenario when the bias is so high that the model almost does not make any correct predictions, and the variance is so low that the model predicts samples very close to the average value.

On the contrary, overfitting is such a scenario when the bias is so low that the model almost makes no mistakes, and the variance is so high that the model can predict samples far from the average.

How to prevent and overcome overfitting?

Overfitting is a sworn enemy of every Data Scientist, but over time, developers have come up with valuable techniques to prevent and overcome overfitting in Machine Learning models and neural networks.

Some of these approaches are complex, so on this page, we will only draw a brief description of each method and leave links to more in-depth pages exploring a specific way.

For now, the most common techniques for dealing with overfitting are:

Regularization techniques. Regularization introduces additional terms in the loss function that punish a model for having high weights. Such an approach reduces the impact of individual features and forces an algorithm to learn more general trends. Besides traditional regularization techniques such as L1 and L2, in neural networks, weight decay and adding noise to inputs can also be applied for regularization purposes;
Early stopping. Early stopping is a way that closely monitors the validation loss curve while training and stops the training process as soon as the validation loss stops improving. Such a method protects the model and does not allow it to learn noise in the data, thus making it less complex;
Dropout. Dropout is one of the primary approaches to dealing with overfitting in neural networks. This layer randomly deactivates some neurons during each training iteration, creating an ensemble of smaller networks. As a result, the network learns not to rely on specific neurons but rather to pick up on generalized patterns;
Data Augmentation. Data Augmentation suggests prompting the model to be more robust to noise by extending the training set with augmented training samples. For example, you can rotate existing images, change their color palette, or apply various other transformations. This is especially something to keep your eye on if you have a small or not very diverse dataset;
Batch Normalization. Batch Normalization brings normalized activations within each neural network layer on training. Although this technique was not initially intended to prevent overfitting, it still has a nice regularization side-effect, making training more stable and mitigating overfitting;
Reducing the complexity. As mentioned above, a model might overfit because it is too complex. You can simplify its architecture to try and overcome this;
Ensembling. Ensembling suggests combining predictions of several models to formulate the final output. It is proven to help improve the solution’s generalization capabilities and reduce the overfitting risks. By the way, by applying Dropout, you use the ensembling concept to the neural network;
And many more. Plenty of less popular methods still might be the perfect fit for your case. For example, you can exclude irrelevant features from your data - Feature Selection, collect more diverse data to address edge cases or class imbalance, work with extensive model’s nodes, connections, and parameters after training - Pruning, or take an alternative approach and run numerous experiments with various hyperparameters to find the balance between the model complexity and generalization.

These approaches provide a wide range of techniques to address overfitting issues and ensure better generalization capabilities of a model. The exact choice of a method depends significantly on the use case, data, model, goals, etc. Please explore the field before opting for a certain way.

How to prevent and overcome underfitting?

Underfitting is not as hot and vital as overfitting. Still, there are some valuable techniques to prevent and overcome underfitting in Machine Learning models and neural networks.

Some of these approaches are complex, so on this page, we will only draw a brief description of each method and leave links to more in-depth pages exploring a specific way.

For now, the most common techniques for dealing with underfitting are:

Increasing the model complexity. As mentioned above, a model might underfit because it is too simple. You can make it more complex (complicate its architecture or pick a more complex basic model) to try and overcome this;
Bringing more data. Introducing more training data can sometimes help, as it can expose the model to a broader range of patterns and relationships. However, this might not always be feasible or effective;
Decreasing the regularization strength. Regularization introduces additional terms in the loss function that punish a model for having high weights. Such an approach reduces the impact of individual features and forces an algorithm to learn more general trends. Besides traditional regularization techniques such as L1 and L2, in neural networks, weight decay and adding noise to inputs can also be applied for regularization purposes. In general, regularization is introduced to prevent overfitting, so to try and overcome underfitting, you should decrease the regularization strength;
More training time. This point is pretty much self-explanatory. You should give your model a bit more time to train and extract patterns while maintaining the balance between under- and overfitting;
Accurate preprocessing. As mentioned above, underfitting might occur because of a dirty dataset. Therefore, try a precise, organized, comprehensive preprocessing, including feature selection, feature engineering, data cleaning, and handling outliers. This might have a significant effect on the result.

These approaches provide a wide range of techniques to address underfitting issues and ensure better generalization capabilities of a model. The exact choice of a method depends significantly on the use case, data, model, goals, etc. Please explore the field before opting for a certain way.

Key Takeaways: Overfitting Vs. Underfitting

Overfitting is such a Machine Learning model behavior when the model successfully trains but fails to generalize predictions to the new, unseen data.

Underfitting is such a Machine Learning model behavior when the model fails to capture the patterns in the data, showing poor performance in the training and test stages.

The most accurate approach to detecting overfitting and underfitting is k-fold cross-validation.

Please follow the simple five steps listed below to ease your life from overfitting when developing a Machine Learning solution:

Try to collect as diverse, extensive, and balanced a dataset as possible;
Keep track of the model learning curves to detect overfitting early on;
Do not use too complex models without the need for that;
Apply regularization techniques;
Always validate your model performance on a set of examples not seen during training (for instance, using cross-validation).

However, if you are struggling with underfitting, you might want to try something from the list below:

Try to collect as diverse, extensive, and balanced a dataset as possible;
Keep track of the model learning curves to find the balance between underfitting and overfitting;
Do not use too complex or too simple models without the need for that;
Keep an eye on the strength of your regularization techniques;
Always validate your model performance on a set of examples not seen during training (for instance, using cross-validation).

Learn more about key principles of Machine Learning and Computer Vision

Boost model performance quickly with AI-powered labeling and 100% QA.

Learn more

Last modified 11mo ago

Previous - Key principles of Computer Vision

Underfitting

Next - Key principles of Computer Vision

Downsampling and Upsampling in Machine Learning