- Getting started
- Introduction to the Wiki
- Overview of topics
- How to contribute
- General best practices
- Key principles of Computer Vision
- Convolution
- Advanced convolution techniques and layers
- Pooling
- Overfitting
- Underfitting
- Overfitting Vs. Underfitting in Machine Learning
- Upsampling and Downsampling techniques in Machine Learning
- Computer Vision tasks
- The complete glossary of the modern Computer Vision tasks
- Classification / Tagging
- Object Detection
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Attribute Prediction
- Computer Vision model architectures
- ResNet
- Faster R-CNN
- Mask R-CNN
- DeepLabv3+
- U-Net
- FBNetV3
- U-Net++
- Efficient Net
- PAN
- PSPNet
- LinkNet
- FPN
- RetinaNet
- Cascade R-CNN
- FBNetV3IS
- FBNetV3OD
- CascadeMask R-CNN
- HybridTask Cascade
- Computer Vision metrics
- Confusion Matrix
- Intersection over Union (IoU)
- Accuracy
- Hamming score
- Precision
- Recall
- Precision-Recall curve and AUC-PR
- F-score
- Average Precision
- mean Average Precision (mAP)
- Loss functions in Machine Learning
- Comprehensive overview of loss functions in Machine Learning
- Cross-Entropy Loss
- Binary Cross-Entropy Loss
- Focal loss
- Bounding Box Regression Loss
- CrossEntropyIoULoss2D
- Average Loss
- Solver / Optimizer
- Comprehensive overview of solvers/optimizers in Deep Learning
- Adam
- SGD
- Adadelta
- Adagrad
- AdaMax
- Adamw
- ASGD
- Rprop
- RMSprop
- Lion
- Weight Decay
- Base Learning Rate
- Momentum (SGD)
- Epsilon Coefficient
- Training Parameters
- Patience
- Min delta
- Seed
- Everything you need to know about batches in Machine Learning
- Iterations
- Epoch
- Scheduler
- Comprehensive overview of learning rate schedulers in Machine Learning
- ExponentialLR
- CyclicLR
- StepLR
- MultiStepLR
- ReduceLROnPlateau
- CosineAnnealingLR
- Computer Vision augmentations
- Comprehensive overview of augmentations in Machine Learning
- Horizontal Flip
- Vertical Flip
- Random Crop
- Random Sized Crop
- Rotate
- Resize
- Blur
- Smallest max size
- Center Crop
- Color Jitter
- Gaussian Noise
- Shift Scale Rotate
- Longest max size
- Equalize
- To gray
- Shear
- Mosaic
- Copy Paste
- Extrapolation methods
- Interpolation methods
- Deployment
- Primitive deployment using web frameworks
- Commonly used web frameworks
- Containerized Deployment
- Orchestrated Deployment
- Challenges of Deployment
- Splits
- Data Splitting in Machine Learning

When building a Machine Learning solution, you might end up with a model that shows poor results both on training and validation. Data Scientists face such a challenge occasionally and call it underfitting.

On this page, we will:

- Define the underfitting term;
- Explore the bias-variance tradeoff in Machine Learning;
- Come up with a simple underfitting example;
- Understand the potential reasons behind a model underfitting;
- Learn how to detect underfitting early on;
- And explore 5 ways of preventing and overcoming underfitting issues.

Let’s jump in.

To define the term, underfitting is the behavior of a Machine Learning model that is too simple to grasp the general patterns in the training data, resulting in poor training and validation performance. In other words, you can think of an underfitted model as "too naive" to understand the complexities and connections of the data.

Underfitting is undesirable model behavior: an underfitted model makes poor predictions, so it defeats the purpose of training and is of little use beyond serving as a cautionary example.

Let’s take a look at underfitting on a deeper level.

The key to understanding underfitting lies in the bias-variance tradeoff concept. As you might know, when training an ML algorithm, developers minimize its loss. For a squared-error loss, the expected error on unseen data can be decomposed into three parts: noise (sigma), bias, and variance.

Let’s get through them one by one:

- The first component describes the **noise** in the data and is equal to the error of the ideal algorithm. There will always be noise in the data because of the shift from the training samples to real-world data. Therefore, it is impossible to construct an algorithm with less error;
- The second component is the bias of the model. **Bias** is the deviation of the average output of the trained algorithm from the prediction of the ideal algorithm;
- The third component is the variance of the model. **Variance** is the scatter of the predictions of the trained algorithm relative to the average prediction.
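For reference, the three components above can be written as one formula. This is the standard decomposition for squared error; the symbols are our own notation and not taken from the original page: $y = f(x) + \varepsilon$ is the observed target, $f$ is the ideal algorithm, $\varepsilon$ is noise with variance $\sigma^2$, and $\hat{f}$ is the model trained on a random training set (expectations are taken over training sets and noise).

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\sigma^2}_{\text{noise}}
+ \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
$$

Note that the bias enters the sum squared, so a large deviation in either direction increases the error.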

The bias shows how well you can approximate the ideal model using the current algorithm. The bias is generally low for complex models like trees, whereas the bias is significant for simple models like linear classifiers. The variance indicates the degree of prediction fluctuation the trained algorithm might have depending on the data it was trained on. In other words, the variance characterizes the sensitivity of an algorithm to changes in the data. As a rule, simple models have a low variance, and complex algorithms have a high one.

The picture above shows models with different biases and variances. A blue dot represents each model, so one dot corresponds to one model trained on one of the possible training sets. Each circle characterizes the quality of the model: the closer to the center, the smaller the model's error on the test set.

As you can see, having a high bias means that the model's predictions will be far from the center, which is logical given the bias definition. With variance, it is trickier, as a high-variance model can land either relatively close to the center or in an area with a large error, depending on the training set it saw.

Bias and variance have an inverse relation: when bias is high, variance is low, and vice versa. This is well reflected in the image below.

Thus, underfitting is the scenario in which the bias is so high that the model makes almost no correct predictions, and the variance is so low that the model's predictions stay very close to the average value.
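To make the high-bias, low-variance picture concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the ground truth `true_f(x) = x * x`, the noise level, the sample sizes, and the choice of a 1-nearest-neighbour predictor as the "complex" model. It trains each model on many random training sets and estimates the squared bias and the variance of the predictions on a fixed grid of points.

```python
import random


def true_f(x):
    """Ground-truth dependence assumed for this sketch."""
    return x * x


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def linear_model(xs, ys, grid):
    """Simple model: a straight line, expected to underfit a parabola."""
    slope, intercept = fit_line(xs, ys)
    return [slope * g + intercept for g in grid]


def nearest_neighbour_model(xs, ys, grid):
    """Flexible model: copy the target of the closest training point."""
    preds = []
    for g in grid:
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - g))
        preds.append(ys[nearest])
    return preds


def bias_variance(model, trials=200, n_train=50, noise_sd=1.5, seed=0):
    """Estimate squared bias and variance, averaged over a fixed grid."""
    rng = random.Random(seed)
    grid = [-3.0 + 6.0 * i / 20 for i in range(21)]
    runs = []
    for _ in range(trials):
        # Each trial draws a fresh training set from the same distribution.
        xs = [rng.uniform(-3.0, 3.0) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        runs.append(model(xs, ys, grid))
    bias_sq = variance = 0.0
    for j, g in enumerate(grid):
        column = [run[j] for run in runs]
        mean_pred = sum(column) / trials
        bias_sq += (mean_pred - true_f(g)) ** 2
        variance += sum((p - mean_pred) ** 2 for p in column) / trials
    return bias_sq / len(grid), variance / len(grid)


lin_bias_sq, lin_var = bias_variance(linear_model)
nn_bias_sq, nn_var = bias_variance(nearest_neighbour_model)
print(f"linear: bias^2={lin_bias_sq:.2f}, variance={lin_var:.2f}")
print(f"1-NN:   bias^2={nn_bias_sq:.2f}, variance={nn_var:.2f}")
```

Under these assumptions, the straight line should show a much larger squared bias and a smaller variance than the nearest-neighbour predictor, which is the underfitting signature described above.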

Let’s draw a simple underfitting example.

Imagine having data with a parabolic dependence. The goal of your model is to learn to predict this relationship, but for some reason, you are using a model that can capture only linear dependencies. In such a case, you will get an underfitted model, as you cannot force a linear model to learn non-linear dependencies between input and output.
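The example above can be sketched in a few lines of plain Python. The data generator, the noise level, and the helper names (`fit_line`, `mse`, `make_parabola`) are hypothetical and exist only for this illustration. The same least-squares routine is fitted twice: once on the raw input `x`, where it can only draw a straight line, and once on the feature `x * x`, which lets it express the parabola.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_parabola(n, rng):
    """Noisy samples of y = x^2 on [-3, 3] (an assumed toy dataset)."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_parabola(200, rng)
val_x, val_y = make_parabola(200, rng)

# Linear model on the raw input: too simple for a parabola.
lin = fit_line(train_x, train_y)
lin_train_mse = mse(train_x, train_y, *lin)
lin_val_mse = mse(val_x, val_y, *lin)

# Same algorithm on the squared input: now the parabola is representable.
sq_train_x = [x * x for x in train_x]
sq_val_x = [x * x for x in val_x]
quad = fit_line(sq_train_x, train_y)
quad_train_mse = mse(sq_train_x, train_y, *quad)
quad_val_mse = mse(sq_val_x, val_y, *quad)

print(f"linear on x:   train MSE={lin_train_mse:.2f}, val MSE={lin_val_mse:.2f}")
print(f"linear on x^2: train MSE={quad_train_mse:.2f}, val MSE={quad_val_mse:.2f}")
```

The underfitted model is poor on both the training and the validation set, while the second fit should have a far lower error on both. That matches the definition given earlier: the problem is the model's capacity, not the amount of training.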

Underfitting reasons may vary from use case to use case. However, in general, you might want to check the following points.

If your dataset is noisy, contains plenty of outliers, or is preprocessed with mistakes, it might massively confuse the model and lead to underfitting, as the model will not have a chance to capture the general patterns in the data.

As shown above, a model will likely underfit if it is too basic for the complexity of the task or data. An example is training a linear regression model on a dataset with a highly nonlinear relationship between the features and the target.

When exploring other pages on the related topic, you might find other underfitting reasons highlighted. Still, from our experience, those mentioned above are the most common occurrences, and we will explain why below.

Underfitting is not as easy to detect as overfitting. Technically, any model that is not 100% trained can be considered underfitted. But who decides that the model is well-trained? The answer is a Data Scientist, who is a human being and can make mistakes.

Therefore, below, we will mainly discuss the hardcore underfitting case when a model struggles massively to capture data patterns. There are different approaches to detecting an underfitting model. You can do it by checking the learning curves, empirically, or through cross-validation. Let’s check these methods one by one.

To diagnose the underfit, you can take a look at the model’s learning curves - the plot that reflects the model’s loss on the train and test data over iterations.

If the curves show a descending trend, your model is training. At some point, you might see that the training loss starts approaching zero while the validation loss suddenly rises. This is precisely where the model stopped extracting the general pattern (got fit) and started overfitting. So, the “sweet spot” to stop the training is right before the validation curve starts ascending.

Therefore, you should not aspire to 100% accuracy on a train set since it will almost guarantee getting an overfitted model. Instead, look for improving your model’s accuracy on a validation and test set.
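The learning-curve reasoning above can be turned into a rough check. This is only a sketch: the function names, the `acceptable_loss` threshold, the `tolerance`, and the loss values are all made up for illustration, and a real project would pick them per task. The check finds the iteration with the lowest validation loss (the "sweet spot") and labels the run by comparing the curves against the thresholds.

```python
def best_epoch(val_losses):
    """Index of the lowest validation loss, i.e. the 'sweet spot' to stop at."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


def diagnose(train_losses, val_losses, acceptable_loss, tolerance=0.05):
    """Label a run from its learning curves.

    acceptable_loss and tolerance are hypothetical, task-specific thresholds.
    """
    stop = best_epoch(val_losses)
    if min(train_losses) > acceptable_loss:
        # The model never fits even the training data well.
        return "underfitting", stop
    if val_losses[-1] > val_losses[stop] + tolerance:
        # Validation loss rose again after its minimum.
        return "overfitting", stop
    return "ok", stop


# Illustrative, hand-written curves (not measurements from a real model).
overfit_train = [1.00, 0.60, 0.40, 0.25, 0.15, 0.08, 0.04]
overfit_val = [1.10, 0.70, 0.50, 0.45, 0.50, 0.60, 0.75]
underfit_train = [1.00, 0.95, 0.93, 0.92, 0.92]
underfit_val = [1.05, 1.00, 0.98, 0.97, 0.97]

overfit_result = diagnose(overfit_train, overfit_val, acceptable_loss=0.3)
underfit_result = diagnose(underfit_train, underfit_val, acceptable_loss=0.3)
print(overfit_result)
print(underfit_result)
```

In the first pair of curves, the training loss keeps falling while the validation loss turns upward after the fourth point, so that point is the place to stop. In the second pair, both curves flatten at a high loss, which is the pattern to look for when you suspect underfitting.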

An empirical way of detecting underfitting is to evaluate the Machine Learning model's performance on both the train and test sets. With an underfitted model, you will see poor performance in both stages with no visible signs of improvement.

The most accurate approach to detecting underfitting (and other model weaknesses) is **k-fold cross-validation**. The algorithm is the following:

- Shuffle your dataset and split it into **k** equal-sized folds;
- Train your model on **k - 1** folds and test its performance on the left-out fold;
- Repeat the procedure **k** times so that each fold is used as a validation set once;
- Take the average across the model’s performance on all the folds and analyze the obtained value.
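The steps above can be sketched in plain Python as follows. The helper names (`k_fold_indices`, `cross_validate`, `fit_line`, `mse`) and the toy parabolic dataset are assumptions made for this illustration. In practice, you would more likely rely on a library implementation. When the dataset size is not divisible by **k**, the folds here differ in size by at most one sample.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, score, seed=0):
    """Train on k - 1 folds, score on the held-out fold, repeat k times."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])
        )
    return sum(scores) / k, scores


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data with a parabolic dependence; a straight line is too simple for it.
xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]

mean_mse, fold_mse = cross_validate(xs, ys, k=5, fit=fit_line, score=mse)
print("per-fold MSE:", [round(m, 2) for m in fold_mse])
print("mean MSE:", round(mean_mse, 2))
```

If the error is high on every fold rather than on one unlucky split, the poor result is unlikely to be caused by a particular split, which points toward underfitting.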

Please remember that underfitting is not something you can accurately measure. If the model consistently performs poorly on various subsets of the data, it may indicate underfitting. The answer is usually yes or no, as no one can say to which degree the model is underfitted. Technically, every model “fits” the training data, extracting valuable patterns from its features. But not every model “underfits”. There is a thin line between these conditions, so you should examine every potential underfit case individually, knowing what is at stake.

Underfitting is not as hot and vital as overfitting. Still, there are some valuable techniques to prevent and overcome underfitting in Machine Learning models and neural networks.

Some of these approaches are complex, so on this page, we will only draw a brief description of each method and leave links to more in-depth pages exploring a specific way.

For now, the most common techniques for dealing with underfitting are:

- **Increasing the model complexity**. As mentioned above, a model might underfit because it is too simple. You can make it more complex (complicate its architecture or pick a more complex basic model) to try and overcome this;
- **Bringing more data**. Introducing more training data can sometimes help, as it can expose the model to a broader range of patterns and relationships. However, this might not always be feasible or effective;
- **Decreasing the regularization strength**. Regularization introduces additional terms in the loss function that punish a model for having high weights. Such an approach reduces the impact of individual features and forces an algorithm to learn more general trends. Besides traditional regularization techniques such as L1 and L2, in neural networks, weight decay and adding noise to inputs can also be applied for regularization purposes. In general, regularization is introduced to prevent overfitting, so to try and overcome underfitting, you should decrease the regularization strength;
- **More training time**. This point is pretty much self-explanatory. You should give your model a bit more time to train and extract patterns while maintaining the balance between under- and overfitting;
- **Accurate preprocessing**. As mentioned above, underfitting might occur because of a dirty dataset. Therefore, try a precise, organized, comprehensive preprocessing, including feature selection, feature engineering, data cleaning, and handling outliers. This might have a significant effect on the result.

These approaches provide a wide range of techniques to address underfitting issues and ensure better generalization capabilities of a model. The exact choice of a method depends significantly on the use case, data, model, goals, etc. Please explore the field before opting for a certain way.
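As a small illustration of the regularization point, here is a sketch of a one-feature ridge (L2-regularized) regression in plain Python. The dataset, the `alpha` values, and the helper names are assumptions chosen for the demonstration. With one centered feature, the ridge slope has the closed form `sxy / (sxx + alpha)`, so a large `alpha` shrinks the slope toward zero.

```python
import random


def fit_ridge_line(xs, ys, alpha):
    """Fit y = slope * x + intercept with an L2 penalty alpha * slope^2.

    The intercept is left unpenalized. alpha = 0 gives ordinary least squares.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed toy data: a clean linear trend y = 2x with a little noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

for alpha in (1000.0, 10.0, 0.1):
    slope, intercept = fit_ridge_line(xs, ys, alpha)
    print(f"alpha={alpha:>6}: slope={slope:.3f}, "
          f"train MSE={mse(xs, ys, slope, intercept):.4f}")

strong = fit_ridge_line(xs, ys, alpha=1000.0)
weak = fit_ridge_line(xs, ys, alpha=0.1)
mse_strong = mse(xs, ys, *strong)
mse_weak = mse(xs, ys, *weak)
```

With the strongest penalty, the slope is pushed close to zero and the model cannot follow even this simple linear trend, so the training error stays high. Lowering `alpha` lets the same model recover the trend.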

Underfitting is the behavior of a Machine Learning model that fails to capture the patterns in the data, showing poor performance in both the training and test stages.

The most accurate approach to detecting underfitting is **k-fold cross-validation**.

The general advice we can give you is to remember that underfitting exists, but techniques for overcoming it are there as well. Follow the five simple steps listed below to reduce the risk of underfitting when developing a Machine Learning solution:

- Try to collect as diverse, extensive, and balanced a dataset as possible;
- Keep track of the model learning curves to find the balance between underfitting and overfitting;
- Do not use models that are more complex or simpler than the task requires;
- Keep an eye on the strength of your regularization techniques;
- Always validate your model performance on a set of examples not seen during training (for instance, using cross-validation).

These steps will not guarantee getting rid of underfitting for good. However, it is your responsibility and interest to make your model as reliable, robust, and generalizable as possible.

Last modified 9d ago

© 2010-2024 CloudFactory Limited. All rights reserved.