The quality and quantity of your data strongly influence your final model's performance. However, Data Scientists often have to overcome a lack of sufficient training data, as well as the expense of gathering more of it.

Data augmentation is a Machine Learning technique that expands a scarce dataset using the data you already have, which can improve your model's metrics.

You might consider using augmentations if you want to:

  • Reduce the costs of gathering and labeling data;
  • Improve the performance of your model with the existing data;
  • Add more diversity to your training set.

Data augmentation in Machine Learning is the process of increasing the size and diversity of a dataset by creating new data points from the existing data. In Computer Vision, this can be done by altering the existing images in different ways, for instance, by rotating them or changing their brightness or hue. The neural network treats each altered version as a separate image and can learn additional variations from it.
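To make the idea concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that a grayscale image is a list of pixel rows with integer values from 0 to 255; real projects usually rely on an array or image library instead. The helper names (`rotate_90`, `adjust_brightness`, `augment_dataset`) are hypothetical and are not part of AI Data Platform.

```python
# Assumption for illustration: a grayscale image is a list of rows,
# and each pixel is an int in [0, 255].
Image = list[list[int]]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    # Reverse the row order, then transpose the result.
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Add `delta` to every pixel, clipping to the valid [0, 255] range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def augment_dataset(images: list[Image]) -> list[Image]:
    """Return each original image plus a rotated and a brightened copy."""
    augmented = []
    for img in images:
        augmented.append(img)                         # original
        augmented.append(rotate_90(img))              # new data point: rotated
        augmented.append(adjust_brightness(img, 40))  # new data point: brighter
    return augmented


dataset = [[[0, 50], [100, 250]]]
expanded = augment_dataset(dataset)
print(len(expanded))  # 3
print(expanded[1])    # [[100, 0], [250, 50]]
print(expanded[2])    # [[40, 90], [140, 255]]
```

Here each original image yields two extra training examples, so a dataset of N images grows to 3N.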

The reasons to use augmentations are the following.

Whenever we collect images of some target objects, we try to be as representative as possible. For example, when we teach the model to recognize dogs in general, we try to include images of dogs of different breeds, colors, and so on in the dataset.

However, in the real world, the targets of our interest might appear in different positions, angles, sizes, and contexts. It is almost impossible or very costly to encompass all the variations in the dataset.

Normal image before data augmentation

If your model was trained only on "classical" dog images (like the one above), it might run into problems when it sees a dog in an unusual position, such as this one.

In this case, capturing all the possible angles and contexts in which the dogs might appear would be tough. Instead, we can use augmentations to add various alterations to our existing dataset. This can make the model more robust to real-world variations while saving the project budget.

Data augmentations can be useful even if you have a large dataset already.

For instance, imagine you want to build a classifier that distinguishes Bulldogs from Pugs. You gather thousands of images of both breeds. However, your model still does not differentiate them properly. Then you notice that all the Bulldogs in the images look to the left, and all the Pugs are standing with their heads to the right. Thus, the model has picked up on the most obvious yet inessential feature, which is the dog's position in space. Instead, it should have concentrated on more relevant features, such as the dog's color, body proportions, etc.

Bulldogs vs. Pugs

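Augmentations can solve this problem. A simple Horizontal Flip, which mirrors each image left to right, could be of great help: after flipping, both breeds appear facing in both directions, so orientation no longer distinguishes them.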

Data augmentation (Horizontal Flip) applied to the image
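The sketch below shows what a horizontal flip does, using a list-of-rows grayscale image as a simplified stand-in for real image data. The toy 3x4 "image", the `horizontal_flip` helper, and the label list are illustrative assumptions, not the platform's implementation.

```python
# Assumption for illustration: an image is a list of rows of pixel intensities.
Image = list[list[int]]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in img]


# Toy "left-facing" dog: the bright "head" pixel (255) sits on the left edge.
left_facing = [
    [255, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
right_facing = horizontal_flip(left_facing)
print(right_facing[0])  # [10, 10, 10, 255]

# Adding a flipped copy of every training image means each breed appears
# facing both directions, so orientation stops separating the classes.
train = [("bulldog", left_facing)]
train += [(label, horizontal_flip(img)) for label, img in train]
print(len(train))  # 2
```

Flipping twice returns the original image, so the transform never destroys information; it only changes the orientation the model sees.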

The general workflow for using augmentations is:

  1. Choose one or several augmentations;
  2. Set the parameters of the chosen augmentations and apply them to the training set;
  3. Feed the augmented images to your model and start training.
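The three steps above can be sketched in plain Python as follows. The transforms, the `compose` helper, and the brightness offset of -20 are hypothetical choices for illustration; actual model training is left out, and `augmented` simply stands for the images a training loop would consume.

```python
from typing import Callable

# Assumption for illustration: grayscale images as lists of pixel rows.
Image = list[list[int]]
Transform = Callable[[Image], Image]


# Step 1: choose augmentations.
def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def make_brightness(delta: int) -> Transform:
    """Step 2: the parameter (delta) is fixed when the transform is built."""
    def apply(img: Image) -> Image:
        return [[min(255, max(0, p + delta)) for p in row] for row in img]
    return apply


def compose(transforms: list[Transform]) -> Transform:
    """Chain transforms so they are applied in list order."""
    def apply(img: Image) -> Image:
        for t in transforms:
            img = t(img)
        return img
    return apply


pipeline = compose([horizontal_flip, make_brightness(-20)])

# Step 3: apply the pipeline to the training images only;
# the result is what would be fed to the model.
train_images = [[[0, 100], [200, 255]]]
augmented = [pipeline(img) for img in train_images]
print(augmented[0])  # [[80, 0], [235, 180]]
```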

To access the augmentations section in AI Data Platform:

  1. Open the Project Dashboard;
  2. Click on the Model Playground section;
  3. Select the split and create an experiment;
  4. In the experiment parameters, open the Augmentation Train section;
  5. Press the Add new transformations button and select as many augmentations as you want to apply. You can toggle the augmentations on and off or delete them;
Data augmentation settings

  6. Review the results of the transformations in the section on the right:

  • Click the first button (arrows) to apply the augmentation again;
  • Click the second button (dice) to display a different, randomly chosen image.
Data augmentation preview tab

You can't choose which images a transformation is applied to. However, in most cases, you can set the probability with which a given augmentation is applied to each image.
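One common way to implement such a probability setting is a per-image random draw that decides whether the augmentation runs. The sketch below assumes that behavior; `random_apply` is a hypothetical helper written for illustration, not the platform's API.

```python
import random
from typing import Callable

# Assumption for illustration: grayscale images as lists of pixel rows.
Image = list[list[int]]
Transform = Callable[[Image], Image]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_apply(transform: Transform, p: float, rng: random.Random) -> Transform:
    """Apply `transform` with probability p; otherwise return the image unchanged."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    def apply(img: Image) -> Image:
        # rng.random() is in [0, 1), so p=0 never applies and p=1 always does.
        return transform(img) if rng.random() < p else img

    return apply


rng = random.Random(0)  # fixed seed so the run is reproducible
maybe_flip = random_apply(horizontal_flip, p=0.5, rng=rng)

img = [[1, 2, 3]]
results = [maybe_flip(img) for _ in range(1000)]
flipped = sum(r == [[3, 2, 1]] for r in results)
print(flipped / len(results))  # close to 0.5
```

With `p=0.5`, roughly half of the images are flipped, and which ones are flipped is decided by the random draws rather than by you, which mirrors the behavior described above.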

Usually, augmentations are applied to the training set only. This is because validation and test sets are used for unbiased estimation of the model's performance, so they should consist of the original unchanged images. Thus, you can check whether your model performs well on real-world data and whether the augmentations helped it to generalize better.

However, you can also augment the images from your validation set. We advise doing so only if you are sure that some variation in the real-world data can be imitated well with augmentations.

Make sure that the augmentations you use during validation are used in the training set, as well.
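If you manage augmentations in code rather than through the UI, the rule above can be checked automatically. The sketch assumes a hypothetical configuration that maps each split name to a list of augmentation names; neither the format nor the names come from AI Data Platform.

```python
# Hypothetical per-split configuration: the training split is augmented,
# while validation and test keep the original images by default.
augmentations: dict[str, list[str]] = {
    "train": ["horizontal_flip", "brightness", "rotation"],
    "validation": [],
    "test": [],
}


def unseen_validation_augmentations(config: dict[str, list[str]]) -> list[str]:
    """Return augmentations applied to validation images but never to training images."""
    used_in_train = set(config.get("train", []))
    used_in_val = set(config.get("validation", []))
    return sorted(used_in_val - used_in_train)


# Augmenting validation with something also used in training passes the check.
augmentations["validation"] = ["brightness"]
print(unseen_validation_augmentations(augmentations))  # []

# A validation-only augmentation is flagged.
augmentations["validation"] = ["brightness", "blur"]
print(unseen_validation_augmentations(augmentations))  # ['blur']
```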

In general, the right augmentations are project-specific. The best advice is to experiment and see what works best for your case.

It might be a good idea to use several augmentation techniques at once, such as image cropping, rotation, color and brightness perturbations, etc.
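Combining several augmentations usually means chaining them, often with some randomness in each step. The following plain-Python sketch chains a random crop, an optional 90-degree rotation, and a per-channel color shift (a simple color and brightness perturbation) on an RGB image stored as rows of `(r, g, b)` tuples; the representation, the 2x2 crop size, the 0.5 rotation probability, and the shift range of 30 are all illustrative assumptions.

```python
import random

# Assumption for illustration: an RGB image is a list of rows,
# each pixel an (r, g, b) tuple of ints in [0, 255].
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def clip(v: float) -> int:
    """Round and clamp a channel value to [0, 255]."""
    return int(min(255, max(0, round(v))))


def random_crop(img: Image, h: int, w: int, rng: random.Random) -> Image:
    """Cut out a randomly positioned h x w window."""
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def color_jitter(img: Image, max_shift: int, rng: random.Random) -> Image:
    """Shift each color channel by its own random offset in [-max_shift, max_shift]."""
    shifts = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    return [[tuple(clip(c + s) for c, s in zip(px, shifts)) for px in row] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Crop, maybe rotate, then perturb colors."""
    img = random_crop(img, 2, 2, rng)
    if rng.random() < 0.5:
        img = rotate_90(img)
    return color_jitter(img, 30, rng)


rng = random.Random(42)
image = [[(100, 120, 140)] * 4 for _ in range(4)]
out = augment(image, rng)
print(len(out), len(out[0]))  # 2 2
```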

Below are the augmentations that AI Data Platform offers:
