Everything you need to know about batches in Machine Learning
One of the critical training parameters you might come across when developing an AI model is batch size. It might seem like a single simple value, but there is profound logic behind it. Understanding the reasons for batch processing is vital for an overall grasp of the AI solution development process. Therefore, on this page, we will:
Check out the batch definition in Machine Learning (ML);
Understand how batch correlates with the mini-batch and Online Learning terms;
Find out what Batch Processing means;
Understand why developers train models in batches.
Let’s jump in.
Batch explained
To define the term: in Machine Learning, a batch is a subset of data examples from the training set (itself a part of the original dataset) that a model sees during a single iteration of the training process.
The algorithm for splitting the training set into batches is straightforward (a code sketch follows this list):
You define the batch size;
You divide the number of data examples in the training set by the batch size to get the number of batches (if the division is not exact, the last batch is simply smaller);
You form each batch based on some logic;
You can stick to the consecutive approach. With a batch size equal to 8, the first 8 images go to batch 1, the next 8 images to batch 2, etc.;
You can select data examples randomly;
You can make each batch stratified class-wise;
And so on.
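To make these strategies concrete, here is a minimal NumPy sketch of the three batch-forming options above. The `features` and `labels` arrays are hypothetical stand-ins for a real training set:

```python
import numpy as np

# Hypothetical training set: 96 examples, 32 features each, binary labels.
rng = np.random.default_rng(seed=42)
features = rng.normal(size=(96, 32))
labels = rng.integers(0, 2, size=96)
batch_size = 8
n_batches = len(features) // batch_size  # 96 / 8 = 12 batches

# 1. Consecutive batches: the first 8 examples, then the next 8, and so on.
consecutive = [features[i : i + batch_size]
               for i in range(0, len(features), batch_size)]

# 2. Random batches: shuffle the example indices once, then slice.
shuffled = rng.permutation(len(features))
random_batches = [features[shuffled[i : i + batch_size]]
                  for i in range(0, len(features), batch_size)]

# 3. Stratified batches: sort indices by class, then deal them out to the
#    batches round-robin so each batch roughly preserves class proportions.
by_class = np.argsort(labels, kind="stable")
stratified = [features[by_class[b::n_batches]] for b in range(n_batches)]
```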
What is Mini-Batch?
In various expert articles and academic papers, you might encounter the term mini-batch.
In Machine Learning, a mini-batch is a batch that contains more than one example but fewer examples than the full dataset or training set.
So, if you have 10 images in a dataset, picking any batch size from 2 to 9 (inclusive) means you will form mini-batches.
What is Online Learning?
In Machine Learning, Online Learning refers to the extreme case of a batch consisting of a single example.
It might occur when your model is already in production. Imagine receiving data sequentially, observing the model's performance on real-life examples, and wanting to use that feedback to update your model. In such a case, setting the batch size to one is one possible solution.
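As a rough illustration, here is a minimal sketch of single-example updates on a data stream. The linear model and the `stream()` generator are hypothetical placeholders for a real production pipeline:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
weights = np.zeros(4)      # weights of a toy linear regressor
learning_rate = 0.01

def stream():
    """Simulate examples arriving one at a time from production traffic."""
    true_weights = np.array([0.5, -1.0, 2.0, 0.3])
    while True:
        x = rng.normal(size=4)
        y = true_weights @ x + rng.normal(scale=0.1)
        yield x, y

for step, (x, y) in zip(range(1000), stream()):
    error = weights @ x - y
    gradient = error * x                 # squared-error gradient, one example
    weights -= learning_rate * gradient  # update after every single example
```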
What is Batch Processing?
In Machine Learning, Batch Processing is a technique for using batches to process large volumes of data. Instead of feeding the whole training set to a model at once, you split the data into batches and perform a sequence of unified jobs, training the model on one batch after another.
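The loop below sketches this idea for a toy linear model (the data and model are hypothetical); note that the weights are updated after every batch, not after the whole dataset:

```python
import numpy as np

# Hypothetical training set for a toy linear regression problem.
rng = np.random.default_rng(seed=1)
features = rng.normal(size=(96, 4))
targets = features @ np.array([0.5, -1.0, 2.0, 0.3]) + rng.normal(scale=0.1, size=96)
weights = np.zeros(4)
batch_size, learning_rate = 8, 0.01

for epoch in range(20):
    indices = rng.permutation(len(features))       # reshuffle every epoch
    for start in range(0, len(features), batch_size):
        batch_idx = indices[start : start + batch_size]
        x_batch, y_batch = features[batch_idx], targets[batch_idx]
        # Mean squared-error gradient computed over the current batch only.
        errors = x_batch @ weights - y_batch
        gradient = x_batch.T @ errors / len(batch_idx)
        weights -= learning_rate * gradient        # update after each batch
```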
Why are models trained in batches?
The logic behind introducing batches is straightforward.
When preparing to train an ML model, you have two basic approaches to follow:
You feed all the data to the model at once;
You feed some part of the original data to the model, wait until the algorithm processes it, feed the model another part of the data, wait, and so on.
Both approaches are viable, but refusing to split the data into batches has solid disadvantages. If you feed all the data to the model at once, you must simultaneously store every single data example in the machine's memory, along with all the derivative values associated with those examples, such as loss values, processing details, etc. Moreover, you can update the model's weights only after the whole dataset is processed.
These factors make processing a complete dataset significantly memory-inefficient, time-consuming, and computationally expensive. Batch Processing, by contrast, looks far more appealing.
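To make the memory point concrete, here is a rough, hypothetical back-of-envelope calculation for a training set of one million 224x224 RGB images stored as float32 (the numbers are purely illustrative):

```python
# Raw input memory: feeding everything at once vs. a batch of 32.
examples = 1_000_000
floats_per_example = 224 * 224 * 3   # one 224x224 RGB image
bytes_per_float = 4

full_pass_gb = examples * floats_per_example * bytes_per_float / 1e9
mini_batch_gb = 32 * floats_per_example * bytes_per_float / 1e9

print(f"All at once:  ~{full_pass_gb:,.0f} GB just for the raw inputs")  # ~602 GB
print(f"Batch of 32:  ~{mini_batch_gb:.3f} GB for the raw inputs")       # ~0.019 GB
```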
The benefits of Batch Processing are:
Efficient memory utilization;
Improvements in training speed (especially when running jobs in parallel on GPUs);
Regular updates of the weights (after each batch is processed);
The introduction of noise into the training process, which might have a regularizing effect and improve the model's generalization.