Batch Size
The batch size specifies the number of samples that propagate through the neural network before the model parameters are updated. Each batch of samples goes through one full forward pass and one full backward pass.
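For example, in PyTorch one epoch of mini-batch training might look like the minimal sketch below; the model, loss_fn, optimizer, and train_loader objects are assumed to be defined elsewhere, and train_loader yields one batch of batch_size samples at a time:

```python
# Minimal sketch of one epoch of mini-batch training (PyTorch-style).
# `model`, `loss_fn`, `optimizer`, and `train_loader` are assumed placeholders.
def train_one_epoch(model, loss_fn, optimizer, train_loader):
    for inputs, targets in train_loader:
        optimizer.zero_grad()             # clear gradients from the previous batch
        outputs = model(inputs)           # full forward pass over the batch
        loss = loss_fn(outputs, targets)
        loss.backward()                   # full backward pass over the batch
        optimizer.step()                  # parameters are updated once per batch
```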
Intuition
Suppose you have 3000 training samples and set the batch size to 128. The algorithm trains the network on the first 128 samples from the training dataset, then on the next 128 samples, and repeats this process until all samples have been propagated through the network.
Because 3000 is not evenly divisible by 128, the final batch contains only 56 samples (3000 − 23 × 128). You can either drop these leftover samples or train on a smaller final batch.
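To make the arithmetic concrete, here is a small sketch (using NumPy and an index array as a stand-in for the 3000 samples) showing how the data splits into batches of 128:

```python
import numpy as np

num_samples, batch_size = 3000, 128
data = np.arange(num_samples)  # stand-in for the 3000 training samples

batches = [data[i:i + batch_size] for i in range(0, num_samples, batch_size)]
print(len(batches))       # 24 batches in total
print(len(batches[0]))    # 128 samples in each full batch
print(len(batches[-1]))   # 56 samples remain for the final batch

# Option 1: drop the incomplete final batch (leaves 23 batches of exactly 128)
full_batches = [b for b in batches if len(b) == batch_size]
# Option 2: keep the smaller final batch, which most frameworks do by default
```

In PyTorch's DataLoader, for instance, this choice corresponds to the drop_last argument.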
How to choose the optimal batch size?
Finding the right batch size for your use case depends largely on how many samples you use for training and how diverse your data is.
Depending on your hardware (RAM + GPU), you could be limited to small batch sizes.
Smaller batches mean that each gradient descent (optimizer) step is computed from fewer samples, so the gradient estimate is noisier and the algorithm may take longer to converge.
However, it has been observed that model quality, as measured by its ability to generalize, degrades significantly with very large batches. In the extreme case, if you compute the update from the entire dataset at once, the model may fit the training data well but generalize poorly to unseen data.
It is generally good practice to increase the batch size until you saturate your GPU's memory. In the end, though, the batch size is just another hyperparameter to tune.
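As an illustration of this practice, the sketch below probes increasingly large batch sizes with a throwaway model until a forward and backward pass no longer fits in GPU memory; the model, input width, and candidate batch sizes are placeholders, not a prescription:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# Placeholder model and input width; substitute your own architecture.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
loss_fn = nn.CrossEntropyLoss()

for batch_size in (64, 128, 256, 512, 1024, 2048):
    try:
        x = torch.randn(batch_size, 1024, device=device)
        y = torch.randint(0, 10, (batch_size,), device=device)
        loss_fn(model(x), y).backward()   # one forward + backward pass at this size
        model.zero_grad(set_to_none=True)
        print(f"batch_size={batch_size}: fits in memory")
    except RuntimeError:                  # CUDA reports out-of-memory as a RuntimeError
        print(f"batch_size={batch_size}: out of GPU memory")
        break
```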