Base Learning Rate

The learning rate defines how large the steps of your optimizer are on your loss landscape. The base learning rate defines at which learning rate your optimizer starts before applying any methods like momentum or dampening.

Choosing the right learning rate means balancing the trade-off between reaching the minimum quickly and not making such big steps that you miss it.

A high learning rate speeds up your optimizer but you might miss the Minimum.

A valid approach to finding the right base learning rate is to start at 0.1 and then get smaller by a fraction of 10 with each experiment until the results don't improve anymore, ceteris paribus. But that's no silver bullet, there are many other approaches out there as well.

A learning rate of 0.1 means that the weights in the network are updated by 10% of the estimated weight error each iteration.

Further resources

SGD
Adam
AdamW
Momentum
Dampening
Image source

Boost model performance quickly with AI-powered labeling and 100% QA.

Learn more

Last modified 14d ago

Previous - Solver / Optimizer

Weight Decay

Next - Solver / Optimizer

Momentum (SGD)