Arguably a newbie question (discussion) about epochs and training.

Imagine we have a neural network, or any model that can be trained iteratively over epochs, and consider the two training cases below:

1. We have x amount of data and train until we reach a loss of 0.1. Let us say this takes 100 epochs.
2. We increase the size of our data to 100x and train for far fewer epochs (let us say 1 epoch) to reach the same loss of 0.1.
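
To make the two cases concrete, here is a minimal sketch of the bookkeeping. The dataset size, the batch size, and the helper `training_budget` are all hypothetical and chosen only for illustration; nothing here comes from an actual experiment. Under these assumptions, both cases perform the same number of gradient updates. What differs is how often each individual example is seen.

```python
import math

def training_budget(num_examples, epochs, batch_size):
    """Summarize how much training a (dataset size, epochs) pair amounts to."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return {
        "gradient_steps": steps_per_epoch * epochs,
        "examples_processed": num_examples * epochs,
        "times_each_example_seen": epochs,
    }

# Hypothetical numbers, assumed only for illustration.
x = 10_000        # size of the small dataset
batch_size = 100  # kept identical in both cases

case_1 = training_budget(x, epochs=100, batch_size=batch_size)      # small data, many epochs
case_2 = training_budget(100 * x, epochs=1, batch_size=batch_size)  # large data, one epoch

print(case_1)
print(case_2)
```

So the compute budget can be identical while the model in case 1 revisits the same examples 100 times and the model in case 2 sees every example exactly once.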

In terms of test performance and real-time predictions, will there be any significant difference between these two cases, given that the loss is the same? (Let us assume the training and validation accuracies are the same as well.)

I sense a contradiction between training for many epochs on a small dataset and training for few epochs on a large dataset.

In my own experience, I could not get a transformer network to converge to acceptable metrics without data augmentation when training for a low number of epochs. I also tried training it for thousands of epochs on the limited data, with no luck.

To keep the argument simple, I am assuming that everything I did not mention stays the same between the two cases. Obviously, increasing the data size to 100x will not actually yield a loss of 0.1 after 1 epoch; models do not scale linearly like that.

submitted by /u/ege6211