Big variation in training times?

I’m doing some testing using Google Cloud AI Platform and have seen some strange variation in training times. As an example, I did a test run that had an average training time of around 3.2 seconds per batch. I repeated it with the exact same hyperparameters and machine type and it took around 2.4 seconds the next time. Is there some explanation for this other than one GPU I’m assigned to being better in some way than another? That doesn’t really make sense either, but I don’t know how else to explain it.

submitted by /u/EdvardDashD
[visit reddit] [comments]

Leave a Reply Cancel reply