Hi all,
I have been using GradientTape in a loop over individual samples, because the batch for the project I am working on is too large to process in one go.
To check the loop, I generated a much smaller batch and compared the total loss from my loop against the loss from applying GradientTape directly to that whole batch.
I found that in the batch version the losses are actually averaged rather than summed before the result is passed to the gradient method.
Why is the mean of the losses used when calculating gradients, instead of the sum of the individual sample losses?
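
The comparison above can be sketched in plain Python. This is only an illustration under stated assumptions: the one-parameter linear model, the squared-error loss, the data values and the helper `sample_grad` are all hypothetical, and the gradient is written out by hand instead of being computed with GradientTape.

```python
# Hypothetical toy setup: model y_hat = w * x, per-sample loss (w * x - y) ** 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

def sample_grad(w, x, y):
    """Hand-derived d/dw of the per-sample loss (w * x - y) ** 2."""
    return 2.0 * (w * x - y) * x

# Loop over individual samples, as in the per-sample approach.
per_sample = [sample_grad(w, x, y) for x, y in zip(xs, ys)]

# Gradient of the SUM of losses: the per-sample gradients added up.
grad_of_sum = sum(per_sample)

# Gradient of the MEAN of losses: the same total divided by the sample count.
grad_of_mean = grad_of_sum / len(xs)

print("gradient of summed loss:", grad_of_sum)
print("gradient of mean loss:  ", grad_of_mean)
```

In this sketch the two gradients differ only by the constant factor `len(xs)`, so gradients summed in a per-sample loop can be matched to the mean-based batch result by dividing by the number of samples.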
Thank you!
submitted by /u/amjass12