Suppose I have a simple 4-layer neural network (NN) and I want to train it to recognize patterns in some data. I have two ways to train it:
- I pass one data sample, calculate the loss, calculate the gradients of that loss with respect to the parameters (weights and biases) of the network, adjust the parameters, and repeat until the loss is minimized.
- I pass batches of data and do the rest as described above. This is mini-batch training, and it is usually implemented with vectorized operations.
So my question is: are the gradients averaged over all the samples in a batch?
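To make the question concrete, here is a minimal sketch in plain Python. It is not the 4-layer network itself: as an assumption for illustration, it uses a hypothetical stand-in model `y_hat = w * x + b` with a squared-error loss, and the data values are made up. It compares the average of the per-sample gradients (the first approach, applied to each sample in a batch) with the gradient of the batch loss when that loss is defined as the mean over the batch (the second approach).

```python
# Hypothetical stand-in for the network: a linear model y_hat = w * x + b
# with squared-error loss. The data below is made up for illustration.

def per_sample_grad(w, b, x, y):
    """Gradient of (w*x + b - y)**2 with respect to (w, b) for one sample."""
    err = w * x + b - y
    return 2.0 * err * x, 2.0 * err

def mean_loss(w, b, xs, ys):
    """Batch loss, assumed here to be the MEAN of the per-sample losses."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def batch_grad_numeric(w, b, xs, ys, eps=1e-6):
    """Central-difference estimate of the gradient of the mean batch loss."""
    gw = (mean_loss(w + eps, b, xs, ys) - mean_loss(w - eps, b, xs, ys)) / (2 * eps)
    gb = (mean_loss(w, b + eps, xs, ys) - mean_loss(w, b - eps, xs, ys)) / (2 * eps)
    return gw, gb

def averaged_per_sample_grad(w, b, xs, ys):
    """Compute one gradient per sample, then average them over the batch."""
    grads = [per_sample_grad(w, b, x, y) for x, y in zip(xs, ys)]
    n = len(grads)
    return sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w, b = 0.5, 0.0

avg = averaged_per_sample_grad(w, b, xs, ys)
num = batch_grad_numeric(w, b, xs, ys)
print("average of per-sample gradients:", avg)
print("gradient of the mean batch loss:", num)
```

Under these assumptions the two results agree up to floating-point error, because differentiation is linear: the gradient of a mean of losses is the mean of the individual gradients. If the batch loss were a sum rather than a mean, the batch gradient would be the sum of the per-sample gradients instead, so which one applies depends on how the loss is reduced over the batch.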