Let's assume f is our NN. Individual data points are (x, y) and a batch of data is (X, Y).
- y = f(x): I take the gradient of y with respect to every parameter of the NN; I'll call this g.
- Y = f(X): this is the vectorized (batched) version. If I take the gradient of Y with respect to the NN's parameters, I get G.
What is the relation between G and g? Is G the average of all the g's in that batch, or something else? (See the sketch below for what I mean.)
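To make what I'm asking concrete, here's a minimal sketch. It assumes PyTorch, a toy linear model standing in for f, and a scalar squared-error loss that is mean-reduced over the batch (autograd needs a scalar to differentiate, so I'm using a loss rather than the raw output y); the model, shapes, and loss are just placeholders. As far as I can tell, under these assumptions the one-pass batch gradient comes out equal to the average of the per-sample gradients, because taking gradients is linear:

```python
# Illustrative sketch: compare per-sample gradients g_i with the gradient G
# of a mean-reduced batch loss. Model, shapes, and loss are hypothetical.
import torch

torch.manual_seed(0)
f = torch.nn.Linear(3, 1)        # stand-in for the NN
X = torch.randn(4, 3)            # batch of 4 inputs
Y_target = torch.randn(4, 1)     # batch of 4 targets

# Per-sample gradients g_i: one backward pass per data point.
per_sample_grads = []
for x, y_t in zip(X, Y_target):
    f.zero_grad()
    loss_i = ((f(x) - y_t) ** 2).sum()          # scalar loss for one sample
    loss_i.backward()
    per_sample_grads.append([p.grad.clone() for p in f.parameters()])

# Batch gradient G: one backward pass on the loss averaged over the batch.
f.zero_grad()
batch_loss = ((f(X) - Y_target) ** 2).sum(dim=1).mean()
batch_loss.backward()
G = [p.grad.clone() for p in f.parameters()]

# Gradients are linear, so averaging the g_i reproduces G (up to float error).
avg_g = [torch.stack(gs).mean(dim=0) for gs in zip(*per_sample_grads)]
for G_p, avg_p in zip(G, avg_g):
    assert torch.allclose(G_p, avg_p, atol=1e-6)
print("G matches the average of the per-sample g_i for a mean-reduced loss")
```

Note that whether G is the sum or the average of the g's depends on how the batched output is reduced to a scalar (sum vs. mean) before calling backward.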
For context, I'm running into this while implementing the policy gradient (reinforcement learning) algorithm. In policy gradient methods we have to average over the gradients of the policy function. My confusion is whether I should do that averaging over individual states or just use a batch of states, because in both cases the gradients have the same dimensions. (A sketch of the two options follows.)
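Here's the policy-gradient version of the same comparison, REINFORCE-style; the policy net, states, actions, and returns are made-up placeholders, not my actual setup. Option 1 averages per-state gradients of -log pi(a|s) * R by hand; option 2 does one backward pass on the batched surrogate loss reduced with a mean over states. Under these assumptions the two updates seem to agree:

```python
# Illustrative REINFORCE-style sketch: per-state gradients averaged by hand
# vs. one backward pass on a batched, mean-reduced surrogate loss.
# Policy, states, actions, and returns are hypothetical placeholders.
import torch

torch.manual_seed(0)
policy = torch.nn.Linear(4, 2)            # logits over 2 actions
states = torch.randn(5, 4)                # batch of 5 states
actions = torch.randint(0, 2, (5,))       # actions taken in those states
returns = torch.randn(5)                  # e.g. discounted returns

def log_prob(s_batch, a_batch):
    """log pi(a|s) for a batch of states and actions."""
    logp = torch.log_softmax(policy(s_batch), dim=-1)
    return logp.gather(1, a_batch.unsqueeze(1)).squeeze(1)

# Option 1: per-state gradients, then average them explicitly.
grads = []
for s, a, R in zip(states, actions, returns):
    policy.zero_grad()
    loss_i = -log_prob(s.unsqueeze(0), a.unsqueeze(0)).squeeze(0) * R
    loss_i.backward()
    grads.append([p.grad.clone() for p in policy.parameters()])
avg_g = [torch.stack(gs).mean(dim=0) for gs in zip(*grads)]

# Option 2: one backward pass on the batch, reduced with mean() over states.
policy.zero_grad()
batch_loss = -(log_prob(states, actions) * returns).mean()
batch_loss.backward()
G = [p.grad.clone() for p in policy.parameters()]

for g_p, G_p in zip(avg_g, G):
    assert torch.allclose(g_p, G_p, atol=1e-6)
print("averaging per-state gradients == gradient of the mean-reduced batch loss")
```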