I need to implement CosineSimilarity myself because I need to work on the individual per-sample losses before taking the batch-wide mean.
I do it like this:
import tensorflow as tf

def my_cosine_loss(a, b):
    a_n = tf.math.l2_normalize(a, axis=-1)
    b_n = tf.math.l2_normalize(b, axis=-1)
    d = -tf.math.reduce_sum(a_n * b_n, axis=-1)
    # Above is _identical_ to Keras' implementation.
    return d, tf.math.reduce_mean(d)
I already compared the output to Keras’ implementation by repeatedly printing
print(tf.math.reduce_sum(tf.math.abs(my_loss - keras_loss)))
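For reference, that comparison can be reproduced in a self-contained sketch like this (the toy inputs and the `reduction="none"` setting, which yields per-sample values from the Keras loss, are my own; the actual model data is of course different):

```python
import tensorflow as tf

def my_cosine_loss(a, b):
    # Same computation as the snippet above, repeated for self-containedness.
    a_n = tf.math.l2_normalize(a, axis=-1)
    b_n = tf.math.l2_normalize(b, axis=-1)
    d = -tf.math.reduce_sum(a_n * b_n, axis=-1)
    return d, tf.math.reduce_mean(d)

# Toy batch of two samples: identical vectors (loss -1) and orthogonal-after-
# normalization vectors (loss 0).
a = tf.constant([[1.0, 0.0], [1.0, 1.0]])
b = tf.constant([[1.0, 0.0], [-1.0, 1.0]])

keras_loss = tf.keras.losses.CosineSimilarity(axis=-1, reduction="none")
my_per_sample, my_mean = my_cosine_loss(a, b)
# Sum of absolute per-sample differences; 0.0 for these inputs.
diff = tf.math.reduce_sum(tf.math.abs(my_per_sample - keras_loss(a, b)))
```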
However, even though this consistently prints zero (and never any NaNs), I still encounter NaNs, while with Keras' implementation I do not. I already tried a larger epsilon in the l2_normalize, and using multiply_no_nan, to no avail.
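Since the forward values match exactly, one thing worth ruling out is the backward pass: two implementations can produce identical forward outputs yet differ in their gradients. A sketch of checking the gradients of the custom loss for NaNs (the variable names and toy inputs, including a deliberate all-zero row, are my own):

```python
import tensorflow as tf

a = tf.Variable([[0.0, 0.0], [1.0, 2.0]])  # all-zero row included on purpose
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])

with tf.GradientTape() as tape:
    a_n = tf.math.l2_normalize(a, axis=-1)
    b_n = tf.math.l2_normalize(b, axis=-1)
    loss = tf.math.reduce_mean(-tf.math.reduce_sum(a_n * b_n, axis=-1))

grad = tape.gradient(loss, a)
# True here would mean the backward pass, not the loss value, is the culprit.
has_nan_grad = bool(tf.math.reduce_any(tf.math.is_nan(grad)))
```

Alternatively, `tf.debugging.enable_check_numerics()` can be switched on globally; it raises an error at the first op that produces an Inf or NaN, which pinpoints where the NaNs actually originate in the graph.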
Update: This comment.