For example, I can make the last layer of my network tf.keras.layers.Dense(10) even though my labels are just single values, so y has shape (500,).
The last layer outputs 10 values per input vector, but there is only one label per input vector.
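For concreteness, a setup like this can be reproduced with the minimal sketch below. The data sizes, the hidden layer, the optimizer, and in particular the choice of sparse categorical cross-entropy as the loss are all assumptions made for illustration; none of them are stated in the question.

```python
import numpy as np
import tensorflow as tf

# Toy data: 500 input vectors with 20 features each (sizes are made up).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 20)).astype("float32")
# One integer label per input vector, so y has shape (500,).
y = rng.integers(0, 10, size=(500,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),  # hypothetical hidden layer
    tf.keras.layers.Dense(10),  # 10 output values per input vector
])

# Assumption: the loss is not given in the question. Sparse categorical
# cross-entropy is used here as one loss that accepts integer labels.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Trains without a shape error despite the (500, 10) vs (500,) mismatch.
history = model.fit(x, y, epochs=1, batch_size=50, verbose=0)
preds = model.predict(x, verbose=0)
print(preds.shape, y.shape)
```

With this loss, each label is read as a class index in the range 0 to 9 rather than as a value to be matched by one of the outputs, which is one way a single label per input can pair with 10 outputs. Other losses handle the shape mismatch differently.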
How can this work? Which of the 10 values is compared to the label to calculate the loss? I expected Keras to raise an error in this case, but it trains just fine and even does well on the toy dataset I’m using.