I am reading the advanced tutorials of TF 2.4, and I am confused
about the need to use two instances of GradientTape. This is the
case in the Pix2Pix and
Deep Convolutional GAN examples, while the
CycleGAN example uses a single, persistent GradientTape.
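The two patterns look roughly like this. This is a minimal sketch, not the tutorials' exact code: the tiny `Dense` models, the squared-value losses, and the helper names `two_tapes_step` and `persistent_tape_step` are hypothetical stand-ins for the real generator, discriminator, and loss functions.

```python
import tensorflow as tf

# Hypothetical stand-ins for the tutorials' generator and discriminator.
generator = tf.keras.Sequential([tf.keras.layers.Dense(4)])
discriminator = tf.keras.Sequential([tf.keras.layers.Dense(1)])
generator.build((None, 2))
discriminator.build((None, 4))

def two_tapes_step(noise):
    # Pattern from the Pix2Pix / DCGAN tutorials: two tapes are open at once,
    # so each tape records the forward pass of BOTH networks.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake = generator(noise)
        score = discriminator(fake)
        gen_loss = tf.reduce_mean(tf.square(score - 1.0))  # hypothetical loss
        disc_loss = tf.reduce_mean(tf.square(score))       # hypothetical loss
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    return gen_grads, disc_grads

def persistent_tape_step(noise):
    # Pattern from the CycleGAN tutorial: one persistent tape, queried twice.
    with tf.GradientTape(persistent=True) as tape:
        fake = generator(noise)
        score = discriminator(fake)
        gen_loss = tf.reduce_mean(tf.square(score - 1.0))  # hypothetical loss
        disc_loss = tf.reduce_mean(tf.square(score))       # hypothetical loss
    gen_grads = tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = tape.gradient(disc_loss, discriminator.trainable_variables)
    del tape  # drop the reference so the persistent tape's resources can be freed
    return gen_grads, disc_grads
```

Called with the same input and unchanged weights, both helpers should return the same gradients; they differ only in how the forward pass is recorded.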
It seems to me that the first approach makes both GradientTapes
record the operations of both networks, which sounds wasteful.
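One way to check this observation is `tf.GradientTape.watched_variables()`, which lists the variables a tape has watched. The sketch below uses hypothetical toy networks `net_a` and `net_b`. It only shows *which* variables each tape tracks; it does not measure memory, which is dominated by the intermediate tensors a tape keeps for the backward pass.

```python
import tensorflow as tf

# Hypothetical toy networks standing in for the generator and discriminator.
net_a = tf.keras.Sequential([tf.keras.layers.Dense(3)])
net_b = tf.keras.Sequential([tf.keras.layers.Dense(1)])
net_a.build((None, 2))
net_b.build((None, 3))

x = tf.ones((1, 2))
with tf.GradientTape() as tape_a, tf.GradientTape() as tape_b:
    out = net_b(net_a(x))

# With the default watch_accessed_variables=True, every active tape watches
# each trainable variable read inside the shared `with` block, i.e. the
# variables of both networks.
watched_a = {v.ref() for v in tape_a.watched_variables()}
watched_b = {v.ref() for v in tape_b.watched_variables()}
all_vars = {v.ref() for v in net_a.trainable_variables + net_b.trainable_variables}
print(watched_a == all_vars, watched_b == all_vars)
```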
Intuitively, the second approach makes much more sense to me, since
it should use half as much memory as the first.
When should one use the first approach, and when the second?