Categories
Misc

[Question] Sampling Images from a normal distribution for VAE

Hello there,

I am currently working on a VAE using
tensorflow-probability. I would like to later train it on celeb_a,
but right now I am using mnist to test everything.

My model looks like this, inspired by
this example
“` prior =
tfd.Independent(tfd.Normal(loc=tf.zeros(encoded_size), scale=1),
reinterpreted_batch_ndims=1)

inputs = tfk.Input(shape=input_shape) x = tfkl.Lambda(lambda x:
tf.cast(x, tf.float32) – 0.5)(inputs) x = tfkl.Conv2D(base_depth,
5, strides=1, padding=’same’, activation=tf.nn.leaky_relu)(x) x =
tfkl.Conv2D(base_depth, 5, strides=2, padding=’same’,
activation=tf.nn.leaky_relu)(x) x = tfkl.Conv2D(2 * base_depth, 5,
strides=1, padding=’same’, activation=tf.nn.leaky_relu)(x) x =
tfkl.Conv2D(2 * base_depth, 5, strides=2, padding=’same’,
activation=tf.nn.leaky_relu)(x) x = tfkl.Conv2D(4 * encoded_size,
7, strides=1, padding=’valid’, activation=tf.nn.leaky_relu)(x) x =
tfkl.Flatten()(x) x =
tfkl.Dense(tfpl.IndependentNormal.params_size(encoded_size))(x) x =
tfpl.IndependentNormal(encoded_size,
activity_regularizer=tfpl.KLDivergenceRegularizer(prior))(x)

encoder = tfk.Model(inputs, x, name=’encoder’)
encoder.summary()

inputs = tfk.Input(shape=(encoded_size,)) x = tfkl.Reshape([1,
1, encoded_size])(inputs) x = tfkl.Conv2DTranspose(2 * base_depth,
7, strides=1, padding=’valid’, activation=tf.nn.leaky_relu)(x) x =
tfkl.Conv2DTranspose(2 * base_depth, 5, strides=1, padding=’same’,
activation=tf.nn.leaky_relu)(x) x = tfkl.Conv2DTranspose(2 *
base_depth, 5, strides=2, padding=’same’,
activation=tf.nn.leaky_relu)(x) x =
tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding=’same’,
activation=tf.nn.leaky_relu)(x) x =
tfkl.Conv2DTranspose(base_depth, 5, strides=2, padding=’same’,
activation=tf.nn.leaky_relu)(x) x =
tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding=’same’,
activation=tf.nn.leaky_relu)(x) mu = tfkl.Conv2D(filters=1,
kernel_size=5, strides=1, padding=’same’, activation=None)(x) mu =
tfkl.Flatten()(mu) sigma = tfkl.Conv2D(filters=1, kernel_size=5,
strides=1, padding=’same’, activation=None)(x) sigma =
tf.exp(sigma) sigma = tfkl.Flatten()(sigma) x = tf.concat((mu,
sigma), axis=1) x = tfkl.LeakyReLU()(x) x =
tfpl.IndependentNormal(input_shape)(x)

decoder = tfk.Model(inputs, x) decoder.summary()

negloglik = lambda x, rv_x: -rv_x.log_prob(x)

vae.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-4),
loss=negloglik)

mnist_digits are normed between 0.0 and 1.0

history = vae.fit(mnist_digits, mnist_digits, epochs=100,
batch_size=300) “`

My problem here is that the loss function stops decreasing at
around ~470 and the images sampled from the returned distribution
look like random noise. When using a bernoulli distribution instead
of the normal distribution in the decoder, the loss steadily
decrease and the sampled images look like they should. I can’t use
a bernoulli distribution for rgb tho, which I have to when I want
to train the model on celeb_a. I also can’t just use a
deterministic decoder, as I want to later decompose the elbo (loss
term – KL divergence) as seen in this.

Can someone explain to me why the normal distribution just
“doesn’t work”? How can I improve it so that it actually learns a
distribution that I can sample.

submitted by /u/tadachs

[visit reddit]

[comments]

Leave a Reply

Your email address will not be published.