Hello there,

I am currently working on a VAE using tensorflow-probability. I would like to later train it on celeb_a, but right now I am using mnist to test everything.

My model looks like this, inspired by this example:

```
import tensorflow as tf
import tensorflow_probability as tfp

# Aliases as used in the tensorflow-probability VAE example
tfk = tf.keras
tfkl = tf.keras.layers
tfd = tfp.distributions
tfpl = tfp.layers

prior = tfd.Independent(tfd.Normal(loc=tf.zeros(encoded_size), scale=1),
                        reinterpreted_batch_ndims=1)

# Encoder
inputs = tfk.Input(shape=input_shape)
x = tfkl.Lambda(lambda x: tf.cast(x, tf.float32) - 0.5)(inputs)
x = tfkl.Conv2D(base_depth, 5, strides=1, padding='same',
                activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(base_depth, 5, strides=2, padding='same',
                activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(2 * base_depth, 5, strides=1, padding='same',
                activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(2 * base_depth, 5, strides=2, padding='same',
                activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(4 * encoded_size, 7, strides=1, padding='valid',
                activation=tf.nn.leaky_relu)(x)
x = tfkl.Flatten()(x)
x = tfkl.Dense(tfpl.IndependentNormal.params_size(encoded_size))(x)
x = tfpl.IndependentNormal(encoded_size,
                           activity_regularizer=tfpl.KLDivergenceRegularizer(prior))(x)

encoder = tfk.Model(inputs, x, name='encoder')
encoder.summary()

# Decoder
inputs = tfk.Input(shape=(encoded_size,))
x = tfkl.Reshape([1, 1, encoded_size])(inputs)
x = tfkl.Conv2DTranspose(2 * base_depth, 7, strides=1, padding='valid',
                         activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(2 * base_depth, 5, strides=1, padding='same',
                         activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(2 * base_depth, 5, strides=2, padding='same',
                         activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding='same',
                         activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=2, padding='same',
                         activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding='same',
                         activation=tf.nn.leaky_relu)(x)
mu = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same',
                 activation=None)(x)
mu = tfkl.Flatten()(mu)
sigma = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same',
                    activation=None)(x)
sigma = tf.exp(sigma)
sigma = tfkl.Flatten()(sigma)
x = tf.concat((mu, sigma), axis=1)
x = tfkl.LeakyReLU()(x)
x = tfpl.IndependentNormal(input_shape)(x)

decoder = tfk.Model(inputs, x)
decoder.summary()

# End-to-end model, assembled as in the linked example
vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs[0]))

negloglik = lambda x, rv_x: -rv_x.log_prob(x)

vae.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-4),
            loss=negloglik)

# mnist_digits are normalized between 0.0 and 1.0
history = vae.fit(mnist_digits, mnist_digits, epochs=100,
                  batch_size=300)
```
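For reference, the `negloglik` loss above evaluates the negative log-density of the decoder distribution at the input pixels. A standalone numpy sketch of the diagonal-Gaussian case (values hypothetical, not from the model):

```python
import numpy as np

def gaussian_negloglik(x, mu, sigma):
    # Negative log-density of x under N(mu, diag(sigma^2)), summed over pixels.
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + (x - mu)**2 / (2 * sigma**2), axis=-1)

x = np.array([[0.0, 0.5, 1.0]])    # three "pixels" in [0, 1]
mu = np.array([[0.1, 0.5, 0.9]])   # decoder mean close to the data
print(gaussian_negloglik(x, mu, np.full_like(mu, 1.0)))  # broad scale
print(gaussian_negloglik(x, mu, np.full_like(mu, 0.1)))  # sharp scale: NLL goes negative
```

Unlike the Bernoulli case, this Gaussian NLL is unbounded below: on well-fit pixels it keeps decreasing as the predicted scale shrinks, so a learned scale head like `tf.exp(sigma)` can dominate the loss instead of the mean.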

My problem here is that the loss function stops decreasing at around ~470 and the images sampled from the returned distribution look like random noise. When I use a Bernoulli distribution instead of the Normal distribution in the decoder, the loss decreases steadily and the sampled images look like they should. However, I can't use a Bernoulli distribution for RGB, which I will have to when I train the model on celeb_a. I also can't just use a deterministic decoder, as I want to later decompose the ELBO (loss term - KL divergence) as seen in this.
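For the ELBO decomposition mentioned above (loss term minus KL divergence), the KL part has a closed form when the posterior is a diagonal Gaussian and the prior is standard normal, as in this encoder. A minimal numpy sketch, independent of the model code:

```python
import numpy as np

def kl_to_std_normal(mu, sigma):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2), axis=-1)

print(kl_to_std_normal(np.array([0.0, 0.0]), np.array([1.0, 1.0])))  # 0.0: q equals the prior
print(kl_to_std_normal(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # 0.5: mean shifted by one std
```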

Can someone explain to me why the Normal distribution just "doesn't work"? How can I improve the model so that it actually learns a distribution I can sample from?

submitted by /u/tadachs
