Walkthrough : Variational autoencoder (VAE)

What's a VAE ?

  • if this is too long to read, these slides explain the key points

'Classic' autoencoders

classic autoencoder

  • loss function ||xy||2 : want to reconstruct the original input
  • z is a compact, low dimensional (pn) representation of input x
  • bottleneck forces the network to learn how to represent the training set X={x1,xN}

Applications:

  • denoising, completion
  • discriminant feature learning to feed some classifier
  • unsupervised training of individual layers of large convnets
  • manifold learning, dimensionality reduction

pca versus ae Comparison of separability of 2-dimensional codes generated by an autoencoder (right) and PCA (left) on the MNIST dataset

Variational autoencoder

  • It's a generative model : given a dataset X, generate new samples like those in X but not equal to anyone
  • Learns the parameters of an approximation of the underlying probability distribution p(X) so as to

    • draw new samples
    • compute the probability of a new sample
  • Only that xiX are very high dimensional! e.g. 728 dimensions for MNIST
  • Resemblance with AE is just the network architecture (though it's not exactly the same, see below)

vae

Encoder : qφ(z|x)

Decoder : pθ(x|z)

Loss function

It can be seen (but thats the difficult part, see slides for an explanation) that in order to maximize the likelihood of the training set P(X|φ,θ) the loss is the sum of

  • Kullback Leibler divergence between the distribution in latent space induced by the encoder on the data, qφ(z|x) , and some prior selected for z, p(z) like N(0,1). At the end, this simplifies to i=1p1+log(σzi2)μziσzi2
  • a reconstruction error: i=1nlogp(xi|z), if reconstruction was perfect, i.e. z produces xi always, log(1)

In the code, they assume p(xi|z) is a Bernoulli i=1nlog(μyi)+(1xi)log(1μyi).

For p(xi|z) Gaussian N(μxi,σxi), this error would be i=1nlog(σxi2)+(xiμxi)2/σxi2

Implementations

A simple implementation, see #11. Need to download everything for this to run.

A better but longer implementation, that we have adapted