Comparison of separability of 2-dimensional codes generated by an autoencoder (right) and PCA (left) on the MNIST dataset
Learns the parameters of an approximation of the underlying probability distribution so as to maximize the likelihood of the training set
It can be seen (but that's the difficult part, see slides for an explanation) that in order to maximize the likelihood of the training set, the loss is the sum of a reconstruction error and a Kullback-Leibler divergence between the approximate posterior and the prior
In the code, they assume p(x|z) is a Bernoulli distribution, so the reconstruction error is the binary cross-entropy.
For a Gaussian p(x|z), this error would be the mean squared error.
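The two-term loss above can be sketched as follows. This is a minimal NumPy illustration, not the code the slides refer to; the function name `vae_loss` and the unit-variance assumption for the Gaussian case are choices made here for illustration.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, likelihood="bernoulli"):
    """Negative ELBO for one batch: reconstruction error + KL divergence.

    x, x_hat : arrays in [0, 1] of shape (batch, dim)
    mu, log_var : encoder outputs parameterizing q(z|x) = N(mu, exp(log_var))
    """
    eps = 1e-7  # avoid log(0)
    if likelihood == "bernoulli":
        # Binary cross-entropy: -log p(x|z) for a Bernoulli decoder
        recon = -np.sum(x * np.log(x_hat + eps)
                        + (1 - x) * np.log(1 - x_hat + eps))
    else:
        # Squared error: -log p(x|z) for a unit-variance Gaussian decoder,
        # up to an additive constant
        recon = 0.5 * np.sum((x - x_hat) ** 2)
    # Closed-form KL(q(z|x) || N(0, I))
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl
```

Note that only the reconstruction term changes with the choice of p(x|z); the KL term depends solely on the encoder outputs.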