Interactive demo

Boltzmann Machines

A Restricted Boltzmann Machine (RBM) is a two-layer energy-based model: visible units hold the data, hidden units learn features, and there are no connections within a layer. Below, the network trains live on tiny digit patterns via contrastive divergence; you can then paint corruption onto a digit and watch Gibbs sampling clean it up.

Phase 1 · Learning

Train an RBM with contrastive divergence

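The model's joint energy is $E(v, h) = -a^\top v - b^\top h - v^\top W h$, with conditionals $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i W_{ij})$ and $p(v_i = 1 \mid h) = \sigma(a_i + \sum_j W_{ij} h_j)$. Training pushes the weights so that the data is low-energy and "fantasies" sampled from the model are high-energy: $\Delta W \propto \langle v h^\top \rangle_{\text{data}} - \langle v h^\top \rangle_{\text{model}}$. Below, 32 hidden units learn 64-pixel features (8×8 digits). Each square in the filter grid is one hidden unit's weight pattern; green is excitatory, red is inhibitory. Watch random noise resolve into stroke detectors over a few hundred CD-1 steps.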
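A CD-1 update can be sketched in a few lines of NumPy. This is a minimal illustration, not the demo's own code: it assumes the standard binary RBM with visible bias `a`, hidden bias `b`, and a weight matrix `W` of shape (visible, hidden). The function name `cd1_step` and the stand-in random patterns are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.05, rng=None):
    """One CD-1 update on a batch v0 of shape (batch, n_visible).

    Updates W, a, b in place and returns the mean squared reconstruction error.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step produces the "fantasy" reconstruction.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    n = v0.shape[0]
    # Data statistics minus model statistics, averaged over the batch.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((v0 - pv1) ** 2))

# Usage with the sizes named above (64 visible, 32 hidden).
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((64, 32))
a = np.zeros(64)
b = np.zeros(32)
# Stand-in binary patterns; the demo's digit data is not reproduced here.
data = (rng.random((20, 64)) < 0.3).astype(float)
for _ in range(300):
    err = cd1_step(data, W, a, b, lr=0.05, rng=rng)
```

Using hidden probabilities rather than samples in the weight update is a common variance-reduction choice; whether the demo does the same is not stated.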

Controls: learning rate (0.05), hidden units (32).

Training progress readout: epoch, batch, recon err.

Phase 2 · Inference

Corrupt a digit, then Gibbs-sample it back

Pick a target, paint over pixels to corrupt it, then iterate $h \sim p(h \mid v)$ and $v \sim p(v \mid h)$. The default is anchored inference: pixels not flipped by the slider are treated as observed and clamped to their true value at each step, so the chain only fills in the unknown holes. This survives very high corruption because real evidence keeps the chain on track. Free mean-field drops the clamps and lets the whole image relax, which collapses toward the prior at high corruption. Stochastic Gibbs draws binary samples instead of propagating probabilities, which is useful for generation but not for denoising.
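The three modes differ only in whether units carry probabilities or samples and whether trusted pixels are reset after each step. The sketch below is one assumed reading of that description, not the demo's implementation. The function `infer` and its parameters (`observed`, `known`, `mode`) are hypothetical names, and `W` has shape (visible, hidden).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def infer(v_init, W, a, b, steps=20, mode="anchored",
          observed=None, known=None, rng=None):
    """Run the v -> h -> v chain on one image (1-D array of n_visible values).

    mode: "anchored" (mean-field, observed pixels clamped),
          "free"     (mean-field, no clamps),
          "gibbs"    (binary sampling, no clamps).
    observed: boolean mask of trusted pixels; known: their true values.
    """
    if mode == "anchored" and (observed is None or known is None):
        raise ValueError("anchored mode needs 'observed' and 'known'")
    rng = np.random.default_rng() if rng is None else rng
    sample = mode == "gibbs"
    v = np.asarray(v_init, dtype=float).copy()
    for _ in range(steps):
        ph = sigmoid(v @ W + b)                      # p(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float) if sample else ph
        pv = sigmoid(W @ h + a)                      # p(v = 1 | h)
        v = (rng.random(pv.shape) < pv).astype(float) if sample else pv
        if mode == "anchored":
            # Re-impose the evidence so only the unknown pixels evolve.
            v[observed] = known[observed]
    return v
```

In anchored mode the returned image agrees exactly with `known` on the observed pixels; the remaining entries are probabilities that can be thresholded at 0.5 for display.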

Panels: Original (click a target), Corrupted (paint to flip), Reconstruction.

Controls: target digit, inference mode, initial corruption (35%).

Gibbs chain readout: step, free E, recovered.
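Assuming the "free E" readout refers to the RBM free energy $F(v) = -\log \sum_h e^{-E(v,h)}$, the sum over binary hidden units factorizes and gives a closed form. The sketch below computes it for the standard binary RBM (visible bias `a`, hidden bias `b`, `W` of shape (visible, hidden)) and checks it against brute-force enumeration on a tiny model. Both function names are hypothetical.

```python
import itertools
import numpy as np

def free_energy(v, W, a, b):
    """F(v) = -a.v - sum_j log(1 + exp(b_j + (v W)_j))."""
    # np.logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return float(-(a @ v) - np.sum(np.logaddexp(0.0, v @ W + b)))

def free_energy_bruteforce(v, W, a, b):
    """Same quantity by summing exp(-E(v, h)) over every hidden state.

    Cost grows as 2**n_hidden, so this is only for checking tiny models.
    """
    total = 0.0
    for bits in itertools.product([0.0, 1.0], repeat=len(b)):
        h = np.array(bits)
        energy = -(a @ v) - (b @ h) - (v @ W @ h)
        total += np.exp(-energy)
    return float(-np.log(total))

# Small check: 5 visible units, 3 hidden units, random parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))
a = rng.standard_normal(5)
b = rng.standard_normal(3)
v = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
gap = abs(free_energy(v, W, a, b) - free_energy_bruteforce(v, W, a, b))
```

Lower free energy means the model assigns the image higher unnormalized probability, so a falling value along the chain suggests the reconstruction is moving toward patterns the model has learned.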

Sources and reading

Hinton, G., Sejnowski, T. Learning and relearning in Boltzmann machines, in Parallel Distributed Processing, MIT Press, 1986.
Smolensky, P. Information processing in dynamical systems: Foundations of harmony theory, 1986. (The Restricted Boltzmann Machine, originally "harmonium".)
Hinton, G. Training products of experts by minimizing contrastive divergence, Neural Computation 14, 2002.
Hinton, G., Salakhutdinov, R. Reducing the dimensionality of data with neural networks, Science 313, 2006.