Machine Learning (CPSC 540): Markov Chain Monte Carlo

I'm going to explain MCMC in the finite case. Strictly speaking, we don't need MCMC for the finite case, but it is a good example for understanding the method. Before reading on, I recommend http://blog.kleinproject.org/?p=280, which is a good introduction to Markov chains.

Suppose that we want to sample from the distribution $\pi = \left(\frac{1}{6}, \frac{1}{3}, \frac{1}{2}\right)$.
We will sample by exploring the state space $X=\left\{1, 2, 3\right\}$ using a Markov Chain mechanism.
Let $T$ be a transition matrix that is aperiodic and irreducible. Then, for any initial probability vector $v$, $vT^t \to \pi$ as $t \to \infty$, where $\pi$ is the unique stationary distribution of $T$. So if that stationary distribution is exactly our target $\left(\frac{1}{6}, \frac{1}{3}, \frac{1}{2}\right)$, we can sample from the target distribution by taking any distribution over the initial state and traversing the state space with $T$ until convergence. Thus, all we have to do is find a $T$ that is aperiodic, irreducible, and satisfies $\pi T = \pi$.
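
To make this concrete, here is a minimal numpy sketch. The particular matrix $T$ below is my own example (it is not from the lecture); it is aperiodic, irreducible, and satisfies $\pi T = \pi$ for $\pi = \left(\frac{1}{6}, \frac{1}{3}, \frac{1}{2}\right)$, so repeated transitions from any starting vector converge to $\pi$.

```python
import numpy as np

# Target distribution pi = (1/6, 1/3, 1/2).
pi = np.array([1/6, 1/3, 1/2])

# One possible transition matrix with pi T = pi (my own example, not from
# the lecture). It is irreducible (every state can reach every other) and
# aperiodic (states 2 and 3 have self-loops).
T = np.array([
    [0.0, 1/2, 1/2],
    [1/4, 1/4, 1/2],
    [1/6, 1/3, 1/2],
])

assert np.allclose(pi @ T, pi)  # pi is stationary under T

# Any initial distribution converges to pi under repeated transitions.
v = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    v = v @ T
print(v)  # approximately [0.1667, 0.3333, 0.5]
```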
This carries over exactly to the continuous case, which is our real aim. There, $T$ becomes a transition kernel $K(y \mid x)$, $vT$ becomes $\int v(x) K(y \mid x) \, \mathrm{d}x$, and $\pi$ becomes a density $\pi(x)$. Thus, we look for a $K(y \mid x)$ that is aperiodic, irreducible, and satisfies $\pi(x_t) K(x_{t+1} \mid x_t) = \pi(x_{t+1}) K(x_t \mid x_{t+1})$, a condition called detailed balance. Detailed balance guarantees that $\pi$ is stationary, since integrating both sides over $x_t$ gives $\int \pi(x_t) K(x_{t+1} \mid x_t) \, \mathrm{d}x_t = \int \pi(x_{t+1}) K(x_t \mid x_{t+1}) \, \mathrm{d}x_t = \pi(x_{t+1})$; together with aperiodicity and irreducibility, it means the chain is ergodic and converges to $\pi$, so we can sample from the target distribution $\pi$ using the Markov chain mechanism. The important point, and the main reason we need this method, is that it applies even if $\pi$ isn't normalized: the algorithms that construct such a $K$ only ever use ratios of $\pi$, in which the normalizing constant cancels.
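
In the finite case, detailed balance reads $\pi_i T_{ij} = \pi_j T_{ji}$ for all $i, j$. Continuing with the example matrix from above, a quick numerical check:

```python
import numpy as np

pi = np.array([1/6, 1/3, 1/2])
T = np.array([
    [0.0, 1/2, 1/2],
    [1/4, 1/4, 1/2],
    [1/6, 1/3, 1/2],
])

# flow[i, j] is the stationary probability mass moving from i to j.
flow = pi[:, None] * T

# Detailed balance: the mass flowing i -> j equals the mass flowing j -> i.
assert np.allclose(flow, flow.T)

# Summing detailed balance over the source state recovers pi T = pi.
assert np.allclose(flow.sum(axis=0), pi)
```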
How, then, can we find such a $K$? There are some brilliant algorithms, and one of them is Metropolis-Hastings, which is introduced in the lecture. Personally, I think MCMC is an amazing idea.
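
As an illustration of how such a $K$ works even for an unnormalized target, here is a minimal Metropolis-Hastings sketch. The target $\tilde{\pi}(x) = e^{-x^2/2}$ (a standard normal without its normalizing constant) and the random-walk proposal are my own choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pi_tilde(x):
    # Unnormalized target: exp(-x^2 / 2), i.e. N(0, 1) up to a constant.
    return np.exp(-0.5 * x**2)

def metropolis_hastings(n_samples, step=1.0):
    x = 0.0
    samples = []
    for _ in range(n_samples):
        y = x + step * rng.normal()  # symmetric random-walk proposal
        # The acceptance ratio uses only pi_tilde(y) / pi_tilde(x),
        # so the unknown normalizing constant cancels.
        if rng.uniform() < min(1.0, pi_tilde(y) / pi_tilde(x)):
            x = y
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings(50_000)
print(samples.mean(), samples.std())  # should be close to 0 and 1
```

The resulting kernel satisfies detailed balance with respect to $\pi$ by construction, which is exactly why the method works.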
