Maximum a Posteriori Estimation (MAP)

Generally, MAP is an extension of maximum likelihood estimation (MLE) that also takes into account a prior belief about the parameters $\theta$, called prior knowledge. Specifically, Bayes’ theorem (or Bayes’ rule) allows us to incorporate this prior probability as follows:

\[P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}\]

Here, $P(D|\theta)$ is the likelihood, $P(\theta)$ is the prior probability, $P(D)$ is the marginal likelihood (also called the evidence), and $P(\theta|D)$ is the posterior probability.
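To make each term concrete, here is a minimal sketch that applies Bayes' theorem on a discrete grid of candidate parameter values. The coin-flip setting, the data, and the uniform prior are hypothetical choices for illustration; with a continuous $\theta$, the sum used for $P(D)$ becomes an integral.

```python
# Hypothetical example: infer a coin's heads probability theta from flips,
# using a discrete grid of candidate values so every term of Bayes' theorem
# can be computed explicitly.
thetas = [i / 100 for i in range(1, 100)]        # candidate values 0.01 .. 0.99
prior = [1 / len(thetas)] * len(thetas)          # P(theta): uniform prior (an assumption)

data = [1, 1, 0, 1, 1, 0, 1, 1]                  # hypothetical flips: 1 = heads, 0 = tails
k, n = sum(data), len(data)

# P(D | theta) for independent Bernoulli flips
likelihood = [t**k * (1 - t)**(n - k) for t in thetas]

# P(D): marginal likelihood, summing over all candidate thetas
evidence = sum(lk * p for lk, p in zip(likelihood, prior))

# P(theta | D) = P(D | theta) P(theta) / P(D)
posterior = [lk * p / evidence for lk, p in zip(likelihood, prior)]

print(f"P(D) = {evidence:.3e}, total posterior mass = {sum(posterior):.6f}")
```

Dividing by $P(D)$ is what turns the products $P(D|\theta)P(\theta)$ into a proper distribution whose entries sum to 1.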

The principle of MAP is to choose the $\theta$ that maximises the posterior probability $P(\theta|D)$. Thus,

\[\hat{\theta}_{MAP} = \arg\max_{\theta}\ \frac{P(D|\theta)P(\theta)}{P(D)}\]

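Notice that the denominator $P(D)$ does not depend on $\theta$: it scales every candidate value by the same constant, so it does not change which $\theta$ attains the maximum. Therefore,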

\[P(\theta|D) \propto P(D|\theta)P(\theta)\] \[\Rightarrow \hat{\theta}_{MAP} = \arg\max_{\theta}\ P(D|\theta)P(\theta)\]
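A quick numerical check of this step, under a hypothetical coin-flip setup (a grid of candidate $\theta$ values, made-up data, and an assumed prior proportional to $\theta(1-\theta)$, i.e. Beta(2, 2)-shaped): normalising by $P(D)$ divides every candidate by the same positive constant, so the maximiser is unchanged. The `argmax` helper is defined here for illustration only.

```python
# Hypothetical setup: coin flips (1 = heads) and a grid of candidate theta values.
thetas = [i / 100 for i in range(1, 100)]
data = [1, 1, 0, 1, 1, 0, 1, 1]
k, n = sum(data), len(data)

# Assumed prior proportional to theta * (1 - theta), i.e. Beta(2, 2)-shaped,
# normalised over the grid so it sums to 1.
raw_prior = [t * (1 - t) for t in thetas]
z = sum(raw_prior)
prior = [p / z for p in raw_prior]

# P(D | theta) for independent Bernoulli flips
likelihood = [t**k * (1 - t)**(n - k) for t in thetas]

unnormalised = [lk * p for lk, p in zip(likelihood, prior)]  # P(D|theta) P(theta)
evidence = sum(unnormalised)                                  # P(D) over the grid
posterior = [u / evidence for u in unnormalised]              # P(theta|D)

def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)

theta_full = thetas[argmax(posterior)]       # argmax of P(theta|D)
theta_unnorm = thetas[argmax(unnormalised)]  # argmax of P(D|theta) P(theta)
print(theta_full, theta_unnorm)
```

Both searches pick the same $\theta$, which is why MAP implementations can skip computing $P(D)$ entirely.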

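Assuming the attributes $X_{1}, X_{2}, \dots, X_{n}$ of the data $D$ are conditionally independent given $\theta$, the likelihood factorises as $P(D|\theta) = \prod_{i=1}^n P(X_i|\theta)$, and we have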

\[\hat{\theta}_{MAP} = \arg\max_{\theta}\ P(\theta) \prod_{i=1}^n P(X_i|\theta)\]
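In practice the product of many probabilities quickly underflows, so it is common to maximise the logarithm instead; because the logarithm is strictly increasing, the maximiser is the same:

\[\hat{\theta}_{MAP} = \arg\max_{\theta}\ \Big[\log P(\theta) + \sum_{i=1}^n \log P(X_i|\theta)\Big]\]

The sketch below applies this to a hypothetical coin-flip example with Bernoulli observations and an assumed Beta(a, b) prior. This is a conjugate pair, so the MAP estimate also has the closed form $(k + a - 1)/(n + a + b - 2)$, where $k$ is the number of heads in $n$ flips, which lets us cross-check a brute-force grid search. The function names are illustrative, not from any library.

```python
import math

def log_prior(theta, a, b):
    # log of an unnormalised Beta(a, b) density; its normalising constant
    # does not depend on theta, so it can be dropped just like P(D)
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def log_likelihood(theta, xs):
    # sum_i log P(X_i | theta) for Bernoulli observations x in {0, 1}
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in xs)

def map_grid(xs, a, b, steps=10_000):
    # Maximise log P(theta) + sum_i log P(X_i | theta) over a grid inside (0, 1)
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: log_prior(t, a, b) + log_likelihood(t, xs))

def map_closed_form(xs, a, b):
    # Mode of the Beta(k + a, n - k + b) posterior; assumes k + a > 1 and
    # n - k + b > 1 so that the mode lies strictly inside (0, 1)
    k, n = sum(xs), len(xs)
    return (k + a - 1) / (n + a + b - 2)

flips = [1, 1, 0, 1, 1, 0, 1, 1]      # hypothetical data: 6 heads out of 8
print("MAP (grid):       ", map_grid(flips, 2, 2))
print("MAP (closed form):", map_closed_form(flips, 2, 2))
print("MLE:              ", sum(flips) / len(flips))
```

With the Beta(2, 2) prior, the estimate is pulled from the MLE $k/n = 0.75$ toward 0.5 and lands at 0.7. With a uniform Beta(1, 1) prior, the closed form reduces to $k/n$, so the MAP estimate coincides with the MLE, which is one way to see MAP as an extension of MLE.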