1. Importance Sampling
Suppose we want to compute
\[
I = \mathbb{E}_\pi[\phi(X)] = \int \phi(x)\,\pi(x)\,\mathrm{d}x,
\]
where we do not know how to sample from the target $\pi$ but have access to a proposal or importance distribution of density $q$, satisfying $\pi(x) > 0 \Rightarrow q(x) > 0$, i.e. the support of $q$ includes the support of $\pi$. Then we have
\[
I = \int \phi(x)\,\frac{\pi(x)}{q(x)}\,q(x)\,\mathrm{d}x = \mathbb{E}_q[w(X)\,\phi(X)],
\]
where $w$ is the importance weight function
\[
w(x) = \frac{\pi(x)}{q(x)}.
\]
Hence for $X_1, \dots, X_N \overset{\text{iid}}{\sim} q$,
\[
\hat{I}_N = \frac{1}{N} \sum_{i=1}^N w(X_i)\,\phi(X_i).
\]
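As a concrete illustration of $\hat{I}_N$, here is a minimal sketch in which the target, proposal, and test function are all hypothetical choices made for this example: target $\pi = \mathcal{N}(0,1)$, proposal $q = \mathcal{N}(0, 2^2)$, and $\phi(x) = x^2$, so that $I = \mathbb{E}_\pi[X^2] = 1$ and the weight $\pi(x)/q(x)$ has the closed form $2e^{-3x^2/8}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: target pi = N(0,1), proposal q = N(0, 2^2),
# phi(x) = x^2, so the true value is I = E_pi[X^2] = 1.
def phi(x):
    return x ** 2

def w(x):
    # Importance weight w(x) = pi(x)/q(x), in closed form for these two normals.
    return 2.0 * np.exp(-3.0 * x ** 2 / 8.0)

N = 100_000
X = rng.normal(0.0, 2.0, size=N)      # X_1, ..., X_N ~ q, iid
I_hat = np.mean(w(X) * phi(X))        # (1/N) sum_i w(X_i) phi(X_i)
print(I_hat)  # close to 1
```

Evaluating the weight in closed form (rather than as a ratio of two density evaluations) avoids needless under/overflow; for less tractable targets one would work with log-weights instead.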
1.1. Property
Proposition 1.1 The importance sampling estimator $\hat{I}_N$ satisfies:
(1) (Unbiasedness) $\mathbb{E}_q[\hat{I}_N] = I$;
(2) (Strong Consistency) if $\mathbb{E}_q[|w(X)\,\phi(X)|] < \infty$, then $\hat{I}_N \xrightarrow{\text{a.s.}} I$;
(3) (CLT) if $\sigma^2 := \mathbb{V}_q[w(X)\,\phi(X)] = \mathbb{E}_q[w^2(X)\,\phi^2(X)] - I^2 < \infty$, then $\sqrt{N}\,(\hat{I}_N - I) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$.
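The CLT in (3) is what makes the estimator practical: $\sigma$ can itself be estimated by the sample standard deviation of the terms $w(X_i)\,\phi(X_i)$, yielding an asymptotic confidence interval. A sketch in a hypothetical toy setup (target $\mathcal{N}(0,1)$, proposal $\mathcal{N}(0,2^2)$, $\phi(x) = x^2$, so $I = 1$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: pi = N(0,1), q = N(0, 2^2), phi(x) = x^2, so I = 1.
N = 100_000
X = rng.normal(0.0, 2.0, size=N)
Y = 2.0 * np.exp(-3.0 * X ** 2 / 8.0) * X ** 2   # Y_i = w(X_i) * phi(X_i)

I_hat = Y.mean()
sigma_hat = Y.std(ddof=1)                  # estimates sigma = sd_q[w(X) phi(X)]
half_width = 1.96 * sigma_hat / np.sqrt(N)  # asymptotic 95% CI half-width
print(I_hat - half_width, I_hat + half_width)
```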
Proposition 1.2 The optimal proposal minimizing $\mathbb{V}_q[w(X)\,\phi(X)]$ is given by
\[
q^*(x) = \frac{|\phi(x)|\,\pi(x)}{\int |\phi(y)|\,\pi(y)\,\mathrm{d}y}.
\]
Proof. We have
\[
\mathbb{V}_q[w(X)\,\phi(X)] = \mathbb{E}_q\big[w^2(X)\,\phi^2(X)\big] - I^2.
\]
By Jensen's inequality, we have
\[
\mathbb{E}_q\big[w^2(X)\,\phi^2(X)\big] \ge \big(\mathbb{E}_q[w(X)\,|\phi(X)|]\big)^2 = \left(\int |\phi(x)|\,\pi(x)\,\mathrm{d}x\right)^2.
\]
For $q = q^*$, we have
\[
\mathbb{E}_{q^*}\big[w^2(X)\,\phi^2(X)\big] = \int \frac{\phi^2(x)\,\pi^2(x)}{q^*(x)}\,\mathrm{d}x = \left(\int |\phi(x)|\,\pi(x)\,\mathrm{d}x\right)^2,
\]
so the lower bound is attained. Therefore, $q^*$ minimizes $\mathbb{V}_q[w(X)\,\phi(X)]$. $\square$
However, $q^*$ can NEVER be used in practice: its normalizing constant $\int |\phi(x)|\,\pi(x)\,\mathrm{d}x$ is essentially the quantity we are trying to compute.
For $\phi \ge 0$, we have $q^*(x) = \phi(x)\,\pi(x)/I$ and $w(x)\,\phi(x) = I$ for all $x$, so the resulting estimator has zero variance; but the normalizing constant of $q^*$ is $I$ itself, i.e. using $q^*$ requires knowing $I$.
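The zero-variance phenomenon for $\phi \ge 0$ can be seen concretely in a hypothetical example where $q^*$ happens to be tractable: take $\pi = \mathrm{Exp}(1)$ and $\phi(x) = e^{-x}$, so $I = 1/2$ and $q^*(x) = \phi(x)\,\pi(x)/I = 2e^{-2x}$, the $\mathrm{Exp}(2)$ density. Note that writing $q^*$ down already required knowing $I$, which is exactly the practical obstruction above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example: pi = Exp(1), phi(x) = exp(-x), so I = E_pi[phi(X)] = 1/2
# and q*(x) = phi(x) pi(x) / I = 2 exp(-2x), i.e. the Exp(2) density.
N = 10
X = rng.exponential(scale=0.5, size=N)       # X_i ~ q* = Exp(2)
w = np.exp(-X) / (2.0 * np.exp(-2.0 * X))    # w(x) = pi(x)/q*(x)
vals = w * np.exp(-X)                        # w(X_i) phi(X_i) = 1/2 for every draw
print(vals)  # every entry is 0.5: the estimator has zero variance
```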
2. Normalized Importance Sampling
Standard IS has limited applications in statistics as it requires knowing $\pi$ and $q$ exactly, including their normalizing constants. Assume $\pi(x) = \gamma(x)/Z$ where $Z = \int \gamma(x)\,\mathrm{d}x$ is unknown. Define $w(x) = \gamma(x)/q(x)$; then we have
\[
I = \frac{\mathbb{E}_q[w(X)\,\phi(X)]}{\mathbb{E}_q[w(X)]},
\]
which suggests the normalized (self-normalized) IS estimator
\[
\hat{I}_N = \frac{\sum_{i=1}^N w(X_i)\,\phi(X_i)}{\sum_{j=1}^N w(X_j)} = \sum_{i=1}^N W_i\,\phi(X_i), \qquad W_i = \frac{w(X_i)}{\sum_{j=1}^N w(X_j)}.
\]
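A minimal sketch of normalized IS, with hypothetical choices: unnormalized target $\gamma(x) = e^{-x^2/2}$ (so $\pi = \mathcal{N}(0,1)$ with $Z = \sqrt{2\pi}$ treated as unknown), proposal $q = \mathcal{N}(0, 2^2)$, and $\phi(x) = x^2$. The weights are computed on the log scale, which is the numerically stable choice in general.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: gamma(x) = exp(-x^2/2), so pi = N(0,1) with Z = sqrt(2*pi)
# treated as unknown; proposal q = N(0, 2^2); phi(x) = x^2, so I = 1.
def log_gamma(x):
    return -0.5 * x ** 2

def log_q(x):
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

N = 100_000
X = rng.normal(0.0, 2.0, size=N)
log_w = log_gamma(X) - log_q(X)     # unnormalised log-weights log w(X_i)
W = np.exp(log_w - log_w.max())     # subtract max for numerical stability
W /= W.sum()                        # self-normalised weights W_i, summing to 1
I_hat = np.sum(W * X ** 2)          # sum_i W_i phi(X_i)
print(I_hat)  # close to E_pi[X^2] = 1
```

Note that the unknown constant $Z$ never appears: it cancels in the normalization of the weights.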
2.1. Property
Proposition 2.1 (Strong Consistency) Let $X_1, \dots, X_N \overset{\text{iid}}{\sim} q$ and assume that $\mathbb{E}_q[w(X)\,|\phi(X)|] < \infty$. Then $\hat{I}_N$ is strongly consistent.
Proof. We have
\[
\hat{I}_N = \frac{\frac{1}{N}\sum_{i=1}^N w(X_i)\,\phi(X_i)}{\frac{1}{N}\sum_{j=1}^N w(X_j)} \xrightarrow{\text{a.s.}} \frac{\mathbb{E}_q[w(X)\,\phi(X)]}{\mathbb{E}_q[w(X)]} = \frac{Z\,I}{Z} = I,
\]
by the SLLN applied separately to the numerator and the denominator. $\square$
Proposition 2.2 (CLT) If $\mathbb{E}_q[w^2(X)] < \infty$ and $\mathbb{E}_q[w^2(X)\,\phi^2(X)] < \infty$, then $\sqrt{N}\,(\hat{I}_N - I) \xrightarrow{d} \mathcal{N}(0, \tilde\sigma^2)$, where
\[
\tilde\sigma^2 = \frac{\mathbb{E}_q\big[w^2(X)\,(\phi(X) - I)^2\big]}{\big(\mathbb{E}_q[w(X)]\big)^2} = \int \frac{\pi^2(x)\,(\phi(x) - I)^2}{q(x)}\,\mathrm{d}x.
\]
Proof. With $Y_i = w(X_i)\,(\phi(X_i) - I)$, we have
\[
\sqrt{N}\,(\hat{I}_N - I) = \frac{\frac{1}{\sqrt{N}}\sum_{i=1}^N Y_i}{\frac{1}{N}\sum_{j=1}^N w(X_j)}.
\]
Since $\mathbb{E}_q[Y_i] = \mathbb{E}_q[w(X)\,\phi(X)] - I\,\mathbb{E}_q[w(X)] = ZI - IZ = 0$ and $\mathbb{V}_q[Y_i] = \mathbb{E}_q[w^2(X)\,(\phi(X) - I)^2] < \infty$, then by the CLT,
\[
\frac{1}{\sqrt{N}}\sum_{i=1}^N Y_i \xrightarrow{d} \mathcal{N}\big(0,\ \mathbb{E}_q[w^2(X)\,(\phi(X) - I)^2]\big).
\]
By the SLLN, we have
\[
\frac{1}{N}\sum_{j=1}^N w(X_j) \xrightarrow{\text{a.s.}} \mathbb{E}_q[w(X)] = Z.
\]
Therefore, by Slutsky's theorem, we have
\[
\sqrt{N}\,(\hat{I}_N - I) \xrightarrow{d} \mathcal{N}(0, \tilde\sigma^2),
\]
where
\[
\tilde\sigma^2 = \frac{\mathbb{E}_q[w^2(X)\,(\phi(X) - I)^2]}{Z^2}.
\]
Thus, since $w(x)/Z = \pi(x)/q(x)$,
\[
\tilde\sigma^2 = \int \frac{\pi^2(x)\,(\phi(x) - I)^2}{q(x)}\,\mathrm{d}x. \quad \square
\]
3. Variance of Importance Sampling Estimator
Suppose , then
4. Diagnostic
We solve for $N_e$ in
\[
\mathbb{V}_\pi\!\left[\frac{1}{N_e}\sum_{i=1}^{N_e} \phi(X_i)\right] = \mathbb{V}\big[\hat{I}_N\big],
\]
where the left-hand side corresponds to the variance of an unweighted sample of size $N_e$. The (approximate) solution is
\[
N_e = \frac{\big(\sum_{i=1}^N w(X_i)\big)^2}{\sum_{i=1}^N w^2(X_i)},
\]
which is called the effective sample size. Note that $1 \le N_e \le N$, with $N_e = N$ exactly when all weights are equal.
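The effective sample size is a one-line computation on the unnormalized weights. A sketch, reusing the hypothetical weights for target $\mathcal{N}(0,1)$ under proposal $\mathcal{N}(0, 2^2)$:

```python
import numpy as np

rng = np.random.default_rng(5)

def ess(w):
    # Effective sample size from unnormalised weights: (sum w)^2 / sum w^2.
    return w.sum() ** 2 / np.sum(w ** 2)

# Hypothetical example: weights w = pi/q for pi = N(0,1), q = N(0, 2^2).
N = 10_000
X = rng.normal(0.0, 2.0, size=N)
w = 2.0 * np.exp(-3.0 * X ** 2 / 8.0)
print(ess(w))  # noticeably below N: weight imbalance costs effective samples
```

With equal weights the formula returns exactly $N$; the more uneven the weights, the smaller $N_e$, which makes it a cheap diagnostic for a poorly matched proposal.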