
Importance Sampling

1. Importance Sampling

Suppose we want to compute
$$I = \mathbb{E}_\pi[\varphi(X)] = \int_{\mathcal{X}} \varphi(x)\,\pi(x)\,dx,$$
where we do not know how to sample from the target $\pi$ but have access to a proposal (or importance) distribution with density $q$ satisfying $\pi(x) > 0 \Rightarrow q(x) > 0$, i.e. the support of $q$ includes the support of $\pi$. Then we have
$$I = \mathbb{E}_\pi[\varphi(X)] = \mathbb{E}_q[\varphi(X)\,w(X)],$$
where $w \colon \mathcal{X} \to \mathbb{R}^+$ is the importance weight function $w(x) = \pi(x)/q(x)$. Hence for $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} q$,
$$\hat{I}_n^{\mathrm{IS}} = \frac{1}{n} \sum_{i=1}^n \varphi(X_i)\, w(X_i).$$

We have
$$\mathbb{E}_\pi[\varphi(X)] = \int_{\mathcal{X}} \varphi(x)\,\pi(x)\,dx = \int_{\mathcal{X}} \varphi(x)\,\frac{\pi(x)}{q(x)}\,q(x)\,dx = \int_{\mathcal{X}} \varphi(x)\,w(x)\,q(x)\,dx = \mathbb{E}_q[\varphi(X)\,w(X)].$$
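The estimator $\hat{I}_n^{\mathrm{IS}}$ can be sketched in a few lines. As an illustrative (not canonical) choice, take target $\pi = \mathcal{N}(0,1)$, $\varphi(x) = x^2$ (so $I = \mathbb{E}_\pi[X^2] = 1$), and proposal $q = \mathcal{N}(0, 2^2)$, whose support covers that of $\pi$:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    return x**2  # test function; true value I = E_pi[X^2] = 1

def log_pi(x):
    # log density of the target pi = N(0, 1)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_q(x):
    # log density of the proposal q = N(0, 2^2)
    return -0.5 * (x / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)

n = 100_000
X = rng.normal(0.0, 2.0, size=n)      # X_1, ..., X_n iid ~ q
w = np.exp(log_pi(X) - log_q(X))      # importance weights w(X_i) = pi(X_i)/q(X_i)
I_hat = np.mean(phi(X) * w)           # IS estimate of I
```

Working with log densities, as here, avoids underflow when the densities themselves are tiny.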

1.1. Properties

Proposition 1.1    The importance sampling estimator satisfies:

    (1) (Unbiasedness) $\mathbb{E}_q[\hat{I}_n^{\mathrm{IS}}] = I$;

    (2) (Strong consistency) if $\mathbb{E}_q[|\varphi(X)|\,w(X)] < \infty$, then $\lim_{n\to\infty} \hat{I}_n^{\mathrm{IS}} = I$ a.s.;

    (3) (CLT) $\mathbb{V}_q[\hat{I}_n^{\mathrm{IS}}] = \sigma_{\mathrm{IS}}^2/n$ where $\sigma_{\mathrm{IS}}^2 = \mathbb{V}_q[\varphi(X)\,w(X)]$; if $\sigma_{\mathrm{IS}}^2 < \infty$ then $\sqrt{n}\,(\hat{I}_n^{\mathrm{IS}} - I) \xrightarrow{D} \mathcal{N}(0, \sigma_{\mathrm{IS}}^2)$.

Proposition 1.2    The optimal proposal minimizing $\mathbb{V}_q[\hat{I}_n^{\mathrm{IS}}]$ is given by
$$q_{\mathrm{opt}}(x) = \frac{|\varphi(x)|\,\pi(x)}{\int_{\mathcal{X}} |\varphi(x)|\,\pi(x)\,dx}.$$

Proof. We have $\sigma_{\mathrm{IS}}^2 = \mathbb{V}_q[\varphi(X)w(X)] = \mathbb{E}_q[\varphi^2(X)\,w^2(X)] - I^2$. By Jensen's inequality,
$$\mathbb{E}_q[\varphi^2(X)\,w^2(X)] = \mathbb{E}_q\big[(|\varphi(X)|\,w(X))^2\big] \ge \big(\mathbb{E}_q[|\varphi(X)|\,w(X)]\big)^2 = \Big(\int_{\mathcal{X}} |\varphi(x)|\,\pi(x)\,dx\Big)^2.$$
For $q = q_{\mathrm{opt}}$, we have
$$\mathbb{E}_{q_{\mathrm{opt}}}[\varphi^2(X)\,w^2(X)] = \int_{\mathcal{X}} \frac{\varphi^2(x)\,\pi^2(x)}{q_{\mathrm{opt}}(x)}\,dx = \int_{\mathcal{X}} \frac{\varphi^2(x)\,\pi^2(x)}{|\varphi(x)|\,\pi(x)}\,dx \int_{\mathcal{X}} |\varphi(x)|\,\pi(x)\,dx = \Big(\int_{\mathcal{X}} |\varphi(x)|\,\pi(x)\,dx\Big)^2,$$
so the lower bound is attained. Therefore, $q_{\mathrm{opt}}(x)$ minimizes $\mathbb{V}_q[\hat{I}_n^{\mathrm{IS}}]$.

However, $q_{\mathrm{opt}}(x)$ can never be used in practice.

For $\varphi(x) > 0$, we have $q_{\mathrm{opt}}(x) = \varphi(x)\pi(x)/I$ and $\mathbb{V}_{q_{\mathrm{opt}}}[\hat{I}_n^{\mathrm{IS}}] = 0$, but
$$\varphi(x)\,w(x) = \frac{\varphi(x)\,\pi(x)}{q_{\mathrm{opt}}(x)} = I,$$
i.e. using $q_{\mathrm{opt}}(x)$ requires knowing $I$.
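This circularity can be made concrete with a toy example (my choice, not from the notes): for $\pi = \mathcal{N}(0,1)$ and $\varphi(x) = x^2$, the optimal proposal $q_{\mathrm{opt}}(x) = x^2 \pi(x)/I$ happens to be samplable exactly, since if $Y \sim \chi^2_3$ and $S = \pm 1$ with equal probability, $X = S\sqrt{Y}$ has density $x^2 e^{-x^2/2}/\sqrt{2\pi}$. But evaluating $w(x) = \pi(x)/q_{\mathrm{opt}}(x) = I/\varphi(x)$ already requires the unknown $I$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target pi = N(0,1), phi(x) = x^2, so I = E_pi[X^2] = 1 and (phi > 0)
# q_opt(x) = phi(x) pi(x) / I = x^2 * N(0,1) density.
# Exact sampler for q_opt: X = (random sign) * sqrt(chi-squared with 3 dof).
n = 10_000
Y = rng.chisquare(3, size=n)
S = rng.choice([-1.0, 1.0], size=n)
X = S * np.sqrt(Y)

# w(x) = pi(x)/q_opt(x) = I/phi(x) -- note it uses I, the very quantity we want.
I = 1.0
w = I / X**2
terms = X**2 * w   # every term phi(X_i) w(X_i) equals I: zero-variance estimator
```

Every summand equals $I$ exactly, confirming $\mathbb{V}_{q_{\mathrm{opt}}}[\hat{I}_n^{\mathrm{IS}}] = 0$ while illustrating why $q_{\mathrm{opt}}$ is unusable without $I$.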

2. Normalized Importance Sampling

Standard IS has limited applications in statistics as it requires knowing $\pi(x)$ and $q(x)$ exactly. Assume instead
$$\pi(x) = \frac{\tilde{\pi}(x)}{Z_\pi} \quad \text{and} \quad q(x) = \frac{\tilde{q}(x)}{Z_q},$$
where $\pi(x) > 0 \Rightarrow q(x) > 0$. Define $\tilde{w}(x) = \tilde{\pi}(x)/\tilde{q}(x)$; then we have
$$I = \mathbb{E}_\pi[\varphi(X)] = \frac{\int_{\mathcal{X}} \varphi(x)\,\tilde{w}(x)\,q(x)\,dx}{\int_{\mathcal{X}} \tilde{w}(x)\,q(x)\,dx}.$$

We have
$$\mathbb{E}_\pi[\varphi(X)] = \int_{\mathcal{X}} \varphi(x)\,\pi(x)\,dx = \frac{\int_{\mathcal{X}} \varphi(x)\,\frac{\pi(x)}{q(x)}\,q(x)\,dx}{\int_{\mathcal{X}} \frac{\pi(x)}{q(x)}\,q(x)\,dx} = \frac{\int_{\mathcal{X}} \varphi(x)\,\frac{\tilde{\pi}(x)/Z_\pi}{\tilde{q}(x)/Z_q}\,q(x)\,dx}{\int_{\mathcal{X}} \frac{\tilde{\pi}(x)/Z_\pi}{\tilde{q}(x)/Z_q}\,q(x)\,dx} = \frac{\int_{\mathcal{X}} \varphi(x)\,\tilde{w}(x)\,q(x)\,dx}{\int_{\mathcal{X}} \tilde{w}(x)\,q(x)\,dx} = \frac{\mathbb{E}_q[\varphi(X)\,\tilde{w}(X)]}{\mathbb{E}_q[\tilde{w}(X)]},$$
where the second equality uses $\int_{\mathcal{X}} \frac{\pi(x)}{q(x)}\,q(x)\,dx = 1$, and the unknown constants $Z_\pi, Z_q$ cancel in the ratio.
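Because the constants cancel, the ratio can be computed from unnormalized densities alone. A minimal sketch, again with the illustrative choices $\tilde{\pi}(x) = e^{-x^2/2}$ ($\mathcal{N}(0,1)$ up to $Z_\pi$), $\tilde{q}(x) = e^{-x^2/8}$ ($\mathcal{N}(0,2^2)$ up to $Z_q$), and $\varphi(x) = x^2$:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_pi_tilde(x):
    return -0.5 * x**2       # unnormalized target: Z_pi never evaluated

def log_q_tilde(x):
    return -x**2 / 8.0       # unnormalized proposal: Z_q never evaluated

def phi(x):
    return x**2              # true value E_pi[X^2] = 1

n = 200_000
X = rng.normal(0.0, 2.0, size=n)            # iid draws from q (we can sample q, not evaluate Z_q)
log_w = log_pi_tilde(X) - log_q_tilde(X)    # log w~(X_i), up to an additive constant
w = np.exp(log_w - log_w.max())             # subtracting the max avoids overflow; cancels in the ratio
I_nis = np.sum(phi(X) * w) / np.sum(w)      # self-normalized IS estimate
```

Subtracting `log_w.max()` rescales all weights by a common factor, which drops out of the numerator/denominator ratio, so the estimate is unchanged but numerically stable.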

2.1. Properties

Proposition 2.1 (Strong consistency)    Let $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} q$ and assume that $\mathbb{E}_q[|\varphi(X)|\,w(X)] < \infty$. Then
$$\hat{I}_n^{\mathrm{NIS}} = \frac{\sum_{i=1}^n \varphi(X_i)\,\tilde{w}(X_i)}{\sum_{i=1}^n \tilde{w}(X_i)}$$
is strongly consistent.

Proof. By the strong law of large numbers applied to the numerator and denominator separately,
$$\hat{I}_n^{\mathrm{NIS}} = \frac{\sum_{i=1}^n \varphi(X_i)\,\tilde{w}(X_i)}{\sum_{i=1}^n \tilde{w}(X_i)} = \frac{n^{-1}\sum_{i=1}^n \varphi(X_i)\,\tilde{w}(X_i)}{n^{-1}\sum_{i=1}^n \tilde{w}(X_i)} \xrightarrow{\text{a.s.}} \frac{\mathbb{E}_q[\varphi(X)\,\tilde{w}(X)]}{\mathbb{E}_q[\tilde{w}(X)]} = I.$$

Proposition 2.2 (CLT)    If $\mathbb{V}_q[\varphi(X)\,w(X)] < \infty$ and $\mathbb{V}_q[w(X)] < \infty$, then $\sqrt{n}\,(\hat{I}_n^{\mathrm{NIS}} - I) \xrightarrow{D} \mathcal{N}(0, \sigma_{\mathrm{NIS}}^2)$ where
$$\sigma_{\mathrm{NIS}}^2 = \mathbb{V}_q\big[w(X)(\varphi(X) - I)\big] = \int_{\mathcal{X}} \frac{\pi^2(x)\,(\varphi(x) - I)^2}{q(x)}\,dx.$$

Proof. With $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} q$, we have
$$\sqrt{n}\,(\hat{I}_n^{\mathrm{NIS}} - I) = \frac{n^{-1/2}\sum_{i=1}^n \tilde{w}(X_i)(\varphi(X_i) - I)}{n^{-1}\sum_{i=1}^n \tilde{w}(X_i)}.$$

Since
$$\mathbb{E}_q[\tilde{w}(X)(\varphi(X) - I)] = \mathbb{E}_q[\tilde{w}(X)\varphi(X)] - \frac{\mathbb{E}_q[\varphi(X)\tilde{w}(X)]}{\mathbb{E}_q[\tilde{w}(X)]}\,\mathbb{E}_q[\tilde{w}(X)] = 0$$
and $\mathbb{V}_q[\varphi(X)\,w(X)] < \infty$, the CLT gives
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde{w}(X_i)(\varphi(X_i) - I) \xrightarrow{D} \mathcal{N}\big(0, \mathbb{V}_q[\tilde{w}(X)(\varphi(X) - I)]\big).$$

By the SLLN, we have
$$\frac{1}{n}\sum_{i=1}^n \tilde{w}(X_i) \xrightarrow{\text{a.s.}} \mathbb{E}_q[\tilde{w}(X)] = \frac{Z_\pi}{Z_q}.$$

Therefore, by Slutsky's theorem, we have
$$\sqrt{n}\,(\hat{I}_n^{\mathrm{NIS}} - I) = \frac{n^{-1/2}\sum_{i=1}^n \tilde{w}(X_i)(\varphi(X_i) - I)}{n^{-1}\sum_{i=1}^n \tilde{w}(X_i)} \xrightarrow{D} \mathcal{N}\Big(0,\ \frac{Z_q^2}{Z_\pi^2}\,\mathbb{V}_q[\tilde{w}(X)(\varphi(X) - I)]\Big),$$
where
$$\frac{Z_q^2}{Z_\pi^2}\,\mathbb{V}_q[\tilde{w}(X)(\varphi(X) - I)] = \int_{\mathcal{X}} \frac{Z_q^2}{Z_\pi^2}\,\tilde{w}^2(x)(\varphi(x) - I)^2\,q(x)\,dx = \int_{\mathcal{X}} \frac{\pi^2(x)(\varphi(x) - I)^2}{q(x)}\,dx = \sigma_{\mathrm{NIS}}^2.$$
Thus $\sqrt{n}\,(\hat{I}_n^{\mathrm{NIS}} - I) \xrightarrow{D} \mathcal{N}(0, \sigma_{\mathrm{NIS}}^2)$.

3. Variance of Importance Sampling Estimator

Suppose $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} q$; then
$$\hat{I}_n^{\mathrm{IS}} = \frac{1}{n}\sum_{i=1}^n \varphi(X_i)\,w(X_i) \quad \text{and} \quad \hat{I}_n^{\mathrm{NIS}} = \frac{\sum_{i=1}^n \varphi(X_i)\,\tilde{w}(X_i)}{\sum_{i=1}^n \tilde{w}(X_i)}.$$

  • The asymptotic variance for $\hat{I}_n^{\mathrm{IS}}$ is
$$\mathbb{V}_{\mathrm{as}}[\hat{I}_n^{\mathrm{IS}}] = \mathbb{E}_q\big[(\varphi(X)\,w(X) - \mathbb{E}_q[\varphi(X)\,w(X)])^2\big] \approx \frac{1}{n}\sum_{i=1}^n \big(\varphi(X_i)\,w(X_i) - \hat{I}_n^{\mathrm{IS}}\big)^2.$$

  • The asymptotic variance for $\hat{I}_n^{\mathrm{NIS}}$ is
$$\mathbb{V}_{\mathrm{as}}[\hat{I}_n^{\mathrm{NIS}}] = \frac{\mathbb{E}_q\big[(w(X)(\varphi(X) - I))^2\big]}{(\mathbb{E}_q[w(X)])^2} \approx \frac{n^{-1}\sum_{i=1}^n \tilde{w}^2(X_i)\,(\varphi(X_i) - \hat{I}_n^{\mathrm{NIS}})^2}{\big(n^{-1}\sum_{i=1}^n \tilde{w}(X_i)\big)^2}.$$
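Both plug-in variance estimates above are straightforward to compute from the same weighted sample. A sketch under the same toy assumptions as before (target $\mathcal{N}(0,1)$, $\varphi(x) = x^2$, proposal $\mathcal{N}(0,2^2)$, so exact weights are available):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 50_000
X = rng.normal(0.0, 2.0, size=n)                 # iid ~ q = N(0, 2^2)
log_w = -0.5 * X**2 + X**2 / 8.0 + np.log(2.0)   # log w = log pi - log q for these densities
w = np.exp(log_w)
phiw = X**2 * w                                  # terms phi(X_i) w(X_i)

# Plug-in estimate of V_as[I_IS]: empirical variance of the IS terms.
I_is = np.mean(phiw)
var_is = np.mean((phiw - I_is)**2)

# Plug-in estimate of V_as[I_NIS]: weighted squared deviations over squared mean weight.
I_nis = np.sum(phiw) / np.sum(w)
var_nis = np.mean(w**2 * (X**2 - I_nis)**2) / np.mean(w)**2
```

Dividing each asymptotic variance estimate by $n$ then gives an approximate standard error, e.g. `np.sqrt(var_is / n)` for a confidence interval on $\hat{I}_n^{\mathrm{IS}}$.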

4. Diagnostic

We solve for $n_e$ in
$$\frac{1}{n}\,\mathbb{V}_{\mathrm{as}}[\hat{I}_n^{\mathrm{NIS}}] = \frac{\sigma^2}{n_e},$$
where $\sigma^2/n_e$ corresponds to the variance of an unweighted sample of size $n_e$. The solution is
$$n_e = \frac{\big(\sum_{i=1}^n \tilde{w}(X_i)\big)^2}{\sum_{i=1}^n \tilde{w}^2(X_i)},$$
which is called the effective sample size.
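The effective sample size is a one-line diagnostic. A sketch for a hypothetical run (my illustrative choices: target $\mathcal{N}(0,1)$, proposal $\mathcal{N}(0,2^2)$); note $1 \le n_e \le n$, with $n_e = n$ for equal weights and $n_e \approx 1$ when one weight dominates:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 10_000
X = rng.normal(0.0, 2.0, size=n)        # iid ~ q = N(0, 2^2)
log_w = -0.5 * X**2 + X**2 / 8.0        # log w~ up to an additive constant
w = np.exp(log_w - log_w.max())         # the constant cancels in n_e below

# Effective sample size: n_e = (sum w~_i)^2 / sum w~_i^2.
ess = np.sum(w)**2 / np.sum(w**2)
```

A low `ess / n` ratio signals a poor proposal: a few large weights dominate the estimator and its effective precision is far below the nominal sample size.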