0%

Junction Tree and Message Passing

1. Junction Tree

Definition 1.1    A junction tree is a connected undirected graph without cycles (a tree), which has vertex Ci that consists of subset of V, and satisfies the property that if CiCj=S, then every vertex on the unique path from Ci to Cj contains S.

Example 1.1    Given sets {1,2}, {2,3,4}, {2,4,5}, {4,6}, {6,7,8}, we can build a junction tree:

1, 2
2, 3, 4
4, 6
2, 4, 5
6, 7, 8

Equally, we could use a different ordering: {6,7,8}, {4,6}, {2,4,5}, {1,2}, {2,3,4}.

1, 2
2, 3, 4
4, 6
2, 4, 5
6, 7, 8

Theorem 1.1    If T is a junction tree, then its vertices satisfy the running intersection property. Conversely, if C1,,Ck satisfy the running intersection property, they can be arranged into a junction tree.

Proof. Suppose T is a junction tree. Pick a leaf Ck and remove it to obtain Tk. By the induction hypothesis, the remaining sets, C1,,Ck1 satisfy the running intersection property. Pick Cσ(k) to be the unique neighbor of Ck in T. By the junction tree property, CkCjCσ(k) and CkCjCk for any j<k, then Ck(i<kCi)CkCσ(k). Since Cσ(k)i<kCi, it is obvious that Ck(i<kCi)=CkCσ(k).

Suppose C1,,Ck satisfy the running intersection property, and assume {C1,,Ck1} is a junction tree. Let CkCσ(k). We know CkCiCσ(k) for all i. If CkCi=, then every vertex from Ci to Ck contains . If CkCi=S, we know CkCiCkCσ(k)Cσ(k). Assume SCj for some j, then CiCσ(k)Cσ(k)Cj, which is a contradction. Hence SCj for all j, and thus {C1,,Ck} is a junction tree.

Corollary 1.2    If C1,,Ck satisfy the running intersection property, then they satisfy the running intersection property starting with any node Cj.

Definition 1.2    We associate each node C in junction tree with a potential ψC(xC)0, which is a function over the variables in the corresponding set. We say that two potentials are consistent if they have the same margin over their intersection, i.e., for ψC(xC) and ψD(xD), xCDψC(xC)=f(xCD)=xDCψD(xD), where S=CD.

Theorem 1.3    Suppose C1,,Ck satisfy the running intersection property with separator sets S2,,Sk, and let p(xV)=i=1kψCi(xCi)ψSi(xSi) for all xVXV (S1= and ψS1=1, by convention). Then the potentials are all consistent iff ψCi(xCi)=p(xCi) and ψSi(xSi)=p(xSi) for all i=1,,k.

Proof. () If ψCi(xCi)=p(xCi), then xCiCjψCi(xCi)=xCiCjp(xCi)=p(xSij)=xCjCip(xCj)=xCjCiψCj(xCj) for any Ci and Cj, i.e., the potentials are all consistent.

() If k=1, there is nothing to prove. Otherwise, let Rk=CkSk, then p(xVRk)=xRkp(xV)=i=1k1ψCi(xCi)ψSi(xSi)1ψSk(xSk)xRkψCk(xCk). Since all potentials are consistent and Sk=Cki<kCi, we have xRkψCk(xCk)=xCkSkψCk(xCk)=xSkCkψSk(xSk)=xψSk(xSk)=ψSk(xSk) and thus p(xVRk)=i=1k1ψCi(xCi)ψSi(xSi), i.e., p(xVRk) only has k1 cliques, then ψCi(xCi)=p(xCi) and ψSi(xSi)=p(xSi) for all i<k by the induction hypothesis.

By the running intersection property, Sk=CkCσ(k), and hence by consistency ψSk(xSk)=xCσ(k)SkψCσ(k)(xCσ(k))=xCσ(k)Skp(xCσ(k))=p(xSk).

Then p(xV)=p(xVRk)ψCk(xCk)ψSk(xSk)=p(xVRk)ψCk(xCk)p(xSk), and thus p(xRkxVRk)=ψCk(xCk)p(xSk), which only depends on xCk, then p(xRkxVRk)=p(xRkxSk)=ψCk(xCk)p(xSk), i.e., ψCk(xCk)=p(xCk).

The steps to forming a junction tree:

  1. Moralize.
  2. Drop directions.
  3. Triangulate (add edges to get a decomposable graph).

2. Message Passing

Definition 2.1    Suppose we have potentials ψC, ψD and ψS, where S=CD. Then passing a message from ψC to ψD involves:

  • ψS(xS)=xCSψC(xC) (margin of ψC over the variables in S);

  • ψD(xD)=ψD(xD)ψS(xS)ψS(xS).

We have three important properties:

  • i=1kψCi(xCi)ψSi(xSi) is unchanged because ψD(xD)ψS(xS)=ψD(xD)ψS(xS);

  • ψS and ψC are consistent since xSCψS(xS)=xψS(xS)=ψS(xS)=xCSψC(xC);

  • If ψD and ψS are consistent, so are ψD and ψS, since xDSψD(xD)=ψS(xS)ψS(xS)xDSψD(xD)=ψS(xS).

Hence, updating preserves the joint distribution and does not upset margins that are already consistent.

3. Junction Tree Algorithm

Given a tree, we can pick any vertex as a root, and direct all edges away from it.

Algorithm 1 Collect and Distribute Steps of the Junction Tree Algorithm.
    function Collect(rooted tree T, potentials ψt)
        let 1<<k be a topological ordering of T;
        for t in k,,2 do
            send message from ψt to ψσ(t);
        end for
        return updated potentials ψt
    end function
    function Distribute(rooted tree T, potentials ψt)
        let 1<<k be a topological ordering of T;
        for t in k,,2 do
            send message from ψσ(t) to ψt;
        end for
        return updated potentials ψt
    end function

Theorem 3.1    Let T be a junction tree with potentials ψCi(xCi). Then after running the junction tree algorithm, all potentials will be consistent.

Proof. Since we pass a message from ψCi to ψCσ(i) and by the property of message passing, ψCi and ψSi are consistent. By property, the consistency between ψCi and ψSi is preserved after future updates from ψCσ(i). Similarly, distribution ensures ψSi and ψCσ(i) are consistent. Hence, all pairs of potentials are consistent.

In practice, message passing is often done in parallel and we have the convergence after at most d iterations, where d is the length of the longest path of the tree.

Junction graphs in general (loopy belief propagation) also seem to lead to convergence, but it is still an open area of study.

4. Evidence

Denote evidence set E, then p(xVExE=xE)=p(xVE,xE)p(xE). In junction tree situation, p(xV)=iψCi(xCi)ψSi(xSi). If there is a clique that contains the node E (ECi), then we replace ψCi(xCi) with ψCi(xCi)p(xE). After running the junction tree algorithm, we have ψCi(xCi)=p(xCixE).