
Directed Graphical Model

1. Directed Graph

Definition 1.1    A directed graph \(\mathcal{G}\) is a pair \((V, D)\) where \(V\) is a finite set of vertices, and \(D \subseteq V \times V\) is a set of ordered pairs \((i, j)\) with \(i, j \in V\) and \(i \neq j\). If \((i, j) \in D\), we write \(i \to j\). If \(i \to j\) or \(i \leftarrow j\), we say \(i\) and \(j\) are adjacent and write \(i \sim j\).

Definition 1.2    A path is a sequence of distinct vertices in which consecutive vertices are adjacent. The path is directed if all the edges point away from the start.

Example 1.1    Consider \(\mathcal{G}\) with \(V=\{1, 2, 3, 4, 5\}\) and \(D=\{(1, 3), (2, 3), (2, 4), (3, 5) ,(4, 5)\}\).

    graph TB
    1((1)) & 2((2)) --> 3((3))
    3 & 4((4)) --> 5((5))
    2 --> 4

Note that \(1 \to 3 \leftarrow 2 \to 4 \to 5\) and \(1 \to 3 \to 5\) are two paths from \(1\) to \(5\), and the second path is directed.

Definition 1.3    A directed cycle is a directed path from \(i\) to \(j \neq i\), together with the edge \(j \to i\).

Definition 1.4    A graph that contains no directed cycles is called acyclic; a directed graph with no directed cycles is a directed acyclic graph (DAG).

Definition 1.5    If \(i \to j\), then \(i\) is a parent of \(j\), and \(j\) is a child of \(i\), denoted \(i \in \text{pa}_\mathcal{G}(j)\) and \(j \in \text{ch}_\mathcal{G}(i)\). If \(a \to \cdots \to b\) or \(a=b\), then \(a\) is an ancestor of \(b\), and \(b\) is a descendant of \(a\), denoted \(a \in \text{an}_\mathcal{G}(b)\) and \(b \in \text{de}_\mathcal{G}(a)\). If \(w \notin \text{de}_\mathcal{G}(v)\), then \(w\) is a non-descendant of \(v\), and we denote \(\text{nd}_\mathcal{G}(v)=V-\text{de}_\mathcal{G}(v)\).

A vertex is an ancestor and descendant of itself, and thus no vertex is a non-descendant of itself.

Example 1.2    Consider the graph \(\mathcal{G}\):

    graph TB
    1((1)) & 2((2)) --> 3((3))
    3 & 4((4)) --> 5((5))
    2 --> 4

We have \(\text{pa}_\mathcal{G}(3)=\{1, 2\}\), \(\text{ch}_\mathcal{G}(5)=\varnothing\), \(\text{an}_\mathcal{G}(4)=\{2, 4\}\), \(\text{de}_\mathcal{G}(1)=\{1, 3, 5\}\), and \(\text{nd}_\mathcal{G}(1)=\{2, 4\}\).

Definition 1.6    A topological ordering of the vertices of the graph is an ordering \(1, \ldots, k\) s.t. \(i \in \text{pa}_\mathcal{G}(j)\) implies that \(i<j\). Acyclicity ensures that a topological ordering always exists.

Example 1.3    Consider the graph \(\mathcal{G}\):

    graph TB
    1((1)) & 2((2)) --> 3((3))
    3 & 4((4)) --> 5((5))
    2 --> 4

The topological orderings are \[\begin{aligned} &1, 2, 3, 4, 5 \\&1, 2, 4, 3, 5 \\&2, 1, 3, 4, 5 \\&2, 1, 4, 3, 5 \\&2, 4, 1, 3, 5 \end{aligned}\]
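As a quick check, here is a minimal Python sketch (the graph encoding is our own) that recovers exactly these five orderings by brute force; for larger graphs one would use Kahn's algorithm instead:

    from itertools import permutations

    # DAG of Example 1.3: an edge (i, j) means i -> j.
    V = {1, 2, 3, 4, 5}
    D = {(1, 3), (2, 3), (2, 4), (3, 5), (4, 5)}

    def is_topological(order, edges):
        """True if every edge (i, j) has i appearing before j."""
        pos = {v: k for k, v in enumerate(order)}
        return all(pos[i] < pos[j] for i, j in edges)

    for order in permutations(sorted(V)):
        if is_topological(order, D):
            print(order)  # prints the five orderings above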

2. Markov Property

We associate a model with each DAG via various Markov properties.

Consider any multivariate distribution with density \(p(x_V)\), where \(x_V=(x_1, \ldots, x_k)^T \in \mathcal{X}_V\). For all \(x_V \in \mathcal{X}_V\), \(p\) factorizes as \[p(x_V)=\prod_{i=1}^k p(x_i \mid x_1, \ldots, x_{i-1})=\prod_{i=1}^k p(x_i \mid x_{\text{pre}_<(i)}),\] where \(\text{pre}_<(i)=\{j : j<i\}\) denotes the predecessors of \(i\) under the ordering \(<\).

A DAG model uses this factorization with a topological ordering of the graph, and asserts that each factor depends on the predecessors of \(i\) only through the parents of \(i\).

Definition 2.1    We say that \(p\) factorizes according to a directed graph \(\mathcal{G}\) if for all \(x_V \in \mathcal{X}_V\), \[p(x_V)=\prod_{i=1}^k p(x_i \mid x_{\text{pa}(i)}),\] which implies that if the ordering \(<\) is topological, then for all \(i \in V\), \[p(x_i \mid x_{\text{pre}_<(i)})=p(x_i \mid x_{\text{pa}(i)}),\] i.e., for all \(i \in V\), \[X_i \perp\!\!\!\!\perp X_{\text{pre}_<(i)-\text{pa}(i)} \mid X_{\text{pa}(i)}\ [p],\] which is called the ordered local Markov property corresponding to the ordering of the variables.

Since the ordering is arbitrary provided that it is topological, for each \(i\) we can pick \(<\) so that all of \(\text{nd}(i)\) comes before \(i\); then for all \(i \in V\), \[X_i \perp\!\!\!\!\perp X_{\text{nd}(i)-\text{pa}(i)} \mid X_{\text{pa}(i)}\ [p],\] which is called the local Markov property for \(\mathcal{G}\).

Example 2.1    Consider the graph \(\mathcal{G}\):

    graph TB
    1((1)) & 2((2)) --> 3((3))
    3 & 4((4)) --> 5((5))
    2 --> 4

If we apply the ordered local Markov property with the ordering \(1, \ldots, 5\), we have \[\begin{aligned} X_1 &\perp\!\!\!\!\perp X_\varnothing \mid X_\varnothing \\ X_2 &\perp\!\!\!\!\perp X_1\mid X_\varnothing \\ X_3 &\perp\!\!\!\!\perp X_\varnothing \mid X_1, X_2 \\ X_4 &\perp\!\!\!\!\perp X_1, X_3 \mid X_2 \\ X_5 &\perp\!\!\!\!\perp X_1, X_2 \mid X_3, X_4 \end{aligned}\] which yields three non-trivial independences \[\begin{aligned} X_2 &\perp\!\!\!\!\perp X_1 \\ X_4 &\perp\!\!\!\!\perp X_1, X_3 \mid X_2 \\ X_5 &\perp\!\!\!\!\perp X_1, X_2 \mid X_3, X_4 \end{aligned}\]

If we apply the local Markov property, we have \[\begin{aligned} X_1 &\perp\!\!\!\!\perp X_2, X_4 \\ X_2 &\perp\!\!\!\!\perp X_1 \\ X_3 &\perp\!\!\!\!\perp X_4 \mid X_1, X_2 \\ X_4 &\perp\!\!\!\!\perp X_1, X_3 \mid X_2 \\ X_5 &\perp\!\!\!\!\perp X_1, X_2 \mid X_3, X_4 \end{aligned}\]
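These statements can be read mechanically from parent and descendant sets. A small sketch in the same encoding as before:

    # Derive X_i indep X_{nd(i)-pa(i)} | X_{pa(i)} for Example 2.1.
    V = {1, 2, 3, 4, 5}
    D = {(1, 3), (2, 3), (2, 4), (3, 5), (4, 5)}

    def descendants(v, edges):
        """All descendants of v, including v itself."""
        out, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in out:
                out.add(u)
                stack.extend(j for i, j in edges if i == u)
        return out

    for v in sorted(V):
        pa = {i for i, j in D if j == v}
        nd = V - descendants(v, D)
        print(f"X{v} _||_ X{sorted(nd - pa)} | X{sorted(pa)}")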

Moreover, for instance, since \(X_4 \perp\!\!\!\!\perp X_1, X_3 \mid X_2\) and \(X_5 \perp\!\!\!\!\perp X_1, X_2 \mid X_3, X_4\), weak union gives \(X_4 \perp\!\!\!\!\perp X_1 \mid X_2, X_3\) and \(X_5 \perp\!\!\!\!\perp X_1 \mid X_2, X_3, X_4\), and contraction then gives \(X_1 \perp\!\!\!\!\perp X_4, X_5 \mid X_2, X_3\).

We might wonder whether all such independences can be read off directly from the graph; this motivates the global Markov property.

Definition 2.2    An ancestral set is a set that contains all of its own ancestors, i.e., \(\text{an}(A)=A\).

Example 2.2    Consider the graph \(\mathcal{G}\):

    graph TB
    1((1)) & 2((2)) --> 3((3))
    3 & 4((4)) --> 5((5))
    2 --> 4

\(\{1, 2, 3\}\) is ancestral, but \(\{4\}\) is not ancestral.

Theorem 2.1    Let \(A\) be an ancestral set in \(\mathcal{G}\). If \(p(x_V)\) factorizes according to \(\mathcal{G}\), then \(p(x_A)\) factorizes according to \(\mathcal{G}_A\).

Proof. Suppose \(p(x_V)\) factorizes according to \(\mathcal{G}\); then \[p(x_V)=\prod_{i \in V} p(x_i \mid x_{\text{pa}(i)})=\prod_{i \in A} p(x_i \mid x_{\text{pa}(i)}) \prod_{i \in V-A} p(x_i \mid x_{\text{pa}(i)}).\] Since \(A\) is ancestral, the first product does not depend on \(x_{V-A}\), and summing the second product over \(x_{V-A}\) in reverse topological order gives 1. Hence, for all \(x_A \in \mathcal{X}_A\), \[p(x_A)=\sum_{x_{V-A}} p(x_V)=\prod_{i \in A} p(x_i \mid x_{\text{pa}(i)}),\] and thus \(p(x_A)\) factorizes according to \(\mathcal{G}_A\).

\(\square\)

In particular, if we wish to check whether a conditional independence \(X_A \perp\!\!\!\!\perp X_B \mid X_C\ [p]\) holds under a DAG model, we can restrict attention to the induced subgraph over the ancestral set \(\text{an}(A \cup B \cup C)\), i.e., check whether \(X_A \perp\!\!\!\!\perp X_B \mid X_C\ [p(x_{\text{an}(A \cup B \cup C)})]\).

Definition 2.3    A v-structure is a triple \(i \to k \leftarrow j\) s.t. \(i \not\sim j\). The moral graph \(\mathcal{G}^m\) is formed by joining the two parents in every v-structure and then dropping the orientations of all edges.
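A direct implementation of Definition 2.3 is short. The sketch below (our own encoding; it marries parents even when already adjacent, which is harmless in a set) returns the moral graph's edge set:

    def moralize(V, D):
        """Edge set of G^m: marry parents of each child, drop orientations."""
        edges = {frozenset(e) for e in D}          # skeleton
        for k in V:
            parents = [i for i, j in D if j == k]
            for a in parents:
                for b in parents:
                    if a != b:
                        edges.add(frozenset((a, b)))   # marry the parents
        return edges

    # For the DAG of Example 1.1 this adds the edges {1, 2} and {3, 4}.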

Theorem 2.2    If \(p(x_V)\) factorizes according to a DAG \(\mathcal{G}\), then \(p(x_V)\) factorizes according to the undirected graph \(\mathcal{G}^m\).

Proof. If \(p(x_V)\) factorizes according to a DAG \(\mathcal{G}\), then \[p(x_V)=\prod_{i \in V} p(x_i \mid x_{\text{pa}(i)}).\] Note that each set \(\{i\} \cup \text{pa}(i)\) is complete in \(\mathcal{G}^m\), and is therefore contained in some clique of \(\mathcal{G}^m\). Assign each factor \(p(x_i \mid x_{\text{pa}(i)})\) to one clique \(C\) containing \(\{i\} \cup \text{pa}(i)\), and define \(\psi_C(x_C)\) as the product of the factors assigned to \(C\) (with \(\psi_C \equiv 1\) for cliques that receive no factor). Then \[p(x_V)=\prod_{C} \psi_{C}(x_{C}),\] i.e., \(p(x_V)\) factorizes according to the undirected graph \(\mathcal{G}^m\).

\(\square\)

Definition 2.4    We say \(p(x_V)\) satisfies the global Markov property w.r.t. a directed graph \(\mathcal{G}\) if whenever \(A\) is separated from \(B\) by \(C\) in \((\mathcal{G}_{\text{an}(A \cup B \cup C)})^m\), we have \(X_A \perp\!\!\!\!\perp X_B \mid X_C\ [p]\).

Example 2.3    Consider the graph \(\mathcal{G}\):

    graph TB
    1((1)) & 2((2)) --> 3((3))
    3 & 4((4)) --> 5((5))
    2 --> 4

Suppose we want to know whether \(X_1 \perp\!\!\!\!\perp X_5 \mid X_3\). Since \(\text{an}(\{1, 3, 5\})=V\), the relevant moral graph is \((\mathcal{G}_{\text{an}(\{1, 3, 5\})})^m=\mathcal{G}^m\), i.e.,

    graph LR
    1((1)) & 2((2)) --- 3((3))
    3 & 4((4)) --- 5((5))
    2 --- 4
    1 --- 2
    3 --- 4

In \(\mathcal{G}^m\), the path \(1 - 2 - 4 - 5\) avoids \(3\), so \(\{1\}\) is not separated from \(\{5\}\) by \(\{3\}\), and the graph does not imply \(X_1 \perp\!\!\!\!\perp X_5 \mid X_3\).

Suppose we want to know whether \(X_2 \perp\!\!\!\!\perp X_1 \mid X_4\). The moral graph \((\mathcal{G}_{\text{an}(\{1, 2, 4\})})^m\) is

    graph TB
    1((1))
    2((2)) --- 4((4))

Hence, \(X_2 \perp\!\!\!\!\perp X_1 \mid X_4\). Furthermore, \(X_1 \perp\!\!\!\!\perp X_2, X_4\).
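Definition 2.4 translates directly into code: take the ancestral set, moralize the induced subgraph, and search for a path from \(A\) to \(B\) that avoids \(C\). A self-contained sketch (encoding as before) reproducing both queries of Example 2.3:

    from itertools import combinations

    D = {(1, 3), (2, 3), (2, 4), (3, 5), (4, 5)}

    def ancestors(S, edges):
        """All ancestors of the set S, including S itself."""
        out, stack = set(), list(S)
        while stack:
            v = stack.pop()
            if v not in out:
                out.add(v)
                stack.extend(i for i, j in edges if j == v)
        return out

    def separated(A, B, C, D):
        """True if A is separated from B by C in (G_{an(A|B|C)})^m."""
        W = ancestors(A | B | C, D)
        E = [(i, j) for i, j in D if i in W and j in W]
        adj = {v: set() for v in W}
        for i, j in E:                      # skeleton of G_W
            adj[i].add(j); adj[j].add(i)
        for k in W:                         # moralize G_W
            for a, b in combinations([i for i, j in E if j == k], 2):
                adj[a].add(b); adj[b].add(a)
        seen, stack = set(), list(A - C)    # search avoiding C
        while stack:
            v = stack.pop()
            if v in B:
                return False
            if v not in seen and v not in C:
                seen.add(v)
                stack.extend(adj[v])
        return True

    print(separated({1}, {5}, {3}, D))  # False: the path 1-2-4-5 avoids 3
    print(separated({2}, {1}, {4}, D))  # True:  X2 _||_ X1 | X4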

Theorem 2.3 (Completeness of Global Markov Property)    Let \(\mathcal{G}\) be a DAG, then there exists a probability distribution \(p\) s.t. \(X_A \perp\!\!\!\!\perp X_B \mid X_C\ [p]\) iff \(A \perp_s B \mid C\ [(\mathcal{G}_{\text{an}(A \cup B \cup C)})^m]\).

Any independence not exhibited by a separation will not hold in general for distributions that are Markov w.r.t. \(\mathcal{G}\). In other words, the global Markov property gives all conditional independences that are implied by the DAG model.

Theorem 2.4    Let \(\mathcal{G}\) be a DAG and \(p\) be a probability density. The following are equivalent:

    (1) \(p\) factorizes according to \(\mathcal{G}\);

    (2) \(p\) is globally Markov w.r.t. \(\mathcal{G}\);

    (3) \(p\) is locally Markov w.r.t. \(\mathcal{G}\).

Proof. We need to show (1) \(\Rightarrow\) (2) \(\Rightarrow\) (3) \(\Rightarrow\) (1).

(i) (1) \(\Rightarrow\) (2): Let \(W=\text{an}(A \cup B \cup C)\) and suppose there is a separation \(A \perp_s B \mid C\ [(\mathcal{G}_W)^m]\). By Theorem 2.1, for all \(x_W\), \[p(x_W)=\prod_{i \in W} p(x_i \mid x_{\text{pa}(i)}).\] Then by Theorem 2.2, \(p(x_W)\) factorizes according to \((\mathcal{G}_W)^m\), so the global Markov property for undirected graphs gives \(X_A \perp\!\!\!\!\perp X_B \mid X_C\ [p]\). Since \(A\), \(B\) and \(C\) are arbitrary, \(p\) is globally Markov w.r.t. \(\mathcal{G}\).

(ii) (2) \(\Rightarrow\) (3): Since every child of \(i\) is a descendant of \(i\), the vertex \(i\) has no children in \(\mathcal{G}_{\{i\} \cup \text{nd}(i)}\), so moralizing this subgraph adds no edges adjacent to \(i\); moreover, \(\{i\} \cup \text{nd}(i)\) is an ancestral set. Hence, \[i \perp_s \text{nd}(i)-\text{pa}(i) \mid \text{pa}(i)\ [(\mathcal{G}_{\{i\} \cup \text{nd}(i)})^m].\] By the global Markov property, \(X_i \perp\!\!\!\!\perp X_{\text{nd}(i)-\text{pa}(i)} \mid X_{\text{pa}(i)}\ [p]\), i.e., \(p\) is locally Markov w.r.t. \(\mathcal{G}\).

(iii) (3) \(\Rightarrow\) (1): We have \(X_i \perp\!\!\!\!\perp X_{\text{nd}(i)-\text{pa}(i)} \mid X_{\text{pa}(i)}\ [p]\) for all \(i\). Let \(1, \ldots, k\) be a topological ordering. Since \(\text{pre}_<(i) \subseteq \text{nd}(i)\), for all \(i \in V\), \[X_i \perp\!\!\!\!\perp X_{\text{pre}_<(i)-\text{pa}(i)} \mid X_{\text{pa}(i)}\ [p].\] Therefore, for all \(i \in V\), \[p(x_i \mid x_{\text{pre}_<(i)})=p(x_i \mid x_{\text{pa}(i)}),\] and thus for all \(x_V \in \mathcal{X}_V\), \[p(x_V)=\prod_{i=1}^k p(x_i \mid x_{\text{pre}_<(i)})=\prod_{i=1}^k p(x_i \mid x_{\text{pa}(i)}).\]

\(\square\)

3. Maximum Likelihood Estimation

Consider a contingency table with counts \(n(x_V)\). The log-likelihood is \[\begin{aligned} l(p; n)&=\sum_{x_V \in \mathcal{X}_V} n(x_V) \ln p(x_V) \\&=\sum_{x_V \in \mathcal{X}_V} n(x_V) \sum_{i \in V} \ln p(x_i \mid x_{\text{pa}(i)}) \\&=\sum_{i \in V} \sum_{x_V \in \mathcal{X}_V} n(x_V)\ln p(x_i \mid x_{\text{pa}(i)}) \\&=\sum_{i \in V} \sum_{x_i \in \mathcal{X}_i} \sum_{x_{\text{pa}(i)} \in \mathcal{X}_{\text{pa}(i)}} \ln p(x_i \mid x_{\text{pa}(i)}) \sum_{x_R \in \mathcal{X}_R} n(x_V) \\&=\sum_{i \in V} \sum_{x_{\text{pa}(i)} \in \mathcal{X}_{\text{pa}(i)}} \sum_{x_i \in \mathcal{X}_i} n(x_i, x_{\text{pa}(i)}) \ln p(x_i \mid x_{\text{pa}(i)}), \end{aligned}\] where \(R=V-(\{i\} \cup \text{pa}(i))\) and \(n(x_i, x_{\text{pa}(i)})=\sum_{x_R} n(x_V)\). Each conditional distribution \(p(x_i \mid x_{\text{pa}(i)})\) can be dealt with separately: we maximize each inner sum \(\sum_{x_i \in \mathcal{X}_i} n(x_i, x_{\text{pa}(i)}) \ln p(x_i \mid x_{\text{pa}(i)})\) subject to \(\sum_{x_i \in \mathcal{X}_i} p(x_i \mid x_{\text{pa}(i)})=1\). Hence, the MLE under \(\mathcal{G}\) is, for all \(x_i\) and \(x_{\text{pa}(i)}\), \[\widehat{p}(x_i \mid x_{\text{pa}(i)})=\frac{n(x_i, x_{\text{pa}(i)})}{n(x_{\text{pa}(i)})}.\]
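A sketch of this closed form on toy data (the samples below are made up for illustration): for each vertex we tabulate \(n(x_i, x_{\text{pa}(i)})\) and \(n(x_{\text{pa}(i)})\) and take the ratio.

    from collections import Counter

    D = {(1, 3), (2, 3), (2, 4), (3, 5), (4, 5)}
    V = [1, 2, 3, 4, 5]
    pa = {v: tuple(sorted(i for i, j in D if j == v)) for v in V}

    # Toy binary observations, one dict per sample.
    samples = [
        {1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
        {1: 0, 2: 1, 3: 1, 4: 1, 5: 1},
        {1: 1, 2: 0, 3: 1, 4: 0, 5: 1},
        {1: 1, 2: 1, 3: 1, 4: 1, 5: 0},
    ]

    for v in V:
        joint = Counter((x[v],) + tuple(x[p] for p in pa[v]) for x in samples)
        marg = Counter(tuple(x[p] for p in pa[v]) for x in samples)
        for (xi, *xpa), n in sorted(joint.items()):
            print(f"p_hat(X{v}={xi} | X{pa[v]}={tuple(xpa)}) = {n}/{marg[tuple(xpa)]}")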

Suppose each \(v \in V\) has a model for \(p(x_v \mid x_{\text{pa}(v)})\) and we place independent priors on the factors, i.e., \[\pi(\theta)=\prod_{v \in V} \pi(\theta_v).\] Then \[p(x_V; \theta)=\prod_{i \in V} p(x_i \mid x_{\text{pa}(i)}; \theta_i)\] and thus \[\pi(\theta)p(x_V; \theta)=\prod_{i \in V} p(x_i \mid x_{\text{pa}(i)}; \theta_i)\pi(\theta_i).\] Hence, \[\theta_i \perp\!\!\!\!\perp X_{V-(\{i\} \cup \text{pa}(i))}, \theta_{-i} \mid X_i, X_{\text{pa}(i)},\] i.e., the parameters remain independent a posteriori, and the posterior for \(\theta_i\) depends on the data only through \((X_i, X_{\text{pa}(i)})\).
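For example, with discrete variables and an independent \(\text{Dirichlet}(\alpha)\) prior on each column \(p(\cdot \mid x_{\text{pa}(i)})\) of each conditional table (a standard conjugate choice, not something fixed by these notes), the posterior update is purely local, as the factorization above suggests. A minimal sketch:

    # Posterior mean of p(x_i | x_pa(i)) under a symmetric Dirichlet(alpha)
    # prior per CPT column; n_states is the number of values X_i can take.
    def posterior_mean(n_xi_xpa, n_xpa, alpha=1.0, n_states=2):
        return (n_xi_xpa + alpha) / (n_xpa + n_states * alpha)

    print(posterior_mean(3, 10))  # (3 + 1) / (10 + 2) = 0.3333...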

4. Markov Equivalence

Definition 4.1    \(\mathcal{G}\) and \(\mathcal{G}'\) are Markov equivalent if any \(p\) which is Markov w.r.t. \(\mathcal{G}\) is also Markov w.r.t. \(\mathcal{G}'\), and vice versa.

Definition 4.2    Let \(\mathcal{G}=(V, D)\) be a DAG, then the skeleton of \(\mathcal{G}\) is the undirected graph \(\text{skel}(\mathcal{G})\) obtained by dropping the orientations.

Lemma 4.1    If \(\text{skel}(\mathcal{G}) \neq \text{skel}(\mathcal{G}')\), then \(\mathcal{G}\) and \(\mathcal{G}'\) are not Markov equivalent.

Proof. Suppose w.l.o.g. that \(i \to j\) in \(\mathcal{G}\) but \(i \not\sim j\) in \(\mathcal{G}'\). Choose \[p(x_V)=p(x_j \mid x_i) \prod_{k \neq j} p(x_k)\] with \(X_i\) and \(X_j\) dependent; this \(p\) factorizes according to \(\mathcal{G}\). Since the factor \(p(x_j \mid x_i)\) never breaks up, \(X_i\) and \(X_j\) are not conditionally independent given any subset of \(V-\{i, j\}\).

In \(\mathcal{G}'\), however, take a topological ordering and suppose w.l.o.g. that \(j\) comes after \(i\). Since \(i \notin \text{pa}_{\mathcal{G}'}(j)\), the ordered local Markov property gives \(X_j \perp\!\!\!\!\perp X_{\text{pre}_<(j)-\text{pa}_{\mathcal{G}'}(j)} \mid X_{\text{pa}_{\mathcal{G}'}(j)}\), and in particular \(X_i \perp\!\!\!\!\perp X_j \mid X_{\text{pa}_{\mathcal{G}'}(j)}\), a conditional independence given a subset of \(V-\{i, j\}\). Hence, \(\mathcal{G}\) and \(\mathcal{G}'\) are not Markov equivalent.

\(\square\)

Theorem 4.2    Let \(\mathcal{G}\) and \(\mathcal{G}'\) be DAGs, then \(\mathcal{G}\) and \(\mathcal{G}'\) are Markov equivalent iff they have the same skeletons and v-structures.

Proof. We only show the "only if" direction.

If \(\mathcal{G}\) and \(\mathcal{G}'\) have different skeletons, then by Lemma 4.1 they are not Markov equivalent.

Suppose now \(\mathcal{G}\) and \(\mathcal{G}'\) have the same skeleton, and \(\mathcal{G}\) has a v-structure \(i \to k \leftarrow j\) but \(\mathcal{G}'\) does not. Let \[p(x_V)=p(x_k \mid x_i, x_j) \prod_{l \neq k} p(x_l)\] for all \(x_V \in \mathcal{X}_V\), chosen so that \(X_i\) and \(X_j\) become dependent once we condition on \(X_k\). Note that \(p\) factorizes according to \(\mathcal{G}\).

Since \(\mathcal{G}\) and \(\mathcal{G}'\) have the same skeleton, in \(\mathcal{G}'\) either \(i \to k \to j\) or \(i \leftarrow k \to j\) or \(i \leftarrow k \leftarrow j\). Let \(W=\text{an}_{\mathcal{G}'}(\{i, j, k\})\), and suppose there exists \(d \in W\) with \(i \to d \leftarrow j\) in \(\mathcal{G}'\). Then \(d\) is an ancestor of \(i\), \(j\) or \(k\), and each case produces a directed cycle (e.g., if \(d \in \text{an}_{\mathcal{G}'}(i)\), then \(i \to d \to \cdots \to i\)); contradiction. Hence, \(i\) and \(j\) have no common child in \(\mathcal{G}'_W\), and thus \(i \not\sim j\) in \((\mathcal{G}'_W)^m\). Therefore, \(X_i \perp\!\!\!\!\perp X_j \mid X_{W-\{i, j\}}\) for any distribution Markov w.r.t. \(\mathcal{G}'\); since \(k \in W-\{i, j\}\), the conditioning set contains \(k\). However, this independence fails in \(p\): the factor \(p(x_k \mid x_i, x_j)\) makes \(X_i\) and \(X_j\) dependent given \(X_k\). Hence, \(\mathcal{G}\) and \(\mathcal{G}'\) are not Markov equivalent.

\(\square\)
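Theorem 4.2 yields a simple algorithmic test: compare skeletons and v-structures. A sketch (encoding as before; the second graph below reverses \(2 \to 4\), the third reverses \(3 \to 5\)):

    from itertools import combinations

    def skeleton(D):
        return {frozenset(e) for e in D}

    def v_structures(D):
        skel = skeleton(D)
        out = set()
        for k in {j for _, j in D}:
            parents = sorted(i for i, j in D if j == k)
            for a, b in combinations(parents, 2):
                if frozenset((a, b)) not in skel:   # parents non-adjacent
                    out.add((a, k, b))
        return out

    def markov_equivalent(D1, D2):
        return skeleton(D1) == skeleton(D2) and v_structures(D1) == v_structures(D2)

    G  = {(1, 3), (2, 3), (2, 4), (3, 5), (4, 5)}
    G2 = {(1, 3), (2, 3), (4, 2), (3, 5), (4, 5)}   # same skeleton and v-structures
    G3 = {(1, 3), (2, 3), (2, 4), (5, 3), (4, 5)}   # new v-structures at vertex 3
    print(markov_equivalent(G, G2))  # True
    print(markov_equivalent(G, G3))  # False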

5. Decomposability

Theorem 5.1    A DAG \(\mathcal{G}\) is Markov equivalent to an undirected graph iff it contains no v-structures. In this case, the equivalent undirected graph is the skeleton of \(\mathcal{G}\).

Proof. (\(\Leftarrow\)) Suppose there is no v-structure; then \(\mathcal{G}^m=\text{skel}(\mathcal{G})\). By Theorem 2.2, if \(p\) is Markov w.r.t. \(\mathcal{G}\), then it is also Markov w.r.t. \(\mathcal{G}^m\).

Suppose instead that \(p\) is Markov w.r.t. \(\mathcal{G}^m\). Let \(v\) be a vertex with no children in \(\mathcal{G}\) (one exists by acyclicity). Since \(v\) has no children, its neighbours in \(\mathcal{G}^m\) are exactly \(\text{pa}_\mathcal{G}(v)\), which is complete in \(\mathcal{G}^m\); hence \(v \perp_s V-(\text{pa}_\mathcal{G}(v) \cup \{v\}) \mid \text{pa}_\mathcal{G}(v)\ [\mathcal{G}^m]\). By the global Markov property for \(\mathcal{G}^m\), \(X_v \perp\!\!\!\!\perp X_{V-(\text{pa}_\mathcal{G}(v) \cup \{v\})} \mid X_{\text{pa}_\mathcal{G}(v)}\ [p]\), i.e., \(X_v \perp\!\!\!\!\perp X_{\text{nd}_\mathcal{G}(v)-\text{pa}_\mathcal{G}(v)} \mid X_{\text{pa}_\mathcal{G}(v)}\ [p]\), which is the local Markov property applied at \(v\).

\(\mathcal{G}_{-v}\) also has no v-structures, so by induction \(p(x_{V-\{v\}})\) is Markov w.r.t. \(\mathcal{G}_{-v}\), and thus for all \(i \in V-\{v\}\), \(X_i \perp\!\!\!\!\perp X_{\text{nd}(i)-\text{pa}(i)} \mid X_{\text{pa}(i)}\ [p]\). Combined with \(X_v \perp\!\!\!\!\perp X_{\text{nd}_\mathcal{G}(v)-\text{pa}_\mathcal{G}(v)} \mid X_{\text{pa}_\mathcal{G}(v)}\ [p]\), the local Markov property for \(\mathcal{G}\) holds. Hence, \(p\) is Markov w.r.t. \(\mathcal{G}\), and thus \(\mathcal{G}\) and \(\mathcal{G}^m\) are Markov equivalent.

(\(\Rightarrow\)) If \(\mathcal{G}\) has a v-structure \(i \to k \leftarrow j\), then since \(i \not\sim j\), the local Markov property implies a conditional independence between \(X_i\) and \(X_j\). If we add the edge \(i \sim j\) (as in \(\mathcal{G}^m\)), the resulting undirected graph admits distributions in which \(X_i\) and \(X_j\) are dependent given every conditioning set, so it is not Markov equivalent to \(\mathcal{G}\).

However, since \(i \sim j\) in \(\mathcal{G}^m\) (via the marriage), Theorem 2.3 gives a distribution, Markov w.r.t. \(\mathcal{G}\), in which \(X_i\) is not conditionally independent of \(X_j\) given \(X_{V-\{i, j\}}\). Therefore, if we do not add the edge \(i \sim j\), the undirected graph \(\text{skel}(\mathcal{G})\) is not Markov equivalent to \(\mathcal{G}\) either. Hence, no undirected graph can be Markov equivalent to \(\mathcal{G}\).

\(\square\)

Corollary 5.2    An undirected graph is Markov equivalent to some DAG iff it is decomposable.

Decomposable models represent the intersection of undirected and directed graphical models.