0%

Introduction to Network

1. Definition

Definition 1.1    We abbreviate a graph \(\mathcal{G}\) as \(\mathcal{G}=(\mathcal{V}, \mathcal{E})\), where \(\mathcal{V}\) is the set of vertices (nodes) and \(\mathcal{E}\) is the set of edges (links). Let \(|\mathcal{V}|\) be the number of vertices, and \(|\mathcal{E}|\) be the number of edges in the graph \(\mathcal{G}\). If \(u\) and \(b\) are two vertices and there is an edge from \(u\) to \(v\), then we say \(v\) is a neighbor of \(u\), denoted \((u, v) \in \mathcal{E}\).

Definition 1.2    A directed graph or digraph is a graph where all edges are directed. The underlying graph of a digraph is the graph that results from turning all directed edges into undirected edges.

Definition 1.3    If both endpoints of an edge are the same, then the edge is a loop.

Example 1.1    The graph below is a loop.

            graph LR
            1((1)) --- 1
          

Definition 1.4    Two vertices are called adjacent if they are joined by an edge. A graph can be described by its \(|\mathcal{V}| \times |\mathcal{V}|\) adjacency matrix \(A=(a_{u, v})\), where \(a_{u, v}=1\) iff \((u, v) \in \mathcal{E}\).

If there are no self-loops, all elements on the diagonal of the adjacency matrix are \(0\). If the edges of the graph are undirected, then the adjacency matrix will be symmetric.

The adjacency matrix entries tell us for every vertex \(v\) which vertices are within (graph) distance \(1\) of \(v\).

If we take the matrix product \(A^2=AA\), the entry for \((u, v)\) with \(u \neq v\) would be \[a^{(2)}(u, v)=\sum_{w \in \mathcal{V}} a_{u, w}a_{w, v}.\] If \(a^{(2)}(u, v) \neq 0\), then \(u\) can be reached from \(v\) within two steps, i.e., \(u\) is within distance \(2\) of \(v\). Higher powers can be interpreted similarly.

Definition 1.5    A complete graph is a graph, without self-loops, s.t. every pair of vertices is joined by an edge. The adjacency matrix has entry \(0\) on the diagonal, and \(1\) everywhere else.

Example 1.2    The graph below is complete.

            graph LR
            1((1)) --- 2((2)) --- 3((3))
1 --- 3
          

Definition 1.6    A bipartite graph is a graph where the vertex set \(\mathcal{V}\) is decomposed into two disjoint subsets, \(\mathcal{U}\) and \(\mathcal{W}\), s.t. there are no edges between any two vertices in \(\mathcal{U}\), and there are no edges between any two vertices in \(\mathcal{W}\); all edges have one endpoint in \(\mathcal{U}\) and the other endpoint in \(\mathcal{W}\). The adjacency matrix \(A\) can then be arranged s.t. \[A=\begin{bmatrix} 0 & A_1 \\ A_2 & 0 \end{bmatrix}.\]

Example 1.3    The graph below is bipartite.

            graph LR
            1((1)) --- 3((3)) & 4((4))
2((2)) --- 3((3)) & 4((4))
subgraph U
1((1))
2((2))
end
subgraph W
3((3))
4((4))
end
          

2. Network Summary

2.1. Density, Degree and Degree Distribution

Definition 2.1    The density of a network with \(e\)edges and \(n\) vertices is \[\frac{e}{\displaystyle \binom{n}{2}}=\frac{2e}{n(n-1)}\] where \(\displaystyle \binom{n}{2}\) is the potential number of edges.

Definition 2.2    The degree \(d(v)\) of a vertex \(v\) is the number of edges which involve \(v\) as an endpoint. The degree can be calculated from the adjacency matrix \(A\) \[d(v)=\sum_{u \in \mathcal{V}} a_{u, v}.\]

Definition 2.3    The average degree of a graph is the average of its vertex degrees \[\overline{d}=\frac{1}{n}\sum_{v \in \mathcal{V}} d(v).\]

Definition 2.4    The degree sequence of a given network on a vertex set \(\mathcal{V}\) with \(n\) elements is the unordered \(n\)-element set of degrees \(\{d(v), v \in \mathcal{V}\}\).

Definition 2.5    The degree distribution \((d_0, d_1, \ldots)\) of a graph on \(n\) vertices is the vector of fraction of vertices with given degree \[d_k=\frac{1}{n} \times \text{Number of vertices of degree}\ k.\]

For directed graphs, we define the in-degree as the number of edges directed at the vertex, and the out-degree as the number of edges taht go out from that vertex.

2.2. Local Clustering Coefficient

Definition 2.6    The local clustering coefficient of a vertex \(v\) is the proportion of neighbors of \(v\) which are neighbors themselves. In adjacency matrix notation \[C(v)=\frac{\displaystyle \sum_{u, w \in \mathcal{V}} a_{u, v}a_{w, v}a_{u,w}}{\displaystyle \sum_{u \neq w \in \mathcal{V}} a_{u, v}a_{w, v}},\] where \(\displaystyle \frac{0}{0}:=0\).

To compute \(C(v)\) more efficiently, we should note that \(\displaystyle \frac{1}{2}\sum_{\substack{u \neq v \in \mathcal{V} \\ w \neq u, v \in \mathcal{V}}} a_{u, v}a_{w, v}a_{u, w}\) is the number of triangles involving \(v\) in the graph, and \(\displaystyle \frac{1}{2}\sum_{u \neq w \in \mathcal{V}} a_{u, v}a_{w, v}\) is the number of \(2\)-stars centered around \(v\) in the graph.

Definition 2.7    The average clustering coefficient is defined as \[\overline{C}=\frac{1}{|\mathcal{V}|}\sum_{v \in \mathcal{V}} C(v).\]

The local clustering coefficient describes how locally dense a graph is.

2.3. Global Clustering Coefficient

Definition 2.8    The global clustering coefficient or transitivity is defined as \[C=\frac{3 \times \text{Number of triangles}}{\text{Number of connected triples}}=\frac{6 \times \text{Number of triangles}}{\text{Number of paths of length two}}\]

\(\overline{C} \neq C\) in general. Indeed \(\overline{C}\) tends to be dominated by vertices with low degree, since they tend to have small denominators in the local clustering coefficient.

2.4. Expected Clustering Coefficient

Definition 2.9    For models of random networks, we consider expected clustering coefficient \[E(C)=\frac{3\mathbb{E}[\text{Number of triangles}]}{\mathbb{E}[\text{Number of connected triples}]}\]

2.5. Average Shortest Path

Definition 2.10    In a graph, a path from vertex \(v_0\) to vertex \(v_n\) is an alternating sequence of vertices and edges \((v_0, e_1, v_1, e_2, \ldots, v_{n-1}, e_n, v_n)\) s.t. the endpoints of \(e_i\) are \(v_{i-1}\) and \(v_i\) for \(i=1, \ldots, n\). The distance \(\mathscr{l}(u, v)\) between two vertices \(u\) and \(v\) is the length of a shortest path (not necessarily unique) joining them.

We can calculate \(\mathscr{l}(u, v)\) from the adjacency matrix \(A\) as the smallest power \(p\) of \(A\) s.t. \(A^{(p)}(u, v) \neq 0\).

Definition 2.11    A graph is called connected if there is a path between any pair of vertices in the graph, otherwise it is disconnected. In a connected graph, the average shortest path length is defined as \[\mathscr{l}=\frac{1}{|\mathcal{V}|(|\mathcal{V}|-1)}\sum_{u \neq v \in \mathcal{V}} \mathscr{l}(u, v),\] where \(|\mathcal{V}|(|\mathcal{V}|-1)\) means the possible number of shortest path length.

The average shortest path length describes how globally connected a graph is.

Example 2.1    Consider a graph.

            graph LR
            1((1)) --- 5((5)) --- 3((3)) --- 2((2))
5((5)) --- 2((2))
5((5)) --- 4((4)) --- 2((2))
          

We know \(\mathcal{V}=\{1, 2, 3, 4, 5\}\), \(\mathcal{E}=\{(1, 5), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)\}\), \(|\mathcal{V}|=5\), and \(|\mathcal{E}|=6\).

The adjacency matrix is \[\begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix}.\]

The density is \(\displaystyle \frac{2e}{n(n-1)}=0.6\).

The degrees of vertices are \(d(1)=1\), \(d(2)=3\), \(d(3)=2\), \(d(4)=2\) and \(d(5)=4\). The average degree is \(\overline{d}=2.4\). The degree sequence is \(\{1, 3, 2, 2, 4\}\). The degree distribution is \(d_0=0\), \(d_1=0.2\), \(d_2=0.4\), \(d_3=0.2\) and \(d_4=0.2\).

The local clustering coefficients of vertices are \(C(1)=0\), \(\displaystyle C(2)=\frac{2}{3}\), \(C(3)=1\), \(C(4)=1\) and \(\displaystyle C(5)=\frac{1}{3}\). The average clustering coefficient is \(\overline{C}=0.6\).

The global clustering coefficient is \(\displaystyle C=\frac{6}{11}\).

The distance matrix is \[\begin{bmatrix} 0 & 2 & 2 & 2 & 1 \\ 2 & 0 & 1 & 1 & 1 \\ 2 & 1 & 0 & 2 & 1 \\ 2 & 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix}.\] The graph is connected, and the average shortest path length is \(\mathscr{l}=1.4\).

2.6. Small Graph and Motif

Definition 2.12    Small subgraph can be viewed as building-block pattern of networks. By small we mean graph on a small number (three to five) of vertices. Often a small graph is called a motif when it is over-represented in the network, where over-representation is judged using a probabilistic model for the network.

We think of a motif as a small graph with a fixed number of vertices and with a given topology, and we use the term interchangeably with small graph.

2.7. Spectral Summary

Definition 2.13    We call a graph simple if it has no self-loops or multiple edges.

Definition 2.14     The eigenvector centrality (Not done. To be continued...)

Definition 2.15     Let \(D\) be the diagonal matrix with diagonal entries \(D_{ii}\), the degree of vertex \(i\). The graph Laplacian is \(L=D-A\), where \[L(i, j)=\begin{cases} d(i), &i=j \\ -1, &i \neq j\ \text{and}\ (i, j) \in \mathcal{E} \\ 0, &\text{otherwise} \end{cases}.\] The second smallest (smallest non-zero) eigenvalue of \(L\) is called algebraic connectivity or Fiedler value or spectral gap of \(\mathcal{G}\).

\(L\) is symmetric, with real eigenvalues \(\lambda_0 \leq \cdots \leq \lambda_{n-1}\) and eigenvectors \(\phi_1, \ldots, \phi_{n-1}\).

Since every row sum and column sum of \(L\) is zero, then \(L\mathbf{1}=0\mathbf{1}\), and thus \(\lambda_0=0\) and \(\phi_0=\mathbf{1}\).

2.8. Other Summary

Specific network may require specific summary. For example, when there is a spatial structure in the network, then geometrical consideration may enter.