Representative Markov Chain Examples
Introduction
Markov chains (Durrett, 2012) are one of the simplest and most powerful mathematical models for stochastic systems.
They appear everywhere:
- web-page ranking
- reinforcement learning
- queueing systems
- finance
- genetics
- statistical physics
- language models
The central idea is simple:
The future depends only on the present state, not on the full history.
Formally, a Markov chain satisfies
\[P(X_{t+1}=j \mid X_t=i, X_{t-1}, \dots, X_0) = P(X_{t+1}=j \mid X_t=i)\]where:
- \(X_t\) is the state at time \(t\)
- \(P(X_{t+1}=j \mid X_t=i)\) is the transition probability from state \(i\) to state \(j\)
A finite Markov chain is completely described by its transition matrix:
\[P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \\ \end{bmatrix}\]Each row sums to \(1\):
\[\sum_{j=1}^n p_{ij} = 1\]In this post, we’ll walk through several representative examples and explain the most important structural properties of Markov chains.
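Before diving into the examples, the row-sum condition is easy to verify numerically. A minimal NumPy sketch (the 3-state matrix here is just an illustration, not one of the examples below):

```python
import numpy as np

# An illustrative 3-state transition matrix (any row-stochastic matrix works).
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.5, 0.25, 0.25]])

# Each row must be a probability distribution over next states.
assert np.all(P >= 0)
assert np.allclose(P.sum(axis=1), 1.0)

# n-step transition probabilities are matrix powers: P^n[i, j] = P(X_n = j | X_0 = i).
P3 = np.linalg.matrix_power(P, 3)
assert np.allclose(P3.sum(axis=1), 1.0)  # powers of a stochastic matrix stay stochastic
```

The last assertion reflects a useful fact used throughout this post: products of stochastic matrices are stochastic, so every power \(P^n\) is again a valid transition matrix.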
1. Ergodic Chain
Consider:
\[P = \begin{bmatrix} 0.5 & 0.5 \\ 0.2 & 0.8 \\ \end{bmatrix}\]This chain is:
- irreducible
- aperiodic
- ergodic
Irreducibility
A chain is irreducible if every state can eventually reach every other state.
Here:
- state \(0\) can reach state \(1\)
- state \(1\) can reach state \(0\)
So the state space forms a single communicating class.
Aperiodicity
A state has period
\[d(i) = \gcd\{\, n \ge 1 : P^n(i,i) > 0 \,\}\]Because both states have self-loops:
\[P(0 \to 0) = 0.5\]and
\[P(1 \to 1) = 0.8\]the chain can return in one step, giving period \(1\).
Ergodicity
An irreducible + aperiodic finite chain is ergodic.
That means:
\[P^n \to \begin{bmatrix} \pi \\ \pi \\ \vdots \end{bmatrix} \quad \text{as } n \to \infty\]where \(\pi\) is the stationary distribution.
The stationary distribution solves:
\[\pi P = \pi\]with
\[\sum_i \pi_i = 1\]For this chain:
\[\pi = \begin{bmatrix} 0.2857 & 0.7143 \end{bmatrix}\]meaning the chain spends about \(71\%\) of its time in state \(1\) in the long run.
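The stationary distribution can be reproduced numerically as the left eigenvector of \(P\) with eigenvalue \(1\). A minimal sketch (this is not the `classify_markov_chain` routine from the appendix, just one standard way to compute \(\pi\)):

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.2, 0.8]])

# pi P = pi means pi is a left eigenvector of P with eigenvalue 1,
# i.e. a (right) eigenvector of P transpose.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()          # normalize so the entries sum to 1

print(np.round(pi, 4))      # [0.2857 0.7143], i.e. (2/7, 5/7)
assert np.allclose(pi @ P, pi)
```

Dividing by the sum both normalizes \(\pi\) and fixes the arbitrary sign that `eig` may attach to the eigenvector.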
2. Periodic Chain
Now consider:
\[P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix}\]This chain alternates forever:
\[0 \to 1 \to 0 \to 1 \to \cdots\]The return times to state \(0\) are:
\[2,4,6,8,\dots\]Thus:
\[d(0)=2\]and similarly:
\[d(1)=2\]The chain is irreducible but periodic.
Why Periodicity Matters
Even though a stationary distribution exists:
\[\pi = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}\]the powers of the transition matrix do not converge:
\[P^{2n} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix}\]while
\[P^{2n+1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix}\]The system oscillates forever.
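The oscillation is easy to confirm numerically, and the experiment also shows what *does* converge for a periodic chain: its time-averaged behavior. A small sketch:

```python
import numpy as np

P = np.array([[0., 1.],
              [1., 0.]])

# Even powers are the identity, odd powers are P itself: no limit exists.
assert np.array_equal(np.linalg.matrix_power(P, 10), np.eye(2))
assert np.array_equal(np.linalg.matrix_power(P, 11), P)

# The Cesaro (running) average of the powers does converge to rows of pi = [0.5, 0.5].
avg = sum(np.linalg.matrix_power(P, k) for k in range(1, 101)) / 100
print(np.round(avg, 3))  # all entries 0.5
```

This is the general picture for irreducible periodic chains: the powers \(P^n\) cycle forever, but their running average converges to the stationary rows.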
3. Absorbing Chain
Consider:
\[P = \begin{bmatrix} 1.0 & 0.0 \\ 0.4 & 0.6 \\ \end{bmatrix}\]State \(0\) is absorbing because:
\[P(0 \to 0)=1\]Once entered, it can never be left.
Long-Term Behavior
State \(1\) eventually flows into state \(0\).
Repeated multiplication shows:
\[P^n \to \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ \end{bmatrix}\]Eventually all probability mass accumulates in the absorbing state.
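A quick numerical check of the absorption, plus the expected time to get absorbed (a sketch; the \(1/0.4\) formula is just the mean of a geometric waiting time):

```python
import numpy as np

P = np.array([[1.0, 0.0],
              [0.4, 0.6]])

# After many steps essentially all mass from state 1 has been absorbed:
# P^n(1, 1) = 0.6^n -> 0 as n grows.
Pn = np.linalg.matrix_power(P, 100)
assert np.allclose(Pn, [[1, 0], [1, 0]])

# Each step from state 1 is absorbed with probability 0.4, so the
# expected number of steps to absorption is geometric with mean 1 / 0.4.
print(1 / 0.4)  # 2.5
```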
4. Multiple Closed Classes
Now consider:
\[P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ \end{bmatrix}\]States \(0\) and \(1\) are absorbing closed classes.
States \(2\) and \(3\) are transient.
Eventually:
- state \(2\) gets trapped in class \(\{0\}\)
- state \(3\) gets trapped in class \(\{1\}\)
This chain is reducible because not all states communicate.
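Which closed class each transient state ends up in can be computed with the standard fundamental-matrix calculation for absorbing chains (a sketch; the partitioning into `Q` and `R` below is the usual canonical form, with transient states 2 and 3):

```python
import numpy as np

P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0.5, 0, 0.5, 0],
              [0, 0.5, 0, 0.5]], dtype=float)

# Canonical form: Q governs transient -> transient moves,
# R governs transient -> absorbing moves (transient states are 2 and 3).
Q = P[np.ix_([2, 3], [2, 3])]
R = P[np.ix_([2, 3], [0, 1])]

# Fundamental matrix N = (I - Q)^{-1}; absorption probabilities B = N R.
N = np.linalg.inv(np.eye(2) - Q)
B = N @ R
print(B)  # row for state 2, row for state 3
```

Here `B` comes out as the identity: state 2 is absorbed into \(\{0\}\) with probability 1 and state 3 into \(\{1\}\) with probability 1, matching the description above.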
5. Fully Connected Chain
A more “generic” stochastic system:
\[P = \begin{bmatrix} 0.2 & 0.3 & 0.5 \\ 0.4 & 0.4 & 0.2 \\ 0.3 & 0.3 & 0.4 \\ \end{bmatrix}\]Every state communicates with every other state in one step.
The chain is:
- irreducible
- aperiodic
- ergodic
Because all entries are positive:
\[p_{ij} > 0\]the chain mixes rapidly toward equilibrium.
6. Transient to Closed Flow
Consider:
\[P = \begin{bmatrix} 0.2 & 0.8 & 0.0 \\ 0.0 & 0.5 & 0.5 \\ 0.0 & 0.4 & 0.6 \\ \end{bmatrix}\]State \(0\) is transient.
Once the chain enters \(\{1,2\}\), it never returns to \(0\).
The subset:
\[\{1,2\}\]forms a closed irreducible class.
This structure appears frequently in:
- queueing systems
- epidemic models
- reinforcement learning environments
where some states act as temporary entry points before long-run equilibrium behavior emerges.
7. Self-Loops Create Aperiodicity
Consider:
\[P = \begin{bmatrix} 0.1 & 0.9 \\ 0.7 & 0.3 \\ \end{bmatrix}\]Even a tiny self-loop can destroy periodicity.
Because:
\[P(0 \to 0)=0.1\]the chain can return in one step.
Thus:
\[d(0)=1\]and the entire irreducible chain becomes aperiodic.
This is why adding “noise” or “lazy steps” is a common trick in Markov Chain Monte Carlo (MCMC).
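The "lazy step" trick can be sketched directly on the two-state periodic chain from Section 2: mix in a small self-loop probability \(\epsilon\) and the powers start converging (a minimal illustration; the choice \(\epsilon = 0.1\) is arbitrary):

```python
import numpy as np

# The two-state periodic chain from Section 2.
P = np.array([[0., 1.],
              [1., 0.]])

# Lazy version: with probability eps stay put, otherwise take a normal step.
eps = 0.1
P_lazy = eps * np.eye(2) + (1 - eps) * P

# The self-loops give every state period 1, so P_lazy^n now converges
# to rows of the stationary distribution [0.5, 0.5].
Pn = np.linalg.matrix_power(P_lazy, 100)
print(np.round(Pn, 4))  # [[0.5 0.5] [0.5 0.5]]
```

Note that the lazy chain has the same stationary distribution as the original: mixing in the identity preserves \(\pi P = \pi\).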
8. Three-Cycle Periodic Chain
Consider:
\[P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ \end{bmatrix}\]The chain cycles deterministically:
\[0 \to 1 \to 2 \to 0\]Returns occur only at times:
\[3,6,9,\dots\]So:
\[d(i)=3\]for every state.
The chain is irreducible but not ergodic.
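The period definition from Section 1 can be turned into a small function: collect the return times \(n\) with \(P^n(i,i) > 0\) and take their gcd. A sketch (truncating at `max_n` steps is a practical simplification; the helper name is my own):

```python
import numpy as np
from math import gcd
from functools import reduce

P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

def period(P, i, max_n=20):
    """gcd of all n <= max_n with P^n(i, i) > 0 (a practical truncation)."""
    returns = [n for n in range(1, max_n + 1)
               if np.linalg.matrix_power(P, n)[i, i] > 0]
    return reduce(gcd, returns) if returns else 0

print([period(P, i) for i in range(3)])  # [3, 3, 3]
```

For the three-cycle the return times are \(3, 6, 9, \dots\), so the gcd is \(3\) for every state, as claimed above.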
9. Identity Chain
The identity matrix:
\[P = I\]gives:
\[P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}\]Every state is absorbing.
Nothing ever changes.
This is the most degenerate possible Markov chain.
10. Random Stochastic Matrix
The helper function in the accompanying code generates matrices by:
- sampling random positive entries
- normalizing rows
- rounding entries
- fixing row sums
Mathematically:
\[P_{ij} = \frac{A_{ij}}{\sum_k A_{ik}}\]where \(A_{ij}>0\) are random draws.
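A sketch of such a generator, following the four steps above (the function name and defaults mirror the helper described in the appendix, but the exact implementation details here are an assumption):

```python
import numpy as np

def random_stochastic_matrix(n=3, decimals=1, seed=None):
    """Sketch of the generator: sample, normalize, round, repair row sums."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, n))                    # random positive entries
    P = A / A.sum(axis=1, keepdims=True)      # normalize each row to sum to 1
    P = np.round(P, decimals)                 # round for readability
    P[:, -1] = 1.0 - P[:, :-1].sum(axis=1)    # fix the last entry so rows sum exactly to 1
    return P

P = random_stochastic_matrix(4, decimals=2, seed=0)
print(P)
assert np.allclose(P.sum(axis=1), 1.0)
```

The final repair step matters: after rounding, rows generally sum to something slightly off \(1\), and adjusting the last entry restores exact stochasticity.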
Most randomly generated positive matrices are:
- irreducible
- aperiodic
- ergodic
because positive entries imply strong connectivity.
Communicating Classes
One of the most important structural ideas in Markov chains is communication.
State \(i\) communicates with state \(j\) if:
\[i \leftrightarrow j\]meaning:
\[i \to j \quad \text{and} \quad j \to i\]A communicating class is a maximal set of mutually reachable states.
These classes determine the chain’s long-run structure.
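These classes can be computed mechanically from the reachability relation: take the transitive closure of "one positive-probability step", then group mutually reachable states. A minimal sketch (not the `classify_markov_chain` routine from the appendix):

```python
import numpy as np
from itertools import product

def communicating_classes(P):
    """Group states into maximal sets of mutually reachable states."""
    P = np.asarray(P)
    n = len(P)
    # reach[i, j]: can state i reach state j in zero or more steps?
    reach = (P > 0) | np.eye(n, dtype=bool)
    for k, i, j in product(range(n), repeat=3):   # Warshall transitive closure
        reach[i, j] |= reach[i, k] and reach[k, j]
    classes, assigned = [], set()
    for i in range(n):
        if i not in assigned:
            cls = sorted(j for j in range(n) if reach[i, j] and reach[j, i])
            classes.append(cls)
            assigned.update(cls)
    return classes

# The reducible chain from Section 4: four singleton classes.
P = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0.5, 0, 0.5, 0],
     [0, 0.5, 0, 0.5]]
print(communicating_classes(P))  # [[0], [1], [2], [3]]
```

For the Section 4 chain, state 2 can reach state 0 but not vice versa, so no two states communicate and every class is a singleton.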
Transient vs Recurrent States
A state is:
- recurrent if it is revisited infinitely often
- transient if it can eventually be abandoned forever
For finite chains:
- closed irreducible classes are recurrent
- states outside closed classes are transient
This decomposition is fundamental to Markov chain theory.
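The finite-chain rule above gives a direct test: a state \(i\) is recurrent exactly when every state reachable from \(i\) can also reach \(i\) back, i.e. when its class is closed. A sketch (the helper name is my own):

```python
import numpy as np

def recurrent_states(P):
    """Finite-chain rule: i is recurrent iff every state reachable
    from i can also reach i back (i.e. i's class is closed)."""
    P = np.asarray(P)
    n = len(P)
    reach = (P > 0) | np.eye(n, dtype=bool)
    for k in range(n):                 # Warshall transitive closure
        for i in range(n):
            for j in range(n):
                if reach[i, k] and reach[k, j]:
                    reach[i, j] = True
    return [i for i in range(n)
            if all(reach[j, i] for j in range(n) if reach[i, j])]

# The transient-to-closed example from Section 6:
P = [[0.2, 0.8, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.4, 0.6]]
print(recurrent_states(P))  # [1, 2]
```

As expected, state 0 is transient (it reaches state 1, which never returns), while \(\{1,2\}\) is the recurrent closed class.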
Stationary Distributions
A stationary distribution satisfies:
\[\pi P = \pi\]Interpretation:
If the chain starts distributed as \(\pi\), it remains distributed as \(\pi\) forever.
For ergodic chains:
\[\lim_{n\to\infty} P^n(i,j)=\pi_j\]independent of the starting state.
This is the mathematical basis of:
- PageRank
- MCMC
- Gibbs sampling
- Hidden Markov Models
- stochastic optimization
Final Thoughts
These examples capture the major behaviors finite Markov chains can exhibit:
| Type | Key Feature |
|---|---|
| Ergodic | Converges to equilibrium |
| Periodic | Oscillates forever |
| Absorbing | Gets trapped |
| Reducible | Splits into disconnected regions |
| Fully connected | Rapid mixing |
| Transient flow | Temporary states vanish |
Markov chains are deceptively simple objects.
Yet from the single equation
\[P(X_{t+1}=j \mid X_t=i)\]emerges an enormous theory connecting:
- probability
- linear algebra
- dynamical systems
- statistical mechanics
- machine learning
and modern AI itself.
Appendix: Code Walkthrough
The code begins by defining a helper function, random_stochastic_matrix(n=3, decimals=1), which generates a random stochastic matrix by first sampling a random positive matrix with np.random.rand(n, n), normalizing each row so that the entries sum to \(1\), rounding the entries for readability, and then correcting the final entry in each row to ensure exact row sums after rounding.

A dictionary called examples stores a collection of representative Markov-chain transition matrices, including ergodic chains, periodic chains, absorbing chains, reducible chains, fully connected systems, transient-to-recurrent systems, and randomly generated chains. Each dictionary key is a descriptive label, while each value is a NumPy array representing the transition matrix.

The program then initializes an empty dictionary called results and iterates through every example using for name, P in examples.items():. For each transition matrix \(P\), the code prints the matrix, calls classify_markov_chain(P) to compute structural properties of the chain, stores the returned properties in the results dictionary, and prints a formatted summary of key characteristics including irreducibility, aperiodicity, ergodicity, communicating classes, closed classes, absorbing states, recurrent states, transient states, and periods.

If the classification routine determines that a stationary distribution exists, the code prints the stationary distribution rounded to four decimal places using np.round. Overall, the script acts as a compact framework for generating, analyzing, and comparing representative finite-state Markov chains.
References
- Durrett, R. (2012). Essentials of Stochastic Processes (2nd ed.). Springer. https://doi.org/10.1007/978-1-4614-3615-7