
Introduction to Markov Chains

A Markov chain is a mathematical model used to describe systems that transition between different states over time according to fixed probabilities.

The defining property of a Markov chain is the Markov property:

The future state depends only on the current state, not on the sequence of states that came before it.

Formally, if \(X_t\) denotes the state at time \(t\), then

\[P(X_{t+1}=j \mid X_t=i, X_{t-1}, \dots, X_0) = P(X_{t+1}=j \mid X_t=i)\]

This means that once the present state is known, the past provides no additional information about the future.

A Markov chain consists of:

  • A finite or countable set of states

  • Transition probabilities between states

  • An initial state distribution

The probability of moving from state \(i\) to state \(j\) is called the transition probability and is written as

\[p_{ij} = P(X_{t+1}=j \mid X_t=i)\]

These probabilities are organized into a transition matrix:

\[P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{bmatrix}\]

Each row of the matrix must sum to \(1\):

\[\sum_{j=1}^{n} p_{ij} = 1\]

since the system must transition to some state after each step.
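To make this concrete, here is a minimal NumPy sketch using a hypothetical 2-state "sunny/rainy" chain (the matrix is purely illustrative, not taken from later sections):

```python
import numpy as np

# Hypothetical 2-state weather chain: state 0 = sunny, state 1 = rainy.
P = np.array([
    [0.9, 0.1],   # from sunny: stay sunny 90%, turn rainy 10%
    [0.5, 0.5],   # from rainy: clear up 50%, stay rainy 50%
])

# Each row of a stochastic matrix must sum to 1.
print(P.sum(axis=1))  # [1. 1.]

# If today's distribution over states is x, tomorrow's is x @ P.
x0 = np.array([1.0, 0.0])  # certainly sunny today
x1 = x0 @ P
print(x1)  # [0.9 0.1]
```

The row-vector convention `x @ P` matches the notation used throughout this article.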

Markov chains are widely used in probability theory, statistics, economics, computer science, and machine learning. Common applications include weather prediction, queueing systems, stock market modeling, PageRank, and hidden Markov models used in speech recognition and natural language processing.

Markov chains are one of the most elegant ideas in probability theory. At their core, they model systems that “move” between states according to fixed probabilities.

One of the most important equations in Markov chain theory is:

\[\pi = \pi P\]

This equation defines the stationary distribution of the chain.

In this article, we’ll break down:

  • What \(\pi = \pi P\) means
  • Why it matters
  • How to solve for \(\pi\)
  • A complete \(5 \times 5\) example
  • Full Gauss–Jordan row elimination using matrices

What Does \(\pi = \pi P\) Mean?

Suppose we have a Markov chain with transition matrix \(P\).

Each row of \(P\) tells us the probabilities of moving from one state to another.

If:

\[\pi = \begin{bmatrix} \pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5 \end{bmatrix}\]

then \(\pi_i\) represents the long-run probability of being in state \(i\).

The equation

\[\pi = \pi P\]

means:

If the system already has distribution \(\pi\), then after one transition using \(P\), the distribution stays the same.

This is why \(\pi\) is called the stationary distribution.
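A quick numerical illustration, again with a small 2-state chain (chosen so that its stationary distribution is exactly \((5/6, 1/6)\)):

```python
import numpy as np

# Small illustrative chain; its stationary distribution is (5/6, 1/6).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])

# One transition leaves the distribution unchanged: pi @ P == pi.
print(pi @ P)                   # approximately [0.8333 0.1667]
print(np.allclose(pi @ P, pi))  # True
```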


A 5-State Markov Chain

Consider the transition matrix:

\[P = \begin{bmatrix} 0.2 & 0.3 & 0.1 & 0.2 & 0.2 \\ 0.1 & 0.4 & 0.2 & 0.2 & 0.1 \\ 0.3 & 0.2 & 0.2 & 0.1 & 0.2 \\ 0.25 & 0.15 & 0.25 & 0.2 & 0.15 \\ 0.2 & 0.2 & 0.2 & 0.1 & 0.3 \end{bmatrix}\]

Notice every row sums to \(1\), as required for a stochastic matrix.

We seek:

\[\pi = \begin{bmatrix} \pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5 \end{bmatrix}\]

such that:

\[\pi = \pi P\]

and

\[\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 = 1\]

Step 1: Rewrite the Equation

Starting from

\[\pi = \pi P\]

move everything to one side:

\[\pi(P - I) = 0\]

Equivalently:

\[(P^T - I)\pi^T = 0\]

We now compute \(P^T - I\).


Step 2: Compute \(P^T - I\)

First transpose \(P\):

\[P^T = \begin{bmatrix} 0.2 & 0.1 & 0.3 & 0.25 & 0.2 \\ 0.3 & 0.4 & 0.2 & 0.15 & 0.2 \\ 0.1 & 0.2 & 0.2 & 0.25 & 0.2 \\ 0.2 & 0.2 & 0.1 & 0.2 & 0.1 \\ 0.2 & 0.1 & 0.2 & 0.15 & 0.3 \end{bmatrix}\]

Subtract the identity matrix:

\[P^T - I = \begin{bmatrix} -0.8 & 0.1 & 0.3 & 0.25 & 0.2 \\ 0.3 & -0.6 & 0.2 & 0.15 & 0.2 \\ 0.1 & 0.2 & -0.8 & 0.25 & 0.2 \\ 0.2 & 0.2 & 0.1 & -0.8 & 0.1 \\ 0.2 & 0.1 & 0.2 & 0.15 & -0.7 \end{bmatrix}\]

The five homogeneous equations are linearly dependent: each row of \(P - I\) sums to zero, so the rows of \(P^T - I\) add up to the zero vector and one equation is redundant. We therefore append the normalization condition

\[\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 = 1\]

and replace the final row with:

\[\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \end{bmatrix}\]

giving the augmented matrix:

\[\left[ \begin{array}{ccccc|c} -0.8 & 0.1 & 0.3 & 0.25 & 0.2 & 0 \\ 0.3 & -0.6 & 0.2 & 0.15 & 0.2 & 0 \\ 0.1 & 0.2 & -0.8 & 0.25 & 0.2 & 0 \\ 0.2 & 0.2 & 0.1 & -0.8 & 0.1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{array} \right]\]
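As a cross-check on the hand computation that follows, the same augmented system can be solved directly with NumPy: build \(P^T - I\), overwrite the last row with ones, and set the right-hand side to \((0, 0, 0, 0, 1)\). A sketch:

```python
import numpy as np

# The article's 5-state transition matrix.
P = np.array([
    [0.2,  0.3,  0.1,  0.2,  0.2],
    [0.1,  0.4,  0.2,  0.2,  0.1],
    [0.3,  0.2,  0.2,  0.1,  0.2],
    [0.25, 0.15, 0.25, 0.2,  0.15],
    [0.2,  0.2,  0.2,  0.1,  0.3],
])

# (P^T - I) pi^T = 0, with the last equation replaced by sum(pi) = 1.
A = P.T - np.eye(5)
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

pi = np.linalg.solve(A, b)
print(pi)  # close to [0.201, 0.265, 0.188, 0.162, 0.185]
```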

Step 3: Begin Gauss–Jordan Elimination

Swap \(R_1\) and \(R_5\):

\[\left[ \begin{array}{ccccc|c} 1 & 1 & 1 & 1 & 1 & 1 \\ 0.3 & -0.6 & 0.2 & 0.15 & 0.2 & 0 \\ 0.1 & 0.2 & -0.8 & 0.25 & 0.2 & 0 \\ 0.2 & 0.2 & 0.1 & -0.8 & 0.1 & 0 \\ -0.8 & 0.1 & 0.3 & 0.25 & 0.2 & 0 \end{array} \right]\]

Eliminate the First Column

Perform:

\[\begin{aligned} R_2 &\to R_2 - 0.3R_1 \\ R_3 &\to R_3 - 0.1R_1 \\ R_4 &\to R_4 - 0.2R_1 \\ R_5 &\to R_5 + 0.8R_1 \end{aligned}\]

Result:

\[\left[ \begin{array}{ccccc|c} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & -0.9 & -0.1 & -0.15 & -0.1 & -0.3 \\ 0 & 0.1 & -0.9 & 0.15 & 0.1 & -0.1 \\ 0 & 0 & -0.1 & -1.0 & -0.1 & -0.2 \\ 0 & 0.9 & 1.1 & 1.05 & 1.0 & 0.8 \end{array} \right]\]

Normalize the Second Pivot

Divide \(R_2\) by \(-0.9\):

\[R_2 \to \frac{1}{-0.9}R_2\]

\[\left[ \begin{array}{ccccc|c} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0.111 & 0.167 & 0.111 & 0.333 \\ 0 & 0.1 & -0.9 & 0.15 & 0.1 & -0.1 \\ 0 & 0 & -0.1 & -1.0 & -0.1 & -0.2 \\ 0 & 0.9 & 1.1 & 1.05 & 1.0 & 0.8 \end{array} \right]\]

Eliminate the Second Column

Perform:

\[\begin{aligned} R_1 &\to R_1 - R_2 \\ R_3 &\to R_3 - 0.1R_2 \\ R_5 &\to R_5 - 0.9R_2 \end{aligned}\]

Result:

\[\left[ \begin{array}{ccccc|c} 1 & 0 & 0.889 & 0.833 & 0.889 & 0.667 \\ 0 & 1 & 0.111 & 0.167 & 0.111 & 0.333 \\ 0 & 0 & -0.911 & 0.133 & 0.089 & -0.133 \\ 0 & 0 & -0.1 & -1.0 & -0.1 & -0.2 \\ 0 & 0 & 1.0 & 0.9 & 0.9 & 0.5 \end{array} \right]\]

Normalize the Third Pivot

Divide \(R_3\) by \(-0.911\):

\[R_3 \to \frac{1}{-0.911}R_3\]

Approximately:

\[\left[ \begin{array}{ccccc|c} 1 & 0 & 0.889 & 0.833 & 0.889 & 0.667 \\ 0 & 1 & 0.111 & 0.167 & 0.111 & 0.333 \\ 0 & 0 & 1 & -0.146 & -0.098 & 0.146 \\ 0 & 0 & -0.1 & -1.0 & -0.1 & -0.2 \\ 0 & 0 & 1.0 & 0.9 & 0.9 & 0.5 \end{array} \right]\]

Eliminate the Third Column

Perform:

\[\begin{aligned} R_1 &\to R_1 - 0.889R_3 \\ R_2 &\to R_2 - 0.111R_3 \\ R_4 &\to R_4 + 0.1R_3 \\ R_5 &\to R_5 - R_3 \end{aligned}\]

Result:

\[\left[ \begin{array}{ccccc|c} 1 & 0 & 0 & 0.963 & 0.976 & 0.537 \\ 0 & 1 & 0 & 0.183 & 0.122 & 0.317 \\ 0 & 0 & 1 & -0.146 & -0.098 & 0.146 \\ 0 & 0 & 0 & -1.015 & -0.110 & -0.185 \\ 0 & 0 & 0 & 1.046 & 0.998 & 0.354 \end{array} \right]\]

Continue Reduction

Normalize \(R_4\):

\[R_4 \to \frac{1}{-1.015}R_4\]

\[\left[ \begin{array}{ccccc|c} 1 & 0 & 0 & 0.963 & 0.976 & 0.537 \\ 0 & 1 & 0 & 0.183 & 0.122 & 0.317 \\ 0 & 0 & 1 & -0.146 & -0.098 & 0.146 \\ 0 & 0 & 0 & 1 & 0.108 & 0.182 \\ 0 & 0 & 0 & 1.046 & 0.998 & 0.354 \end{array} \right]\]

Eliminate column 4:

\[R_5 \to R_5 - 1.046R_4\]

giving:

\[\left[ \begin{array}{ccccc|c} 1 & 0 & 0 & 0.963 & 0.976 & 0.537 \\ 0 & 1 & 0 & 0.183 & 0.122 & 0.317 \\ 0 & 0 & 1 & -0.146 & -0.098 & 0.146 \\ 0 & 0 & 0 & 1 & 0.108 & 0.182 \\ 0 & 0 & 0 & 0 & 0.885 & 0.164 \end{array} \right]\]

Solve for \(\pi_5\)

\[0.885\pi_5 = 0.164\]

so:

\[\pi_5 \approx 0.185\]

Then:

\[\pi_4 + 0.108(0.185) = 0.182\]

giving:

\[\pi_4 \approx 0.162\]

Next:

\[\pi_3 -0.146(0.162)-0.098(0.185)=0.146\]

so:

\[\pi_3 \approx 0.188\]

Then:

\[\pi_2 +0.183(0.162)+0.122(0.185)=0.317\]

giving:

\[\pi_2 \approx 0.265\]

Finally:

\[\pi_1 +0.963(0.162)+0.976(0.185)=0.537\]

yielding:

\[\pi_1 \approx 0.201\]

Final Stationary Distribution

Thus:

\[\pi \approx \begin{bmatrix} 0.201 & 0.265 & 0.188 & 0.162 & 0.185 \end{bmatrix}\]

Check that the entries sum to \(1\) (up to the rounding accumulated during elimination):

\[0.201 + 0.265 + 0.188 + 0.162 + 0.185 \approx 1\]

and indeed:

\[\pi P = \pi\]
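Both checks are easy to confirm numerically; here is a short NumPy sketch using the rounded values above:

```python
import numpy as np

P = np.array([
    [0.2,  0.3,  0.1,  0.2,  0.2],
    [0.1,  0.4,  0.2,  0.2,  0.1],
    [0.3,  0.2,  0.2,  0.1,  0.2],
    [0.25, 0.15, 0.25, 0.2,  0.15],
    [0.2,  0.2,  0.2,  0.1,  0.3],
])

# Hand-computed stationary vector, rounded to three decimals.
pi = np.array([0.201, 0.265, 0.188, 0.162, 0.185])

# Both properties hold up to the rounding introduced during elimination.
print(abs(pi.sum() - 1.0) < 0.01)          # True (the entries total 1.001)
print(np.allclose(pi @ P, pi, atol=0.01))  # True
```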

Why the Stationary Distribution Matters

The stationary distribution tells us the long-run behavior of the Markov chain.

Even if the system starts somewhere else, after many transitions the probabilities converge toward \(\pi\).
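This convergence can be seen directly with power iteration: start from any distribution and repeatedly multiply by \(P\). A sketch:

```python
import numpy as np

P = np.array([
    [0.2,  0.3,  0.1,  0.2,  0.2],
    [0.1,  0.4,  0.2,  0.2,  0.1],
    [0.3,  0.2,  0.2,  0.1,  0.2],
    [0.25, 0.15, 0.25, 0.2,  0.15],
    [0.2,  0.2,  0.2,  0.1,  0.3],
])

# Start from a point mass on state 1 and take 50 steps.
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
for _ in range(50):
    x = x @ P

print(x)  # has converged to the stationary distribution

# Compare against the rounded hand-computed result.
pi = np.array([0.201, 0.265, 0.188, 0.162, 0.185])
print(np.allclose(x, pi, atol=0.01))  # True
```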

Applications include:

  • Google PageRank
  • Queueing systems
  • Population dynamics
  • Weather models
  • Financial modeling
  • Reinforcement learning

The Big Idea

The equation

\[\pi = \pi P\]

captures equilibrium in a stochastic system.

The distribution reproduces itself after every transition.

Solving it ultimately becomes a linear algebra problem:

  • Form a homogeneous system
  • Add a normalization equation
  • Use Gauss–Jordan elimination
  • Interpret the resulting probability vector

This is one of the most beautiful intersections of probability theory and linear algebra.

Appendix: Solving the Stationary Distribution Using SymPy

This appendix shows how to compute the stationary distribution using Python’s SymPy symbolic mathematics library instead of performing Gauss–Jordan elimination by hand.


Why Use SymPy?

Manual elimination is excellent for understanding the mathematics, but for larger Markov chains it becomes tedious and error-prone.

SymPy allows us to:

  • Construct matrices symbolically
  • Solve linear systems exactly
  • Compute stationary distributions efficiently
  • Verify results automatically

Step 1: Import SymPy

import sympy as sp

Step 2: Define the Transition Matrix

We define the same transition matrix \(P\):

P = sp.Matrix([
    [0.2, 0.3, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.2, 0.2, 0.1],
    [0.3, 0.2, 0.2, 0.1, 0.2],
    [0.25, 0.15, 0.25, 0.2, 0.15],
    [0.2, 0.2, 0.2, 0.1, 0.3]
])

Step 3: Create the Unknown Variables

We define the stationary probabilities:

pi1, pi2, pi3, pi4, pi5 = sp.symbols(
    'pi1 pi2 pi3 pi4 pi5'
)

Form the vector:

pi = sp.Matrix([[pi1, pi2, pi3, pi4, pi5]])

Step 4: Write the Stationary Equation

We solve:

\[\pi = \pi P\]

which becomes:

equations = list(pi * P - pi)

This produces the homogeneous system:

\[\pi P - \pi = 0\]

Step 5: Add the Normalization Equation

A probability distribution must satisfy:

\[\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 = 1\]

Add this equation:

equations.append(
    pi1 + pi2 + pi3 + pi4 + pi5 - 1
)

Step 6: Solve the System

Now solve all equations simultaneously:

solution = sp.solve(
    equations,
    [pi1, pi2, pi3, pi4, pi5]
)

print(solution)

Complete Program

import sympy as sp

# Transition matrix
P = sp.Matrix([
    [0.2, 0.3, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.2, 0.2, 0.1],
    [0.3, 0.2, 0.2, 0.1, 0.2],
    [0.25, 0.15, 0.25, 0.2, 0.15],
    [0.2, 0.2, 0.2, 0.1, 0.3]
])

# Unknown stationary probabilities
pi1, pi2, pi3, pi4, pi5 = sp.symbols(
    'pi1 pi2 pi3 pi4 pi5'
)

# Row vector
pi = sp.Matrix([[pi1, pi2, pi3, pi4, pi5]])

# Stationary equations
equations = list(pi * P - pi)

# Normalization condition
equations.append(
    pi1 + pi2 + pi3 + pi4 + pi5 - 1
)

# Solve system
solution = sp.solve(
    equations,
    [pi1, pi2, pi3, pi4, pi5]
)

print(solution)

Expected Output

SymPy returns values close to:

{
 pi1: 0.201,
 pi2: 0.265,
 pi3: 0.188,
 pi4: 0.162,
 pi5: 0.185
}

Thus:

\[\pi \approx \begin{bmatrix} 0.201 & 0.265 & 0.188 & 0.162 & 0.185 \end{bmatrix}\]

which matches the result obtained through Gauss–Jordan elimination.


Verifying the Result

We can also verify that:

\[\pi P = \pi\]

using:

pi_numeric = sp.Matrix([[
    solution[pi1],
    solution[pi2],
    solution[pi3],
    solution[pi4],
    solution[pi5]
]])

print(pi_numeric * P)

The output matches the original vector \(\pi\) up to floating-point rounding, confirming the stationary distribution.


Alternative Method: Eigenvectors

Another elegant approach uses eigenvectors.

The stationary distribution satisfies:

\[P^T\pi^T = \pi^T\]

meaning \(\pi^T\) is an eigenvector of \(P^T\) with eigenvalue \(1\).

In SymPy:

eigen_data = P.T.eigenvects()

print(eigen_data)

You would then extract the eigenvector associated with eigenvalue \(1\) and normalize it so its entries sum to \(1\). (With floating-point entries the eigenvalue may only be approximately \(1\); converting the matrix to exact rationals first, e.g. with sp.Rational or sp.nsimplify, lets SymPy find the eigenvalue \(1\) exactly.)
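A numerical variant of the same idea uses NumPy's eigendecomposition instead of SymPy. This sketch assumes the eigenvalue closest to \(1\) is the (unique) one we want, which holds for this chain:

```python
import numpy as np

P = np.array([
    [0.2,  0.3,  0.1,  0.2,  0.2],
    [0.1,  0.4,  0.2,  0.2,  0.1],
    [0.3,  0.2,  0.2,  0.1,  0.2],
    [0.25, 0.15, 0.25, 0.2,  0.15],
    [0.2,  0.2,  0.2,  0.1,  0.3],
])

# A left eigenvector of P for eigenvalue 1 is a right eigenvector of P^T.
eigvals, eigvecs = np.linalg.eig(P.T)

# Pick the eigenvalue closest to 1 and normalize its eigenvector to sum to 1
# (division by the sum also fixes an arbitrary sign flip).
idx = np.argmin(np.abs(eigvals - 1.0))
v = np.real(eigvecs[:, idx])
pi = v / v.sum()

print(pi)  # close to [0.201, 0.265, 0.188, 0.162, 0.185]
```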


Final Remarks

Using SymPy transforms the stationary distribution problem into a compact symbolic computation workflow:

  1. Define the transition matrix
  2. Construct the equations
  3. Add normalization
  4. Solve symbolically
  5. Verify the result

For large Markov chains, symbolic and numerical tools become essential, and SymPy provides a clean bridge between probability theory and computational linear algebra.