Bayesian Analysis with SymPy
Introduction
This post develops a complete Bayesian analysis for an exponential likelihood with an unknown rate parameter constrained to the unit interval. The prior distribution is non-conjugate but analytically tractable, allowing closed-form derivations of the joint, marginal, and posterior densities. Symbolic computation is used to derive exact expressions, followed by numerical evaluation and visualization for single observations, multiple observations, and the general i.i.d. case.
Mathematical Background
Let \(\Theta\) be a random variable taking values in the interval \(0 < \theta < 1\). The prior density is defined as
\[f_\Theta(\theta) = 2(1 - \theta), \quad 0 < \theta < 1,\]which integrates to one over the unit interval and is therefore a valid probability density.
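As a quick sanity check, SymPy can confirm this normalization directly (a minimal self-contained sketch; the same imports are used throughout the post):
import sympy as sp
theta = sp.symbols('theta', positive=True)
prior = 2*(1 - theta)
# The prior should integrate to exactly 1 over (0, 1)
print(sp.integrate(prior, (theta, 0, 1)))  # expected output: 1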
Conditional on \(\Theta = \theta\), the observation \(X\) follows an exponential distribution with rate \(\theta\):
\[f_{X \mid \Theta}(x \mid \theta) = \theta e^{-\theta x}, \quad x > 0\]The joint density of \(X\) and \(\Theta\) is therefore
\[f_{X,\Theta}(x,\theta) = f_{X \mid \Theta}(x \mid \theta) f_\Theta(\theta)\]The marginal density of \(X\) is obtained by integrating out \(\theta\) over \([0,1]\):
\[f_X(x) = \int_0^1 f_{X,\Theta}(x,\theta)\, d\theta\]Finally, the posterior density is given by Bayes’ theorem:
\[f_{\Theta \mid X}(\theta \mid x) = \frac{f_{X,\Theta}(x,\theta)}{f_X(x)}\]This structure generalizes naturally to multiple i.i.d. observations \(X_1,\dots,X_n\).
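Substituting the specific densities into Bayes’ theorem, the posterior is proportional to\[f_{\Theta \mid X}(\theta \mid x) \propto \theta (1 - \theta)\, e^{-\theta x}, \quad 0 < \theta < 1,\]with the marginal \(f_X(x)\) supplying the normalizing constant; the factor 2 from the prior cancels in the ratio. The code below derives the fully normalized form symbolically.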
Symbolic Derivation and Single-Observation Analysis
import sympy as sp
from IPython.display import display, Math
import numpy as np
import matplotlib.pyplot as plt
# Define symbols
theta, x = sp.symbols('theta x', positive=True)
# Step 1: Prior PDF
prior = 2*(1 - theta) # valid for 0 < theta < 1
# Step 2: Conditional PDF
conditional = theta * sp.exp(-theta * x) # x > 0
# Step 3: Joint PDF
f_joint = conditional * prior
f_joint_simplified = sp.simplify(f_joint)
# Step 4: Marginal PDF f_X(x)
f_marginal_x = sp.integrate(f_joint_simplified, (theta, 0, 1))
f_marginal_x_simplified = sp.simplify(f_marginal_x)
# Step 5: Posterior PDF f_{Theta|X}(theta|x)
f_posterior = f_joint_simplified / f_marginal_x_simplified
f_posterior_simplified = sp.simplify(f_posterior)
# Display results
print("1. Prior PDF f_Θ(θ):")
display(Math(sp.latex(prior)))
print("2. Conditional PDF f_{X|Θ}(x|θ):")
display(Math(sp.latex(conditional)))
print("3. Joint PDF f_{X,Θ}(x,θ):")
display(Math(sp.latex(f_joint_simplified)))
print("4. Marginal PDF f_X(x):")
display(Math(sp.latex(f_marginal_x_simplified)))
print("5. Posterior PDF f_{Θ|X}(θ|x):")
display(Math(sp.latex(f_posterior_simplified)))
The symbolic integration produces an exact marginal density for \(X\), ensuring that the posterior is properly normalized. The posterior density is no longer of the same functional form as the prior, illustrating the non-conjugacy of the model.
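For reference, carrying out the integral by parts yields the closed form\[f_X(x) = \frac{2\left(x - 2 + (x + 2)\,e^{-x}\right)}{x^{3}}, \quad x > 0,\]which SymPy’s output should match up to algebraic rearrangement. A short check, reusing the expressions above, confirms that the posterior carries unit mass:
# Sanity check: the posterior should integrate to 1 over (0, 1)
posterior_mass = sp.integrate(f_posterior_simplified, (theta, 0, 1))
print(sp.simplify(posterior_mass))  # expected output: 1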
Numerical Evaluation and Posterior Visualization
# Convert symbolic → numerical functions
prior_fn = sp.lambdify(theta, prior, "numpy")
posterior_x1_fn = sp.lambdify(theta, f_posterior.subs(x, 1), "numpy")
posterior_x3_fn = sp.lambdify(theta, f_posterior.subs(x, 3), "numpy")
# Theta range
theta_vals = np.linspace(0, 1, 400)
# Plotting
plt.figure(figsize=(8,5))
plt.plot(theta_vals, posterior_x1_fn(theta_vals), label="Posterior $x=1$")
plt.plot(theta_vals, posterior_x3_fn(theta_vals), label="Posterior $x=3$")
plt.plot(theta_vals, prior_fn(theta_vals), label="Prior $2(1-\\theta)$")
plt.title("Prior and Posterior Distributions for $\\Theta$")
plt.xlabel("$\\theta$")
plt.ylabel("Density")
plt.legend()
plt.grid(True)
plt.show()
As expected, larger observed values of \(x\) shift posterior mass toward smaller values of \(\theta\), reflecting the inverse relationship between the exponential rate and typical observation magnitude.
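To quantify this shift, the posterior means can be approximated on the same grid, a quick numerical sketch reusing the lambdified functions above:
# Approximate posterior means E[Θ | X = x] by trapezoidal integration
mean_x1 = np.trapz(theta_vals * posterior_x1_fn(theta_vals), theta_vals)
mean_x3 = np.trapz(theta_vals * posterior_x3_fn(theta_vals), theta_vals)
print(f"Posterior mean given x=1: {mean_x1:.3f}")
print(f"Posterior mean given x=3: {mean_x3:.3f}")  # should be the smaller of the two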
Extension to Two Observations
Consider two independent observations \(X_1\) and \(X_2\), conditionally i.i.d. given \(\Theta = \theta\). By conditional independence, the joint conditional density is the product of the individual exponential likelihoods:
\[f_{X_1,X_2 \mid \Theta}(x_1,x_2 \mid \theta) = \theta^2 e^{-\theta(x_1 + x_2)}, \quad x_1,x_2 > 0.\]Multiplying by the prior density \(f_\Theta(\theta) = 2(1-\theta)\) yields the joint density of \(X_1\), \(X_2\), and \(\Theta\):
\[f_{X_1,X_2,\Theta}(x_1,x_2,\theta) = 2\,\theta^2 (1-\theta)\, e^{-\theta(x_1+x_2)}, \quad 0 < \theta < 1.\]The marginal density of the observations is obtained by integrating out \(\theta\) over the unit interval:
\[f_{X_1,X_2}(x_1,x_2) = \int_0^1 2\,\theta^2 (1-\theta)\, e^{-\theta(x_1+x_2)} \, d\theta.\]This integral admits a closed-form expression, ensuring that the posterior density is properly normalized. Applying Bayes’ theorem, the posterior distribution of \(\Theta\) given both observations is
\[f_{\Theta \mid X_1,X_2}(\theta \mid x_1,x_2) = \frac{2\,\theta^2 (1-\theta)\, e^{-\theta(x_1+x_2)}} {\displaystyle \int_0^1 2\,t^2 (1-t)\, e^{-t(x_1+x_2)} \, dt}, \quad 0 < \theta < 1.\]The posterior depends on the data only through the sufficient statistic \(x_1 + x_2\), reflecting the additive structure of the exponential likelihood. Relative to the single-observation case, the likelihood term \(\theta^2\) increases posterior concentration, while the exponential factor shifts mass toward smaller values of \(\theta\) as the total observed waiting time increases. This same structure extends directly to the general i.i.d. case.
# Define new symbols for two observations
x1, x2 = sp.symbols('x1 x2', positive=True)
# Step 1: Joint conditional PDF for X1, X2 given Theta (i.i.d.)
conditional_x1 = conditional.subs(x, x1)
conditional_x2 = conditional.subs(x, x2)
joint_conditional_2 = conditional_x1 * conditional_x2
# Step 2: Joint PDF with prior
f_joint_2 = joint_conditional_2 * prior
f_joint_2_simplified = sp.simplify(f_joint_2)
# Step 3: Marginal PDF f_{X1,X2}(x1,x2)
f_marginal_2 = sp.integrate(f_joint_2_simplified, (theta, 0, 1))
f_marginal_2_simplified = sp.simplify(f_marginal_2)
# Step 4: Posterior PDF f_{Theta|X1,X2}(theta|x1,x2)
f_posterior_2 = f_joint_2_simplified / f_marginal_2_simplified
f_posterior_2_simplified = sp.simplify(f_posterior_2)
# Display results
print("1. Joint conditional PDF f_{X1,X2|Θ}(x1,x2|θ):")
display(Math(sp.latex(joint_conditional_2)))
print("2. Joint PDF f_{X1,X2,Θ}(x1,x2,θ):")
display(Math(sp.latex(f_joint_2_simplified)))
print("3. Marginal PDF f_{X1,X2}(x1,x2):")
display(Math(sp.latex(f_marginal_2_simplified)))
print("4. Posterior PDF f_{Θ|X1,X2}(θ|x1,x2):")
display(Math(sp.latex(f_posterior_2_simplified)))
For independent observations, the likelihood contributions multiply, and the posterior depends on the sum \(x_1 + x_2\). This additive structure persists for arbitrary sample size.
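This sufficiency claim is easy to verify symbolically: two data sets with the same total should produce identical posteriors. A small check, reusing f_posterior_2_simplified from above:
# Two data pairs with the same sum x1 + x2 = 4
post_a = f_posterior_2_simplified.subs({x1: 1, x2: 3})
post_b = f_posterior_2_simplified.subs({x1: 2, x2: 2})
print(sp.simplify(post_a - post_b))  # expected output: 0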
Generalization to \(n\) Observations
# --- Generalization to n observations ---
n = 3 # change n to any number of observations
x_symbols = sp.symbols(f'x1:{n+1}', positive=True) # creates x1, x2, ..., xn
# Step 1: Joint conditional PDF for n i.i.d. observations
joint_conditional_n = sp.prod(conditional.subs(x, xi) for xi in x_symbols)
# Step 2: Joint PDF with prior
f_joint_n = joint_conditional_n * prior
f_joint_n_simplified = sp.simplify(f_joint_n)
# Step 3: Marginal PDF f_{X1,...,Xn}(x1,...,xn)
f_marginal_n = sp.integrate(f_joint_n_simplified, (theta, 0, 1))
f_marginal_n_simplified = sp.simplify(f_marginal_n)
# Step 4: Posterior PDF f_{Theta|X1,...,Xn}(theta|x1,...,xn)
f_posterior_n = f_joint_n_simplified / f_marginal_n_simplified
f_posterior_n_simplified = sp.simplify(f_posterior_n)
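The resulting expressions can be displayed exactly as in the earlier cells; for example, for the posterior:
# Display the n-observation posterior (here n = 3)
print(f"Posterior PDF f_{{Θ|X1,...,X{n}}}(θ|x1,...,x{n}):")
display(Math(sp.latex(f_posterior_n_simplified)))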
For \(n\) observations, the joint conditional density takes the form
\[f_{X_1,\dots,X_n \mid \Theta}(\mathbf{x} \mid \theta) = \theta^n \exp\!\left(-\theta \sum_{i=1}^n x_i\right)\]and the posterior density is proportional to this expression multiplied by the prior \(2(1-\theta)\).
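Writing \(s = \sum_{i=1}^n x_i\) and substituting \(u = s\theta\), the normalizing integral can be expressed through the lower incomplete gamma function \(\gamma(a, s) = \int_0^s u^{a-1} e^{-u}\, du\):\[\int_0^1 \theta^{n}(1-\theta)\, e^{-s\theta}\, d\theta = \frac{\gamma(n+1, s)}{s^{n+1}} - \frac{\gamma(n+2, s)}{s^{n+2}}.\]For any fixed integer \(n\) the incomplete gamma terms reduce to elementary functions, which is why SymPy returns a closed-form marginal here as well.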
Asymptotic Behavior and Numerical Normalization
# Reuse `theta`, `x`, `conditional`, and `prior` from the cells above
fixed_x_value = 3
n_values = [1, 2, 3, 5, 10]
theta_vals = np.linspace(0, 1, 400)
# Precompute conditional at x=3
single_val = conditional.subs(x, fixed_x_value)
plt.figure(figsize=(8,5))
for n in n_values:
    # Joint conditional density for n i.i.d. observations, each equal to 3
    joint_conditional_n = single_val**n
    # Unnormalized posterior: likelihood times prior
    posterior_num = joint_conditional_n * prior
    # Lambdify once per n
    posterior_num_fn = sp.lambdify(theta, posterior_num, "numpy")
    # Normalizing constant via numerical integration on the grid
    marginal_val = np.trapz(posterior_num_fn(theta_vals), theta_vals)
    # Normalized posterior
    posterior_fn = lambda t: posterior_num_fn(t) / marginal_val
    plt.plot(theta_vals, posterior_fn(theta_vals), label=f"n={n}")
# Plot prior
prior_fn = sp.lambdify(theta, prior, "numpy")
plt.plot(theta_vals, prior_fn(theta_vals), linestyle="--", label="Prior")
plt.xlabel(r'$\theta$')
plt.ylabel("Density")
plt.title("Posterior for Different n with All Observations = 3")
plt.legend()
plt.grid(True)
plt.show()
As \(n\) increases, the posterior becomes increasingly concentrated, demonstrating Bayesian consistency. Even with a non-conjugate prior, the likelihood dominates asymptotically, and posterior mass concentrates near \(\theta = 1/3\), the maximum-likelihood value when every observation equals 3.
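With every observation equal to 3, the likelihood \(\theta^n e^{-3n\theta}\) peaks at \(\theta = 1/3\), so the posterior mode should approach that value. A quick numerical check, reusing the grid and symbols from the previous cell:
# Posterior mode for a larger n; the likelihood peak sits at theta = 1/3
n_large = 50
post_num_fn = sp.lambdify(theta, (single_val**n_large) * prior, "numpy")
vals = post_num_fn(theta_vals)
print(f"Posterior mode for n={n_large}: {theta_vals[np.argmax(vals)]:.3f}")
# expected to land close to 1/3 ≈ 0.333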