Comparing Monthly Temperatures with a Difference-in-Means Test
Introduction
In this project, we’ll analyze historical weather data from Los Angeles and test whether temperature differences between months are statistically significant using a randomization test (also called a permutation test).
We’ll use:
- Python
- Open-Meteo weather API
- Pandas + NumPy
- Matplotlib
- Statistical simulation
The Question
Do summer months in Los Angeles actually have significantly higher daily maximum temperatures than nearby months?
For example:
- Is July hotter than August?
- Is February cooler than March?
- Could observed differences happen by random chance?
Instead of relying on assumptions from classical statistics, we’ll use simulation.
The core statistic in this analysis is the difference in means between two months’ daily maximum temperatures. If July is consistently warmer than August, the average of the July observations should exceed the average of the August observations. By repeatedly shuffling the month labels and recomputing the difference in means, we can estimate how likely a difference at least this large is to occur purely by chance.
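To make the idea concrete before touching real data, here is a minimal sketch on invented numbers. The arrays `group_a` and `group_b` and the seed are illustrative assumptions, not weather observations.

```python
import numpy as np

# Hypothetical toy values, invented purely for illustration.
group_a = np.array([30.0, 31.0, 29.0, 32.0])
group_b = np.array([28.0, 29.0, 27.0, 30.0])

# Observed statistic: difference between the two group means.
toy_observed = group_a.mean() - group_b.mean()

# One shuffle: pool the values, permute them, and split them back
# into two groups of the original sizes.
rng = np.random.default_rng(0)  # seed chosen arbitrarily
pooled = rng.permutation(np.concatenate([group_a, group_b]))
toy_shuffled = pooled[:len(group_a)].mean() - pooled[len(group_a):].mean()

print(toy_observed)   # difference for the real labels
print(toy_shuffled)   # difference for one random relabelling
```

A single shuffle tells us little. The test comes from repeating it thousands of times and asking how often the shuffled difference is as large as the observed one.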
Step 1 – Download Historical Weather Data
We’ll pull daily maximum temperatures from the Open-Meteo archive API.
import requests
import pandas as pd
latitude = 34.0522
longitude = -118.2437
url = "https://archive-api.open-meteo.com/v1/archive"
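params = {
    "latitude": latitude,
    "longitude": longitude,
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "daily": "temperature_2m_max",
    "timezone": "America/Los_Angeles"
}
# Fail fast on a stalled connection or an HTTP error instead of parsing a bad response.
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
data = response.json()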
df = pd.DataFrame(data["daily"])
print(df.head())
This gives us:
| time | temperature_2m_max |
|---|---|
| 2024-01-01 | 18.1 |
| 2024-01-02 | 19.4 |
| … | … |
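The download step assumes the response contains a `daily` block with `time` and `temperature_2m_max` lists, as the code above implies. A small defensive helper can make that assumption explicit. The name `daily_frame` and the sample payload below are hypothetical, made up to mirror the table above.

```python
import pandas as pd

def daily_frame(payload):
    """Build a DataFrame from a payload shaped like the response used above.

    Hypothetical helper: it only assumes the "daily" block holds equal-length
    "time" and "temperature_2m_max" lists.
    """
    daily = payload.get("daily")
    if daily is None:
        raise ValueError("payload has no 'daily' block")
    frame = pd.DataFrame(daily)
    missing = {"time", "temperature_2m_max"} - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Drop days with no recorded maximum so later means are not NaN.
    return frame.dropna(subset=["temperature_2m_max"]).reset_index(drop=True)

# Made-up payload mirroring the first rows of the table above.
sample = {"daily": {"time": ["2024-01-01", "2024-01-02"],
                    "temperature_2m_max": [18.1, 19.4]}}
print(daily_frame(sample))
```

Dropping missing days here is a design choice for this sketch. It keeps the later `.mean()` calls from silently returning NaN.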
Step 2 – Prepare Monthly Groups
We convert the dates into month labels so temperatures can be grouped.
df["time"] = pd.to_datetime(df["time"])
df["month"] = df["time"].dt.month
Now we can isolate temperatures for specific months.
month1 = df[df["month"] == 7]["temperature_2m_max"].values
month2 = df[df["month"] == 8]["temperature_2m_max"].values
Here:
- 7 = July
- 8 = August
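Since the same filtering is needed for every pair of months we might compare, it can be wrapped in a small function. This is a sketch: `month_values` is a hypothetical helper name, and the `demo` frame uses invented values in place of the downloaded data.

```python
import pandas as pd

def month_values(frame, month):
    """Return the daily maxima for one calendar month as a NumPy array."""
    mask = frame["time"].dt.month == month
    return frame.loc[mask, "temperature_2m_max"].to_numpy()

# Small synthetic frame standing in for the downloaded data (values invented).
demo = pd.DataFrame({
    "time": pd.to_datetime(["2024-07-01", "2024-07-02", "2024-08-01"]),
    "temperature_2m_max": [29.0, 31.0, 30.0],
})
july = month_values(demo, 7)
august = month_values(demo, 8)
print(len(july), len(august))
```

With the real data, `month_values(df, 2)` and `month_values(df, 3)` would give the February and March groups in the same way.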
Step 3 – Visualize Temperature Distributions
A boxplot helps us understand spread, variability, and outliers.
import matplotlib.pyplot as plt
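# One series of daily maxima per calendar month, January through December.
monthly_temps = [
    df[df["month"] == month]["temperature_2m_max"]
    for month in range(1, 13)
]
plt.figure(figsize=(12, 6))
# tick_labels requires Matplotlib 3.9+; older versions use labels= instead.
plt.boxplot(
    monthly_temps,
    tick_labels=[
        "Jan", "Feb", "Mar", "Apr",
        "May", "Jun", "Jul", "Aug",
        "Sep", "Oct", "Nov", "Dec"
    ]
)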
plt.xlabel("Month")
plt.ylabel("Maximum Temperature (°C)")
plt.title("Daily Maximum Temperatures by Month")
plt.grid(True)
plt.show()
The visualization immediately reveals:
- Summer months shift upward
- Winter months cluster lower
- Some months have larger variability
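The same impressions can be checked numerically. The sketch below computes the median and interquartile range (the height of each box) per month. The function name `monthly_spread` and the `demo` values are assumptions for illustration only.

```python
import pandas as pd

def monthly_spread(frame):
    """Median and interquartile range of daily maxima for each month."""
    grouped = frame.groupby(frame["time"].dt.month)["temperature_2m_max"]
    q1 = grouped.quantile(0.25)
    q3 = grouped.quantile(0.75)
    summary = pd.DataFrame({"median": grouped.median(), "iqr": q3 - q1})
    summary.index.name = "month"
    return summary

# Invented values: a variable January and a perfectly flat July.
demo = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
                            "2024-07-01", "2024-07-02", "2024-07-03", "2024-07-04"]),
    "temperature_2m_max": [10.0, 12.0, 14.0, 16.0, 30.0, 30.0, 30.0, 30.0],
})
print(monthly_spread(demo))
```

A larger `iqr` corresponds to a taller box in the plot, and a higher `median` to a box that sits further up the axis.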
Step 4 – Compute the Observed Difference in Means
We calculate the difference between monthly means.
The observed statistic is:
\[\Delta = \bar{x}_1 - \bar{x}_2\]
where:
- \(\bar{x}_1\) = average temperature of Month 1
- \(\bar{x}_2\) = average temperature of Month 2
In Python:
observed_diff = month1.mean() - month2.mean()
print(observed_diff)
Step 5 – The Null Hypothesis
Our null hypothesis says:
The two months come from the same temperature distribution.
If that’s true, then shuffling the temperature labels should not matter.
Formally:
\[H_0 : \mu_1 = \mu_2\]
Step 6 – Randomization Test
We combine both groups together and repeatedly shuffle them.
import numpy as np
combined = np.concatenate([month1, month2])
num_simulations = 5000
simulated_diffs = []
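for i in range(num_simulations):
    # Shuffle the pooled temperatures in place, then split them back
    # into two groups with the original sizes.
    np.random.shuffle(combined)
    sim_mo1 = combined[:len(month1)]
    sim_mo2 = combined[len(month1):]
    sim_diff = sim_mo1.mean() - sim_mo2.mean()
    simulated_diffs.append(sim_diff)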
simulated_diffs = np.array(simulated_diffs)
This creates a simulated null distribution.
Why This Works
Under the null hypothesis,
\[H_0 : \mu_1 = \mu_2\]
all temperature observations are interchangeable.
By shuffling labels thousands of times, we estimate what differences would occur purely by chance.
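The loop above shuffles `combined` in place and draws from NumPy's global random state, so results change from run to run. A reproducible variant can be sketched as a function. The name `permutation_diffs`, the `seed` parameter, and the toy arrays are assumptions added here, not part of the original analysis.

```python
import numpy as np

def permutation_diffs(a, b, num_simulations=5000, seed=None):
    """Simulate differences in means under random relabelling of a and b."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    n_a = len(a)
    diffs = np.empty(num_simulations)
    for i in range(num_simulations):
        # permutation() returns a shuffled copy, so pooled is left untouched.
        shuffled = rng.permutation(pooled)
        diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    return diffs

# Toy groups with invented values, just to exercise the function.
toy_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
toy_b = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
toy_diffs = permutation_diffs(toy_a, toy_b, num_simulations=2000, seed=42)
print(toy_diffs.mean())
```

Passing a fixed `seed` makes the simulated distribution repeatable. With the real data the call would be `permutation_diffs(month1, month2)`.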
Step 7 – Compute the p-value
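The p-value measures how extreme the observed difference is relative to the simulated distribution. Here we use a one-sided test: we count the simulated differences that are at least as extreme as the observed one, in the direction of the observed difference.
Formally:
\[p = \begin{cases} P\left(\Delta_{sim} \geq \Delta_{obs}\right) & \text{if } \Delta_{obs} > 0 \\ P\left(\Delta_{sim} \leq \Delta_{obs}\right) & \text{otherwise} \end{cases}\]
Python implementation:
if observed_diff > 0:
    # Observed difference is positive: count simulations in the upper tail.
    p_value = np.mean(simulated_diffs >= observed_diff)
else:
    # Observed difference is zero or negative: count the lower tail.
    p_value = np.mean(simulated_diffs <= observed_diff)
print(p_value)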
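The code above looks only at the tail on the same side as the observed difference. A two-sided version, which compares absolute values as in \(P(|\Delta_{sim}| \geq |\Delta_{obs}|)\), can be sketched as follows. The function name `two_sided_p_value` and the optional add-one adjustment (which keeps the estimate from being exactly zero) are additions for illustration, and `demo_diffs` holds invented values.

```python
import numpy as np

def two_sided_p_value(simulated, observed, add_one=False):
    """Share of simulated differences at least as far from zero as observed."""
    simulated = np.asarray(simulated, dtype=float)
    extreme = np.count_nonzero(np.abs(simulated) >= abs(observed))
    if add_one:
        # Count the observed labelling itself as one of the permutations.
        return (extreme + 1) / (simulated.size + 1)
    return extreme / simulated.size

# Invented simulated differences: four of the eight have |value| >= 1.
demo_diffs = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 0.2])
print(two_sided_p_value(demo_diffs, -1.0))  # 0.5
```

For a roughly symmetric null distribution, the two-sided p-value is about twice the one-sided one, so the choice between them should be made before looking at the data.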
Step 8 – Visualize the Simulation
bins = np.linspace(
    simulated_diffs.min(),
    simulated_diffs.max(),
    30
)
plt.hist(simulated_diffs, bins=bins)
plt.axvline(
    observed_diff,
    linestyle="dashed"
)
plt.xlabel("Simulated Difference in Means")
plt.ylabel("Frequency")
plt.title("Randomization Test Distribution")
plt.show()
The histogram shows:
- Most shuffled differences cluster near zero
- Extreme values are rare
- The observed statistic may sit far in the tail
In our simulations, a result of -0.76 or less occurred 568 times in 5000 samples (p-value = 0.1136). Since this result is not unusual when assuming July and August have the same average maximum temperature, there is not convincing evidence against the null hypothesis. The difference in means is not statistically significant.
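Because the p-value is itself estimated from a finite number of shuffles, it carries some simulation noise. A rough check, assuming the usual binomial standard error and a normal approximation (both assumptions of this sketch, not part of the original analysis), uses the counts reported above:

```python
import math

extreme_count = 568      # simulations at or below the observed difference (from the text)
num_simulations = 5000   # total shuffles (from the text)

p_hat = extreme_count / num_simulations
# Binomial standard error of a proportion estimated from num_simulations draws.
standard_error = math.sqrt(p_hat * (1 - p_hat) / num_simulations)
# Approximate 95% interval under a normal approximation.
low = p_hat - 1.96 * standard_error
high = p_hat + 1.96 * standard_error

print(round(p_hat, 4), round(standard_error, 4))
print(round(low, 4), round(high, 4))
```

The standard error is well under 0.01 and the whole interval lies above 0.05, so rerunning the simulation with a different shuffle order would be unlikely to change the conclusion.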
Interpreting Results
Suppose we get:
p_value = 0.0032
This means:
Only 0.32% of shuffled simulations produced a difference at least as extreme as the one in the real data.
Since:
\[p < 0.05\]
we reject the null hypothesis and conclude that the two months’ temperatures differ significantly.