Introduction

In this walkthrough, we move time series data from R to Python inside a single notebook. R gives us access to specialized dataset packages, while Python handles the reshaping and plotting with pandas.

Setting Up the R Environment

We load the rpy2 IPython extension in the Google Colab notebook. It provides the %%R cell magic, which runs the contents of a cell as R code without leaving the Python notebook.

%load_ext rpy2.ipython

We install and load the tsdl package from GitHub, which provides a large collection of real-world time series datasets.

%%R
devtools::install_github("FinYang/tsdl")
library(tsdl)
print(tsdl)

Selecting and Preparing Data in R

We subset the collection by frequency and subject, keeping only the monthly (frequency 12) agricultural series, and then select the second series from that subset.

%%R
subject <- "Agriculture"
frequency <- 12
example <- 2
ts <- subset(tsdl, frequency, subject)
ts <- ts[[example]]

We extract the year and month of each observation from the time index, so that the series can later be organized by year and month.

%%R
time_points <- time(ts)
years <- floor(time_points)
months <- round(12 * (time_points - years) + 1)
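
R's time() reports the index of a monthly series as fractional years: 1959.0 for January 1959, roughly 1959.083 for February, and so on. The same arithmetic can be sketched in Python as a quick check. The function name split_time_index is ours for illustration and is not part of tsdl or rpy2.

```python
import math


def split_time_index(t, frequency=12):
    """Split a fractional-year time value into (year, period).

    Mirrors the R expressions floor(t) and round(frequency * (t - year) + 1).
    """
    year = math.floor(t)
    # Rounding absorbs floating-point error in the fractional part:
    # 12 * ((1959 + 1/12) - 1959) evaluates to slightly less than 1.
    period = round(frequency * (t - year) + 1)
    return year, period


# One year of monthly time points, written as fractional years.
monthly = [split_time_index(1959 + k / 12) for k in range(12)]
print(monthly[0], monthly[-1])
```

Rounding, rather than truncating, is what keeps February from being reported as month 1 when the fractional part falls just short of 1/12.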

We inspect the attributes and structure of the selected time series, then save it to a CSV file for clean transfer into Python.

%%R
print(attributes(ts))
print(ts)
plot(ts)
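# Export the values together with the year and month computed earlier
export_df <- data.frame(Value = as.numeric(ts), Year = as.numeric(years), Period = as.numeric(months))
write.csv(export_df, "data.csv", row.names = FALSE)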

Accessing the Data from Python

We import the necessary Python libraries and ensure the R tsdl package is installed and loaded from Python’s side.

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

# Install tsdl only if it is missing, then import it
ro.r('if (!requireNamespace("tsdl", quietly = TRUE)) suppressWarnings(suppressMessages(devtools::install_github("FinYang/tsdl")))')
tsdl = importr('tsdl')

We pass a block of R code to ro.r from Python. It subsets tsdl, computes the year and period of each observation, and writes the result to data.csv in a single call.

# Set parameters and export
subject = "Agriculture"
frequency = 12
example = 2

ro.r(f"""
ts_data <- subset(tsdl, {frequency}, "{subject}")[[{example}]]
time_points <- time(ts_data)
years <- floor(time_points)
periods <- round({frequency} * (time_points - years) + 1)
df <- data.frame(Value = as.numeric(ts_data), Year = years, Period = periods)
write.csv(df, "data.csv", row.names = FALSE)
""")

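Building R code with an f-string works here, but it becomes awkward once the R code contains its own braces, and it passes the parameters through unchecked. As a hedged alternative, the sketch below uses string.Template from the standard library. The helper name build_export_script, the path parameter, and the validation rules are our assumptions. The template follows the R snippet above, with the period formula expressed through the frequency parameter.

```python
from string import Template

# Hypothetical template: $-placeholders avoid clashes between R braces
# and Python's f-string syntax.
R_EXPORT_TEMPLATE = Template("""
ts_data <- subset(tsdl, $frequency, "$subject")[[$example]]
time_points <- time(ts_data)
years <- floor(time_points)
periods <- round($frequency * (time_points - years) + 1)
df <- data.frame(Value = as.numeric(ts_data), Year = years, Period = periods)
write.csv(df, "$path", row.names = FALSE)
""")


def build_export_script(subject, frequency, example, path="data.csv"):
    """Return R code that exports one tsdl series to a CSV file."""
    if not (isinstance(frequency, int) and frequency > 0):
        raise ValueError("frequency must be a positive integer")
    if not (isinstance(example, int) and example >= 1):
        raise ValueError("example is a 1-based R index")
    for text in (subject, path):
        # Reject characters that would end the R string literal early.
        if '"' in text or "\\" in text:
            raise ValueError(f"unsupported character in {text!r}")
    return R_EXPORT_TEMPLATE.substitute(
        subject=subject, frequency=frequency, example=example, path=path
    )


script = build_export_script("Agriculture", 12, 2)
print(script)
```

The resulting string could then be handed to ro.r(script) in place of the inline f-string.
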
Loading and Inspecting Data in Python

We load the CSV file generated from R into a pandas DataFrame for inspection and further processing.

df = pd.read_csv('data.csv')
display(df.head())
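
Before relying on the file, it can help to confirm that it has the expected layout. The following is a minimal sketch using only the standard library. The function name check_export is ours, and the sample rows are invented placeholders, not values from tsdl.

```python
import csv
import io

EXPECTED_COLUMNS = ["Value", "Year", "Period"]


def check_export(handle, frequency=12):
    """Validate the exported CSV layout and return the number of data rows."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    count = 0
    for row in reader:
        period = int(float(row["Period"]))
        if not 1 <= period <= frequency:
            raise ValueError(f"period {period} outside 1..{frequency}")
        # R writes missing values as NA; anything else must be numeric.
        if row["Value"] != "NA":
            float(row["Value"])
        count += 1
    return count


# Placeholder rows standing in for data.csv; the numbers are invented.
sample = io.StringIO('"Value","Year","Period"\n10.5,1959,1\nNA,1959,2\n')
print(check_export(sample))
```

With the real file, the call would look like check_export(open("data.csv", newline="")), ideally inside a with block.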

Reshaping and Visualizing the Data

We pivot the DataFrame so that each row is a year and each column is a period (month), which makes it easier to compare the same month across years.

pivot_df = pd.pivot_table(df, values='Value', index='Year', columns='Period')
display(pivot_df)
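
To make explicit what the pivot computes, here is a dependency-free sketch of the same grouping. By default, pd.pivot_table averages values that land in the same cell, so the sketch does the same. For a regular monthly series each (Year, Period) cell holds a single observation, and the mean is just that observation. The function name pivot_by_year and the sample rows are illustrative only.

```python
from collections import defaultdict


def pivot_by_year(rows):
    """Group (value, year, period) rows into {year: {period: mean value}}."""
    buckets = defaultdict(list)
    for value, year, period in rows:
        buckets[(year, period)].append(value)

    table = defaultdict(dict)
    for (year, period), values in sorted(buckets.items()):
        table[year][period] = sum(values) / len(values)
    return dict(table)


# Invented rows; the last two share a cell and are averaged.
rows = [(10.0, 1959, 1), (12.0, 1959, 2), (11.0, 1960, 1), (13.0, 1960, 1)]
table = pivot_by_year(rows)
print(table)
```

Unlike the pandas result, cells with no observations are simply absent here rather than filled with NaN.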

We plot the time series to get a first visual sense of the data. Because the DataFrame keeps its default integer index, the x-axis shows the observation number.

df.plot(y='Value', ylabel='ts', xlabel='Time');
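
To put calendar dates on the axis, the Year and Period columns can be combined into a date for each observation. The sketch below shows one way to do this with the standard library. The function name period_start is ours, and the sketch assumes a frequency that divides 12, such as monthly or quarterly data.

```python
from datetime import date


def period_start(year, period, frequency=12):
    """Return the first day of the given period as a date."""
    if 12 % frequency != 0:
        raise ValueError("only frequencies that divide 12 are supported")
    months_per_period = 12 // frequency
    # Period 1 starts in January; later periods advance by whole months.
    month = (period - 1) * months_per_period + 1
    return date(int(year), month, 1)


labels = [period_start(y, p) for y, p in [(1959, 1), (1959, 12), (1960, 1)]]
print([d.isoformat() for d in labels])
```

In the notebook, a list built this way from the Year and Period columns could be assigned as the DataFrame index before plotting.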

Conclusion

By integrating R and Python, we get a single workflow that draws on both languages. R gives quick access to the time series collection and its time index, while Python handles data manipulation, analysis, and visualization. Running R code through rpy2 keeps everything in one notebook, so there is no need to switch between environments, and each step of the analysis can use whichever tool suits it best.