Data Envelopment Analysis (DEA) in Python
Introduction
In this blog post, we’ll explore Data Envelopment Analysis (DEA) using Python. DEA is a linear-programming-based method for assessing the relative efficiency of decision-making units that convert multiple inputs into multiple outputs. We’ll use the Gurobi optimization library for our analysis.
Getting Started
Let’s start by importing the necessary libraries and downloading the dataset.
import gdown
gdown.download('https://raw.githubusercontent.com/jordan-hay/jordan-hay.github.io/main/docs/assets/dea_data.csv','file.csv',quiet=True)
!pip install gurobipy &>/dev/null
import pandas as pd
import gurobipy as gp
from gurobipy import GRB
import numpy as np
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('file.csv')
df.head()
Data Exploration
Now, let’s explore the dataset and understand its structure.
dfcols = list(df.columns)
input_cols = dfcols[1:6]    # columns 1-5 hold the input attributes
output_cols = dfcols[6:10]  # columns 6-9 hold the output attributes
input_df = df[input_cols]
output_df = df[output_cols]
# Display the first few rows of input and output data
input_df.head()
output_df.head()
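A quick sanity check on the shapes confirms how many units, inputs, and outputs we are working with (a small check added here for convenience; it is not part of the original analysis):
# Number of units and attributes
print(f"Units: {len(df)}, input attributes: {input_df.shape[1]}, output attributes: {output_df.shape[1]}")
# Summary statistics for the input attributes
input_df.describe()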
DEA Linear Program
In the DEA linear program, a few key components form the foundation of the model: decision variables, an objective function, and a set of constraints.
- Decision Variables: The decision variables \(u_i\) and \(v_j\) determine the weights assigned to each output and input, respectively. Here, \(u_i\) is the weight for output \(i\), where \(i\) ranges from 1 to the total number of outputs, and \(v_j\) is the weight for input \(j\), ranging from 1 to the total number of inputs. These are the quantities the model optimizes to find the most favorable weighting for the unit under evaluation.
- Objective Function: For the unit being evaluated, denoted \(e\), the model maximizes the weighted sum of its outputs \(O_{ei}\). The weights \(u_i\) assigned to each output determine how much that output contributes to the unit’s efficiency score (the full formulation is written out below).
- Constraints: The model operates within a set of constraints that keep the weights meaningful. The first constraint ensures that, for every unit \(k\), the weighted sum of outputs does not exceed the weighted sum of inputs. The second constraint normalizes the weighted sum of inputs for the evaluated unit \(e\) to equal 1 (see the formulation below). These constraints form the operational boundaries of the model and keep the resulting efficiency scores comparable across units.
In summary, these components collectively shape the linear programming model, providing a structured framework for assessing and maximizing the efficiency of decision-making units in diverse applications.
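Putting the pieces together, the linear program solved for each evaluated unit \(e\) can be written as follows; this is the multiplier form of the classic CCR model, and it matches the Gurobi code in the next section:
\[
\begin{aligned}
\max_{u, v} \quad & \sum_{i=1}^{n_{\text{outputs}}} u_i \, O_{e i} \\
\text{subject to} \quad & \sum_{i=1}^{n_{\text{outputs}}} u_i \, O_{k i} \;\le\; \sum_{j=1}^{n_{\text{inputs}}} v_j \, I_{k j} \quad \text{for every unit } k, \\
& \sum_{j=1}^{n_{\text{inputs}}} v_j \, I_{e j} = 1, \\
& u_i \ge 0, \quad v_j \ge 0,
\end{aligned}
\]
where \(O_{k i}\) and \(I_{k j}\) denote output \(i\) and input \(j\) of unit \(k\).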
DEA Analysis
We’ll perform DEA analysis on the dataset using Gurobi.
# Convert dataframes to numpy arrays
inputs = input_df.values
outputs = output_df.values
# Number of decision-making units and variables
n_units, n_inputs = inputs.shape
n_outputs = outputs.shape[1]
storage = []

for unit in range(n_units):
    # Create a model for the unit being evaluated
    model = gp.Model("DEA")
    model.setParam('OutputFlag', 0)

    # Add decision variables: one weight per output (u) and per input (v)
    u = model.addVars(n_outputs, vtype=GRB.CONTINUOUS, name="u", lb=0.0)
    v = model.addVars(n_inputs, vtype=GRB.CONTINUOUS, name="v", lb=0.0)

    # Objective: maximize the weighted sum of the evaluated unit's outputs
    model.setObjective(gp.quicksum(x*y for x, y in zip(outputs[unit], u.values())), GRB.MAXIMIZE)

    # Constraints: weighted outputs cannot exceed weighted inputs for any unit
    for inp, outp in zip(inputs, outputs):
        model.addConstr(gp.quicksum(x*y for x, y in zip(outp, u.values()))
                        <= gp.quicksum(x*y for x, y in zip(inp, v.values())))
    # Normalize the evaluated unit's weighted inputs to 1
    model.addConstr(gp.quicksum(x*y for x, y in zip(inputs[unit], v.values())) == 1)

    # Solve the model
    model.optimize()
    if model.status == GRB.OPTIMAL:
        print('OPTIMAL')
    else:
        print('*****')

    # Shadow prices (duals) of the per-unit constraints define the efficient composite unit
    sp = [constr.Pi for constr in model.getConstrs()][0:-1]

    # Composite (target) input and output levels implied by the shadow prices
    efficientin = [np.dot(sp, inputs[:, i]) for i in range(n_inputs)]
    efficientout = [np.dot(sp, outputs[:, i]) for i in range(n_outputs)]

    # Percentage improvement opportunity for each input and output attribute
    for i in range(n_inputs):
        imp = np.round(np.abs((efficientin[i] - inputs[unit, i]) / inputs[unit, i]) * 100, 1)
        storage.append((unit, input_cols[i], 'input', imp))
    for i in range(n_outputs):
        imp = np.round(np.abs((efficientout[i] - outputs[unit, i]) / outputs[unit, i]) * 100, 1)
        storage.append((unit, output_cols[i], 'output', imp))
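The optimal objective value of each model is the unit’s efficiency score: because the weighted inputs of unit \(e\) are normalized to 1 and unit \(e\) also appears in the output-versus-input constraints, the score is at most 1, and a score of 1 means the unit is efficient. The loop above does not keep these scores, but a small variation like the sketch below (not part of the original analysis) could collect them by re-solving the same LPs and recording model.ObjVal:
# Optional sketch: collect each unit's efficiency score (re-solves the same LPs)
scores = []
for unit in range(n_units):
    m = gp.Model("DEA-score")
    m.setParam('OutputFlag', 0)
    u = m.addVars(n_outputs, lb=0.0)
    v = m.addVars(n_inputs, lb=0.0)
    m.setObjective(gp.quicksum(outputs[unit][i] * u[i] for i in range(n_outputs)), GRB.MAXIMIZE)
    for k in range(n_units):
        m.addConstr(gp.quicksum(outputs[k][i] * u[i] for i in range(n_outputs))
                    <= gp.quicksum(inputs[k][j] * v[j] for j in range(n_inputs)))
    m.addConstr(gp.quicksum(inputs[unit][j] * v[j] for j in range(n_inputs)) == 1)
    m.optimize()
    scores.append(round(m.ObjVal, 3))  # 1.0 means the unit lies on the efficient frontier

pd.Series(scores, name='Efficiency')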
Results
Let’s visualize the improvement opportunities for input and output attributes.
# Collect the stored results into a dataframe
df_final = pd.DataFrame(storage, columns=['Unit', 'Attribute', 'Type', 'Improvement Opportunity'])
# Display improvement opportunities for output attributes
pvdf_output = pd.pivot_table(data=df_final[df_final.Type=='output'],index=['Unit'],columns=['Attribute'],values=['Improvement Opportunity'],aggfunc='sum')
pvdf_output.columns = pvdf_output.columns.droplevel()
# Visualize improvement opportunities for output attributes
import seaborn as sns
sns.heatmap(pvdf_output.values, xticklabels=output_cols, cmap='viridis')
plt.title("Improvement Opportunities - Output Attributes")
plt.ylabel('Unit')
plt.show()
# Display improvement opportunities for input attributes
pvdf_input = pd.pivot_table(data=df_final[df_final.Type=='input'],index=['Unit'],columns=['Attribute'],values=['Improvement Opportunity'],aggfunc='sum')
pvdf_input.columns = pvdf_input.columns.droplevel()
# Visualize improvement opportunities for input attributes
sns.heatmap(pvdf_input.values, xticklabels=input_cols, cmap='viridis')
plt.title("Improvement Opportunities - Input Attributes")
plt.ylabel('Unit')
plt.show()
We use heatmaps to visualize the improvement opportunities for both input and output attributes across different units.
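As a quick numeric companion to the heatmaps, one could, for example, pull out the input attribute with the largest improvement opportunity for each unit (a small illustrative query, not part of the original post):
# For each unit, the input attribute with the largest improvement opportunity
print(pvdf_input.idxmax(axis=1))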
Conclusion
In this blog post, we’ve covered the implementation of DEA using Python and Gurobi. The code builds and solves a linear program for each decision-making unit and highlights where each unit has room to improve its inputs and outputs.