Installation and Quickstart Example

Dependencies

This package requires prior installation of

Python (>= 3.0)
NumPy (>= 1.17.5)
Scikit-Learn (>= 0.22.1)
Pandas (todo: check)
Matplotlib
Seaborn

If your system does not have Python 3, install it from here.

If your Python installation does not include the Pandas, Scikit-Learn, or NumPy packages, install them from here.
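To see which of the dependencies listed above are already present, a short check along the following lines may help. It is only a sketch: the helper name installed_versions is made up for this example, the distribution names are assumed to match those used on PyPI, and importlib.metadata requires Python 3.8 or later. Minimum versions are shown only where this page states one.

```python
from importlib import metadata

# Distribution names as assumed to be published on PyPI.
# A minimum version is given only where this page states one.
REQUIRED = {
    "numpy": "1.17.5",
    "scikit-learn": "0.22.1",
    "pandas": None,
    "matplotlib": None,
    "seaborn": None,
}


def installed_versions(packages):
    """Return {name: version string}, with None for packages that are not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    minimum = REQUIRED[name]
    status = version if version is not None else "NOT INSTALLED"
    note = f" (needs >= {minimum})" if minimum else ""
    print(f"{name}: {status}{note}")
```

The script only reports what it finds; comparing the reported versions against the minimums is left to the reader.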

Installation

The MALTS Python package is available for download from the almost-matching-exactly GitHub or via PyPI (recommended):

pip install pymalts2

Quickstart Example

This example shows the basic workflow of the package. We provide only the basic inputs: (1) the input data as a dataframe or file, (2) the name of the outcome column, and (3) the name of the treatment column. To set up the model for learning the distance metric, we use:

Variable name for the outcome variable: ‘outcome’.
Variable name for the treatment variable: ‘treated’.
The data is assigned to the Python variable df.

import pandas as pd
import pymalts2 as pymalts
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('example/example_data.csv', index_col=0)
print(df.shape)
df.head()

#> (2500, 20)
#>	    X1	        X2	        X3 	        X4	        X5	        X6	        X7	        X8	        X9	        X10  	           X11	            X12 	      X13	      X14	      X15	       X16	      X17	     X18	   outcome   	  treated
#>1355	1.881335	1.684164	0.532332	2.002254	1.435032	1.450196	1.974763	1.321659	0.709443	-1.141244	0.883130	0.956721	2.498229	2.251677	0.375271	-0.545129	3.334220	0.081259	-15.679894	0
#>1320	0.666476	1.263065	0.657558	0.498780	1.096135	1.002569	0.881916	0.740392	2.780857	-0.765889	1.230980	-1.214324	-0.040029	1.554477	4.235513	3.596213	0.959022	0.513409	-7.068587	0
#>1233	-0.193200	0.961823	1.652723	1.117316	0.590318	0.566765	0.775715	0.938379	-2.055124	1.942873	-0.606074	3.329552	-1.822938	3.240945	2.106121	0.857190	0.577264	-2.370578	-5.133200	0
#>706	1.378660	1.794625	0.701158	1.815518	1.129920	1.188477	0.845063	1.217270	5.847379	0.566517	-0.045607	0.736230	0.941677	0.835420	-0.560388	0.427255	2.239003	-0.632832	39.684984	1
#>438	0.434297	0.296656	0.545785	0.110366	0.151758	-0.257326	0.601965	0.499884	-0.973684	-0.552586	-0.778477	0.936956	0.831105	2.060040	3.153799	0.027665	0.376857	-1.221457	-2.954324	0
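Before fitting, it can be useful to confirm that the dataframe actually contains the three inputs described earlier. The sketch below is not part of the package: check_malts_inputs is a hypothetical helper, and the 0/1 coding of the treatment column is an assumption based on the example data shown here. It runs on a small toy frame so that it is self-contained.

```python
import pandas as pd


def check_malts_inputs(data, outcome, treatment):
    """Hypothetical helper: validate the frame and return the covariate column names."""
    missing = [c for c in (outcome, treatment) if c not in data.columns]
    if missing:
        raise ValueError(f"missing column(s): {missing}")
    # Assumption: treatment is coded 0/1, as in the example data.
    values = set(data[treatment].dropna().unique())
    if not values <= {0, 1}:
        raise ValueError(f"treatment column must be coded 0/1, found {sorted(values)}")
    if data[[outcome, treatment]].isna().any().any():
        raise ValueError("outcome and treatment columns must not contain missing values")
    # Every remaining column is treated as a covariate.
    return [c for c in data.columns if c not in (outcome, treatment)]


# Toy data with the same column roles as the example file (values are made up).
toy = pd.DataFrame({
    "X1": [0.2, 1.5, -0.3, 0.9],
    "X2": [1.1, 0.4, 0.8, -0.6],
    "outcome": [-1.0, 4.2, 0.3, 5.1],
    "treated": [0, 1, 0, 1],
})

print(check_malts_inputs(toy, outcome="outcome", treatment="treated"))
```

With the real data, the same call would be made on df with the column names 'outcome' and 'treated'.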

m = pymalts.malts_mf(outcome='outcome', treatment='treated', data=df)  # run MALTS with the default settings

Matched Groups

The matched group matrix (MG_matrix) is an NxN matrix in which each row corresponds to a query unit and each column corresponds to a matched unit. Cell (i,j) holds the weight of unit j in the matched group of unit i. The weight is the number of times unit j is included in the matched group of unit i across the M folds.

Each row of the CATE_df dataframe in the model m gives the CATE estimate for the corresponding unit.
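To make the weighting concrete, the toy sketch below builds such a matrix by counting how often each unit appears in each query unit's matched group over two folds. The units and matched groups are invented for illustration and are not MALTS output. The sketch says nothing about how the package computes the matrix internally, and it ignores whether a unit appears in its own matched group.

```python
import numpy as np
import pandas as pd

units = ["a", "b", "c", "d"]

# Hypothetical matched groups for M = 2 folds: {query unit: matched units}.
folds = [
    {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]},
    {"a": ["b", "d"], "b": ["a", "c"], "c": ["a", "d"], "d": ["b", "c"]},
]

# Start from an all-zero N x N matrix: rows are query units, columns are matched units.
mg = pd.DataFrame(
    np.zeros((len(units), len(units)), dtype=int), index=units, columns=units
)

# Add 1 to cell (i, j) each time unit j is in the matched group of unit i.
for fold in folds:
    for query, matches in fold.items():
        for matched in matches:
            mg.loc[query, matched] += 1

print(mg)

# Matched group of unit "a": the columns with non-zero weight in row "a".
row_a = mg.loc["a"]
print(row_a[row_a > 0])
```

Here unit b has weight 2 in the matched group of unit a because it was matched to a in both folds, while c and d each have weight 1.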

print(m.CATE_df)
#>	avg.CATE	std.CATE	outcome	   treated
#>0	47.232061	21.808950	-15.313091	0.0
#>1	40.600643	21.958906	-16.963202	0.0
#>2	40.877320	22.204570	9.527929	1.0
#>3	37.768578	19.740320	-3.940218	0.0
#>4	39.920257	21.744433	-8.011915	0.0
#>...   	...    	...	    ...    	...
#>2495	49.227788	21.581176	-14.529871	0.0
#>2496	42.352355	21.385861	19.570055	1.0
#>2497	43.737763	19.859275	-16.342666	0.0
#>2498	41.189297	20.346711	-9.165242	0.0
#>2499	45.427037	23.762884	-17.604829	0.0

ATE = m.CATE_df['avg.CATE'].mean()
print(ATE)
#>42.29673993471417
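The same averaging can be tried without fitting a model. The sketch below rebuilds a small stand-in for m.CATE_df from the first five rows printed above, then averages avg.CATE over all rows and separately by treatment group. The group-wise split is an addition for illustration, not something the package is shown to report. With only five rows, the numbers will not match the full-sample ATE.

```python
import pandas as pd

# Stand-in for m.CATE_df, copied from the first five rows of the printed output.
cate_df = pd.DataFrame({
    "avg.CATE": [47.232061, 40.600643, 40.877320, 37.768578, 39.920257],
    "std.CATE": [21.808950, 21.958906, 22.204570, 19.740320, 21.744433],
    "outcome": [-15.313091, -16.963202, 9.527929, -3.940218, -8.011915],
    "treated": [0.0, 0.0, 1.0, 0.0, 0.0],
})

# Average of the unit-level CATE estimates over all rows, as in the example.
ate = cate_df["avg.CATE"].mean()

# Same average, computed separately for control (0.0) and treated (1.0) units.
by_group = cate_df.groupby("treated")["avg.CATE"].mean()

print(ate)
print(by_group)
```

Averaging over the treated rows only is a common way to summarize the effect on the treated units, although this page itself only reports the overall average.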