ERPs in Python

This tutorial is meant to be a companion to my R equivalent. See the same script in R here.

The following script will take data files output by the COGNISION(tm) EEG system developed by Neuronetrix. The script find the grand average, and plot the resulting event-related potentials (ERPs). The Cognision system is a 7-electrode system used for both clinical and research purposes. In this example, we take the grand average of all the midline electrodes (FZ, CZ, PZ) from five individuals.

First, import all the important libraries you need. Specifically, pandas, numpy, glob2 (to get file names), os (to set working directory), and matplotlib.

%matplotlib inline
import pandas as pd
import numpy as np
import glob2
import os
import matplotlib.pyplot as plt

Next, set the working directory (just so it's easier to import files).

os.chdir('/media/mihcelle/ChelleUSB/Cognision')

We need to find all the files in the directory. We do this using the glob library in python. It will search through all files in a directory matching some string input (in this case, .xlsx).

files = glob2.glob('*.xlsx')
files

['2490_1572_average.xlsx',
 '2504_2098_average.xlsx',
 '2514_1748_average.xlsx',
 '2536_1455_average.xlsx',
 '3097_1981_average.xlsx']

R has a list object to store multiple items (like dataframes). Python, in turn, uses dicts to achieve something similar. In the line below, I'm telling python to read in the excel files in the folder. The keys are the file names and the values are the dataframes themselves. The dict object is labeled x.

Just to check, I print out the first ten lines of the first data frame in the x dict.

x = {file: pd.read_excel(file, sheetname = 2) for file in files}
x.values()[1].head(10)

	Time	FZ	CZ	PZ	F3	P3	F4	P4	Stimulus
0	-240	0.913900	0.251298	0.125067	0.603175	-0.285071	1.029497	1.770241	1
1	-232	1.150180	0.523645	0.429780	0.901415	0.111195	1.087295	1.880752	1
2	-224	1.969993	1.346232	1.329586	1.810007	1.216764	1.929303	2.901241	1
3	-216	2.290890	1.546908	1.630600	2.060621	1.399869	2.166045	3.213815	1
4	-208	1.505756	0.776108	0.904190	1.318026	0.368282	0.954127	2.228467	1
5	-200	0.902340	0.260546	0.458910	1.014700	0.036751	0.130615	1.590835	1
6	-192	1.204742	0.694728	1.002678	1.677764	0.818648	0.825584	2.166508	1
7	-184	1.381836	1.091919	1.400794	1.988488	1.361028	1.371201	2.532719	1
8	-176	0.717847	0.470008	0.866274	1.297681	0.790442	0.538441	1.737874	1
9	-168	0.095936	-0.156065	0.465846	0.876909	0.382616	-0.442745	1.268089	1

The names of each key is a bit much. Let's do a for loop to change the names of each key into something manageable.

Basically, for the length of x, assign the ith key name to the variable name, convert i to a string, print i to check it's working, then remove the old key name and replace it with the new key name.

for i in range(len(x)):
    name = x.keys()[i]
    i = str(i)
    print i
    x[i] = x.pop(name)

Ultimately, we want to populate a new data frame with all the values in our x dict. First we create an empty dictionary named EEG_data.

EEG_data = pd.DataFrame()

In the following for loop, for the ith item in the range of length(x), convert i into a string, then append x[i] to the empty EEG_data data frame. Loop until there are no more items to go through in x.

for i in range(len(x)):
    i = str(i)
    EEG_data = EEG_data.append(x[i])

From here, we need to gather the grand mean for all elements. Or in other words, find the mean for each individual cell across ALL subjects.

This is achieved by simply aggregating EEG_data by it's index, and finding the mean.

We end up with the grand means when calling means = row_index.mean()

row_index = EEG_data.groupby(EEG_data.index)
means = row_index.mean()
means.head(10)

	Time	FZ	CZ	PZ	F3	P3	F4	P4	Stimulus
0	-240	-0.039821	0.349694	0.668002	0.007620	0.441710	-0.047589	0.558231	1
1	-232	0.277747	0.723581	0.991673	0.498491	0.835016	0.218099	0.802094	1
2	-224	0.469915	0.984368	1.188373	0.690752	1.095063	0.325650	0.928789	1
3	-216	0.316125	0.895682	1.082024	0.389830	0.843432	0.118593	0.787853	1
4	-208	0.081047	0.730794	0.900028	0.180923	0.498306	-0.141546	0.632768	1
5	-200	0.061627	0.777218	0.964207	0.328610	0.639889	-0.214696	0.630456	1
6	-192	0.149111	0.913437	1.100982	0.558971	0.908166	-0.123975	0.630086	1
7	-184	0.145874	0.874134	1.043646	0.679654	0.844264	-0.043150	0.456044	1
8	-176	0.034624	0.699074	0.890780	0.581905	0.571825	-0.101226	0.276082	1
9	-168	-0.180387	0.557769	0.865904	0.340539	0.493220	-0.281557	0.320194	1

In the next three parts, we are simply subsetting the previous data frame into separate dataframes, corresponding to the Stimulus number (1, 2, or 3).

means_stim_1 = means[(means.Stimulus == 1)]
means_stim_1.head(5)

	Time	FZ	CZ	PZ	F3	P3	F4	P4	Stimulus
0	-240	-0.039821	0.349694	0.668002	0.007620	0.441710	-0.047589	0.558231	1
1	-232	0.277747	0.723581	0.991673	0.498491	0.835016	0.218099	0.802094	1
2	-224	0.469915	0.984368	1.188373	0.690752	1.095063	0.325650	0.928789	1
3	-216	0.316125	0.895682	1.082024	0.389830	0.843432	0.118593	0.787853	1
4	-208	0.081047	0.730794	0.900028	0.180923	0.498306	-0.141546	0.632768	1

means_stim_2 = means[(means.Stimulus == 2)]
means_stim_2.head(5)

	Time	FZ	CZ	PZ	F3	P3	F4	P4	Stimulus
150	-240	-0.135720	-0.526437	0.112582	-0.702145	0.118593	-0.221724	0.562485	2
151	-232	0.110270	-0.256403	0.246212	-0.421475	0.218931	0.206909	0.585605	2
152	-224	-0.489909	-0.839936	-0.391883	-0.986513	-0.307266	-0.500081	-0.251317	2
153	-216	-0.753007	-1.042462	-0.653594	-1.142338	-0.544008	-1.054946	-0.497307	2
154	-208	-0.342407	-0.693822	-0.415002	-0.573601	-0.163926	-0.640647	-0.268425	2

means_stim_3 = means[(means.Stimulus == 3)]
means_stim_3.head(5)

	Time	FZ	CZ	PZ	F3	P3	F4	P4	Stimulus
300	-240	-0.905595	-0.623308	-0.356972	-0.712086	-0.694053	-1.458380	-0.705150	3
301	-232	-0.927790	-0.282066	0.007158	-0.909757	0.080677	-1.409135	-0.542159	3
302	-224	-1.280823	-0.689198	-0.298018	-1.425088	-0.194674	-1.459767	-0.910450	3
303	-216	-1.543690	-0.970792	-0.547014	-1.467396	-0.571289	-1.837769	-1.075523	3
304	-208	-1.391796	-0.524126	0.229104	-1.147655	0.172924	-1.834994	-0.420088	3

Now we can plot. We use both the pandas wrapper for matplotlib, as well as matplotlib itself.

The first line specifies the subplot (to overlay lines later on). The second line sets the axis labels. The third through fifth lines specify each data to overlay on the plots (specifically for Stims 1,2 and 3).

Note that this first plot is for the electrode 'FZ' (the electrode at the front of the scalp).

fig, ax = plt.subplots()
ax.set(xlabel="Time (in milliseconds)", ylabel="uV (microvolts)")
means_stim_1.plot(x = 'Time', y = 'FZ', kind = 'line', ylim = (-15,15), xlim = (-200,1000), ax=ax, label = 'Stim 1', figsize = (7,5), title = 'FZ')
means_stim_2.plot(x = 'Time', y = 'FZ',ax=ax, label = 'Stim 2')
means_stim_3.plot(x = 'Time', y = 'FZ',ax=ax, label = 'Stim 3')

<matplotlib.axes._subplots.AxesSubplot at 0x7f221f32f990>

We can do the same for CZ (the electrode at the center of the scalp).

fig, ax = plt.subplots()
ax.set(xlabel="Time (in milliseconds)", ylabel="uV (microvolts)")
means_stim_1.plot(x = 'Time', y = 'CZ', kind = 'line', ylim = (-15,15), xlim = (-200,1000), ax=ax, label = 'Stim 1', figsize = (7,5), title = 'CZ')
means_stim_2.plot(x = 'Time', y = 'CZ',ax=ax, label = 'Stim 2')
means_stim_3.plot(x = 'Time', y = 'CZ',ax=ax, label = 'Stim 3')

<matplotlib.axes._subplots.AxesSubplot at 0x7f221cd8d490>

And the same for PZ (the electrode at the back of the scalp).

fig, ax = plt.subplots()
ax.set(xlabel="Time (in milliseconds)", ylabel="uV (microvolts)")
means_stim_1.plot(x = 'Time', y = 'PZ', kind = 'line', ylim = (-15,15), xlim = (-200,1000), ax=ax, label = 'Stim 1', figsize = (7,5), title = 'PZ')
means_stim_2.plot(x = 'Time', y = 'PZ',ax=ax, label = 'Stim 2')
means_stim_3.plot(x = 'Time', y = 'PZ',ax=ax, label = 'Stim 3')

<matplotlib.axes._subplots.AxesSubplot at 0x7f221cc45810>

The results are remarkably similar to it's R equivalent!

mjtat/Plotting-Event-Related-Potentials-in-Python

ERPs in Python