Python - Exploratory Data Analysis CheatSheet

Reading a CSV file

Use header=None when the columns are not labeled in your csv file

df = pd.read_csv("pathToFile.csv", header=None)

Reading an Excel(.xlsx) file

Use header=None when the columns are not labeled in your xlsx file

df = pd.read_excel("pathToFile.xlsx", header=None)

Show first 5 rows of a DataFrame

df.head()

Show last 5 rows of a DataFrame

df.tail()

Show shape of the dataframe

df.shape

Show all column names in the DataFrame

df.columns

Count occurances of all unique values in a column

df['column_name'].value_counts()

Show mean, std dev, max etc for each column

df.describe()

Show sum of all null/NaN rows in each column

df.isnull().sum()

Show datatypes for all columns

df.info()

Heat Map of where and which columns has null/NaN values

NOTE: import seaborn as sns

sns.heatmap(df.isnull())

Drop multiple columns at once

axis=1 is for columns

df.drop(['column_1','column_2'],axis=1,inplace=True)

Fill NaN values with mean value of a column

df['column_name']=df['column_name'].fillna(df['column_name'].mean())

Get numerical values for categorical data

df['column_name'] = pd.factorize(df['column_name'])[0]

Get all unique calues in categorical data

unique = pd.factorize(df['column_name'])[1]

Show all indexs in the dataframe

df.index

Convert dataframe to numpy array (column names are ignored and only float/integers allowed)

df.to_numpy()

Sort values by a column

df.sort_values(by='colName')

Copy a whole dataframe

df.copy()

Drop the rows which have Nan values

df.dropna()

Replace Nan values with a specified value

df.fillna(value=10)

Return a dataframe of boolean values to check Nan values

pd.isna(df)

Calculate the mean of each column

df.mean()

Calculate the mean of each row

df.mean(1)

Concatenate dataframes

pd.concat([df[:2],df[3:6]])

Merge two dataframes with a custom index

pd.merge(df1,df2,on='indexColName')

Groupby column and sum

df.groupby('colName').sum()

Subtract all columns by a specific column

df.subtract(df['col'],axis=0)

Save a dataframe to csv file

df.to_csv('filename.csv')

Save a dataframe to excel sheet

df.to_excel('filename.xlsx',sheet_name='Sheet1')

sabm0hmayahai/python-eda-cheatsheet

Python - Exploratory Data Analysis CheatSheet

Reading a CSV file

Reading an Excel(.xlsx) file

Show first 5 rows of a DataFrame

Show last 5 rows of a DataFrame

Show shape of the dataframe

Show all column names in the DataFrame

Count occurances of all unique values in a column

Show mean, std dev, max etc for each column

Show sum of all null/NaN rows in each column

Show datatypes for all columns

Heat Map of where and which columns has null/NaN values

Drop multiple columns at once

Fill NaN values with mean value of a column

Get numerical values for categorical data

Get all unique calues in categorical data

Show all indexs in the dataframe

Convert dataframe to numpy array (column names are ignored and only float/integers allowed)

Sort values by a column

Copy a whole dataframe

Drop the rows which have Nan values

Replace Nan values with a specified value

Return a dataframe of boolean values to check Nan values

Calculate the mean of each column

Calculate the mean of each row

Concatenate dataframes

Merge two dataframes with a custom index

Groupby column and sum

Subtract all columns by a specific column

Save a dataframe to csv file

Save a dataframe to excel sheet