Kaggle_study

Practice

Numpy and Pandas study

If an "ipynb" file fails to load on GitHub, please try it!

Pandas Contents

pandas_exercise1.ipynb

Trick 100: Loading sample of a big data file
Trick 99: How to avoid Unnamed: 0 columns
Trick 98: Convert a wide DF into a long one
Trick 97: Convert year and day of year into a single datetime column
Trick 96: Interactive plots out of the box in pandas
Trick 95: Count the missing values
Trick 94: Save memory by fixing your data types
Trick 93: Combine the small categories into a single category named "Others" (using frequencies)
Trick 92: Clean Object column with mixed data using regex
Trick 91: Creating a time series dataset for testing
Trick 90: Moving columns to a specific location
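
A few of the tricks listed above can be sketched as follows. This is a minimal illustration and not the notebook's own code: the toy DataFrame, its column names, and the 0.2 frequency threshold are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C", "D"],
    "sales_2019": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0],
    "sales_2020": [2.0, None, None, 5.0, 6.0, 7.0, 8.0],
})

# Trick 95: count the missing values in each column.
missing = df.isna().sum()

# Trick 98: convert the wide frame into a long one with melt().
long_df = df.melt(id_vars="city", var_name="year", value_name="sales")

# Trick 93: lump categories whose relative frequency is below an
# (assumed) threshold of 0.2 into a single "Others" category.
freq = df["city"].value_counts(normalize=True)
small = freq[freq < 0.2].index
city_lumped = df["city"].mask(df["city"].isin(small), "Others")

# Trick 90: move a column to a specific position (here, the front).
moved = df.copy()
moved.insert(0, "sales_2020", moved.pop("sales_2020"))
```

Working on a copy for the column move keeps the original column order intact for the other steps.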

pandas_exercise2.ipynb

Trick 89: Split names into first and last name
Trick 88: Rearrange columns in a DF
Trick 87: Aggregate your datetime data and filter weekends
Trick 86: Named aggregations - avoids multiindex
Trick 86: Named aggregations on multiple columns - avoids multiindex
Trick 85: Convert one type of values to others
Trick 84: Show fewer rows in a df
Trick 83: Correct the data types while importing the df
Trick 82: Select data by label and position (chained iloc and loc)
Trick 81: Use apply(type) to see if you have mixed data types
Trick 80: Select multiple slices of columns from a df
Trick 79: Count of rows that match a condition
Trick 78: Keep track of where your data is coming from when you are using multiple sources
Trick 77: Combine the small categories into a single category named "Others" (using where)
Trick 76: Filter in pandas only the largest categories
Trick 75: Count the number of words in a pandas series
Trick 74: Webscraping using read_html() and match parameter
Trick 73: Remove a column and store it as a separate series
Trick 72: Convert continuous variable to categorical
Trick 71: Read data from a PDF (tabula-py)
Trick 70: Print current version of pandas and its dependencies
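
As a rough sketch of some tricks from this list, the snippet below uses an invented DataFrame. The names, teams, ages, and bin edges are assumptions chosen for illustration, and the notebook itself may solve these differently.

```python
import pandas as pd

# Hypothetical toy data, not taken from the notebook.
df = pd.DataFrame({
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper", "Linus Torvalds"],
    "team": ["x", "x", "y", "y"],
    "age": [36, 41, 85, 54],
})

# Trick 89: split names into first and last name.
df[["first", "last"]] = df["name"].str.split(" ", expand=True)

# Trick 86: named aggregations give flat column names (no MultiIndex).
summary = df.groupby("team").agg(
    mean_age=("age", "mean"),
    max_age=("age", "max"),
)

# Trick 79: count the rows that match a condition.
n_over_40 = (df["age"] > 40).sum()

# Trick 75: count the number of words in each string.
word_counts = df["name"].str.split().str.len()

# Trick 72: convert a continuous variable to categorical.
# The bin edges and labels are arbitrary choices for this example.
age_group = pd.cut(
    df["age"],
    bins=[0, 40, 60, 120],
    labels=["young", "middle", "senior"],
)
```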

pandas_exercise3.ipynb

Trick 69: Check if 2 series are "similar"
Trick 68: Webscraping using read_html()
Trick 67: Create new columns or overwrite using assign
Trick 66: Create a bunch of new columns using a for loop and f-strings df[f'{col}_new']
Trick 65: Select columns using f-strings (new in Python 3.6+)
Trick 64: Fixing "SettingWithCopyWarning" when creating new columns
Trick 63: Calculate running count with groups using cumcount() + 1
Trick 62: Fixing "SettingWithCopyWarning" when changing columns using loc
Trick 61: Reading JSON from the web into a df
Trick 60: Creating running totals with cumsum function
Trick 59: Combine the output of an aggregation with the original df using transform
Trick 58: Use header and skiprows to get rid of bad data or empty rows while importing
Trick 57: Accessing the groups of a groupby object (get_group())
Trick 56: Apply a mapping to the whole df (applymap)
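
Several of the groupby and column-creation tricks above might look like this in practice. The store and sales data are hypothetical, and this is only one possible way to write each step.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "store": ["a", "a", "b", "b", "a"],
    "sales": [10, 20, 5, 15, 30],
})

# Trick 67: create a new column with assign().
df = df.assign(sales_k=lambda d: d["sales"] / 1000)

# Trick 66: create new columns in a loop using f-strings.
for col in ["sales"]:
    df[f"{col}_new"] = df[col] * 2

# Trick 63: running count within each group, starting at 1.
df["visit"] = df.groupby("store").cumcount() + 1

# Trick 60: running total over the whole column.
df["running_total"] = df["sales"].cumsum()

# Trick 59: broadcast a group aggregate back onto the original rows.
df["store_total"] = df.groupby("store")["sales"].transform("sum")

# Trick 62: change values through a single loc call (row mask and
# column label together) instead of chained indexing.
df.loc[df["sales"] > 25, "sales_new"] = 0

# Trick 57: pull a single group out of a groupby object.
store_a = df.groupby("store").get_group("a")
```

Using `transform` rather than `agg` keeps the result aligned with the original index, which is why it can be assigned straight back as a column.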

pandas_exercise4.ipynb

Trick 55: Filtering a df with multiple criteria using reduce
Trick 54: Calculate the difference between each row and the previous (diff())
Trick 53: Shuffle rows of a df (df.sample())
Trick 52: Making plots with pandas
Trick 51: Concatenate 2 column strings
Trick 50: Named aggregation with multiple columns passing tuples (new in pandas 0.25)
Trick 49: Sampling with pandas (with replacement and weights)
Trick 48: Useful parameters when using pd.read_csv()
Trick 47: Create one row for each item in a list (explode)
Trick 46: Store NaN in an integer type with Int64
Trick 45: Create rows for values separated by commas in a cell (assign and explode)
Trick 44: Use a local variable within a query in pandas (using @)
Trick 43: Create one row for each item in a list (explode) !!!duplicated Trick 47!!!
Trick 42: New aggregation function --> last()
Trick 41: Ordered categories (from pandas.api.types import CategoricalDtype)
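
The following sketch illustrates a handful of the tricks from this list on made-up data. The column names, the `min_price` variable, and the category order S < M < L are assumptions for the example, not values from the notebook.

```python
from functools import reduce
import operator

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "size": ["S", "L", "M", "S"],
    "price": [10, 30, 20, 12],
    "tags": ["a,b", "c", "a", "b,c"],
})

# Trick 55: combine a list of boolean criteria with reduce.
criteria = [df["price"] > 10, df["size"] != "L"]
filtered = df[reduce(operator.and_, criteria)]

# Trick 54: difference between each row and the previous one.
price_change = df["price"].diff()

# Trick 45: one row per comma-separated value (assign + explode).
exploded = df.assign(tags=df["tags"].str.split(",")).explode("tags")

# Trick 46: the nullable Int64 dtype can hold missing values.
nullable_ints = pd.Series([1, None, 3], dtype="Int64")

# Trick 44: reference a local variable inside query() with @.
min_price = 15
expensive = df.query("price > @min_price")

# Trick 41: ordered categories allow sorting and comparisons by rank.
size_type = CategoricalDtype(categories=["S", "M", "L"], ordered=True)
df["size"] = df["size"].astype(size_type)
sorted_sizes = df.sort_values("size")["size"].tolist()
bigger_than_s = df[df["size"] > "S"]
```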

pandas_exercise5.ipynb

Trick 40: Style your df fast with hide_index() and set_caption()
Trick 39: One hot encoding (get_dummies())
Trick 38: Pandas datetime (lots of examples)
Trick 37: Pandas slicing loc and iloc (6 examples)
Trick 36: Convert from UTC to another timezone
Trick 35: Query a column that has spaces in the name (using backticks)
Trick 34: Explore a dataset with profiling
Trick 33: Pandas display options
Trick 32: Filter a df with query and avoid intermediate variables
Trick 31: See all the columns of a big df
Trick 30: Pandas merge --> see where the columns are coming from (indicator = True)
Trick 29: Access numpy within pandas (without importing numpy as np)
Trick 28: Aggregating by multiple columns (using agg)
Trick 27: Aggregation over timeseries (resample)
Trick 26: Formatting different columns of a df (using dictionaries)
Trick 25: 3 ways of renaming columns names
Trick 24: Copy data from Excel into pandas quick (read_clipboard())
Trick 23: Fill missing values in time series data (interpolate())
Trick 22: Create DataFrames for testing
Trick 21: Split a string column into multiple columns
Trick 20: Create a datetime column from multiple columns
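
A short sketch of some tricks from this list, using invented data. The column names (including the deliberately spaced `unit price`) and the 2-day resampling window are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2020],
    "month": [1, 1, 1, 1],
    "day": [1, 2, 3, 4],
    "unit price": [1.0, None, 3.0, 10.0],
    "color": ["red", "blue", "red", "blue"],
})

# Trick 20: build one datetime column from year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Trick 35: query a column with spaces in its name using backticks.
cheap = df.query("`unit price` < 5")

# Trick 23: fill missing values by linear interpolation.
filled = df["unit price"].interpolate()

# Trick 27: aggregate a time series with resample (2-day sums here).
two_day = (
    df.assign(price=filled)
    .set_index("date")["price"]
    .resample("2D")
    .sum()
)

# Trick 39: one-hot encode a categorical column.
dummies = pd.get_dummies(df["color"])

# Trick 30: indicator=True shows which side each merged row came from.
left = pd.DataFrame({"key": [1, 2]})
right = pd.DataFrame({"key": [2, 3]})
merged = left.merge(right, on="key", how="outer", indicator=True)
```

The `_merge` column produced by `indicator=True` takes the values `left_only`, `right_only`, and `both`.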

pandas_exercise6.ipynb

Trick 19: Show memory usage of a df and every column
Trick 18: Read and write to a compressed file (csv.zip)
Trick 17: Select multiple rows/columns with loc
Trick 16: Convert continuous values to categorical (cut())
Trick 15: Reshape a MultiIndex df (unstack())
Trick 14: Creating toy df (3 methods)
Trick 13: Avoid the series of lists TRAP
Trick 12: Merging datasets and check uniqueness
Trick 11: Rename all columns with the same pattern
Trick 10: Check the equality of 2 series
Trick 9: Reduce memory usage of a df while importing
Trick 8: Using glob to generate a df from multiple files !!!duplicated Trick 78!!!
Trick 7: Dealing with missing values (NaN)
Trick 6: Split a df into 2 random subsets
Trick 5: Convert numbers stored as strings (coerce)
Trick 4: Select columns by dtype
Trick 3: Filter a df by multiple conditions (isin and inverse using ~)
Trick 2: Reverse order of a df
Trick 1: Add a prefix or suffix to all columns
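
To give a flavor of the tricks in this last list, here is a minimal sketch on a made-up DataFrame. The column names, the country codes, and the `raw_` prefix are hypothetical choices for the example.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "amount": ["10", "20", "n/a", "40"],
    "country": ["FR", "DE", "US", "FR"],
})

# Trick 5: convert numbers stored as strings; bad values become NaN.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Trick 4: select columns by dtype.
numeric = df.select_dtypes(include="number")

# Trick 3: filter with isin(), and invert the filter with ~.
in_list = df[df["country"].isin(["FR", "DE"])]
not_in_list = df[~df["country"].isin(["FR", "DE"])]

# Trick 2: reverse the row order and rebuild the index.
reversed_df = df.loc[::-1].reset_index(drop=True)

# Trick 1: add a prefix to all column names.
prefixed = df.add_prefix("raw_")

# Trick 6: split into two random, non-overlapping subsets.
part_1 = df.sample(frac=0.5, random_state=0)
part_2 = df.drop(part_1.index)

# Trick 10: check whether two series are equal.
same = df["id"].equals(reversed_df["id"])
```

Fixing `random_state` makes the random split reproducible between runs.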