Tip: The wide pivot table can be found here.
Tip: The code for concatenating two dataframes can be found here.
Day 1
Here's what I achieved so far.
- Using either your new Google account or your personal account, open Google Colab in another tab
- Google Colab's interface and functions:
- Tools >> Settings >>
- Editor >> Show line numbers (check if you prefer)
- Miscellaneous >> Corgi mode, Kitty mode (turn on if you like)
- Test with some basic code:
- Click Connect at top right
- Write simple definition
- Test simple math problem
- Runtime settings
- Run cells
- Reset
- Code and text cells
- Save
- More about Colab’s Markdown here
- Open a new notebook on Google Colab
- Data Visualization:
- Machine Learning and AI
- Scientific Computing
- Automation and Web Scraping
- Database Access
- Natural Language Processing (NLP)
- Image Processing
- Data Analysis and Manipulation
- pandas
- Series
- Series Creation: Create a Pandas Series from a list of integers.
- Indexing and Slicing: Access specific elements and slices from the Series.
- Operations: Perform basic arithmetic operations on the Series.
- Filtering: Filter the Series to include only elements greater than a certain value.
- Missing Data: Introduce NaN values into the Series and handle them (e.g., fill with a value or drop).
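The Series exercises above could look roughly like this. It is a minimal sketch: the values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Series creation from a list of integers
s = pd.Series([10, 20, 30, 40, 50])

# Indexing and slicing by position
first = s.iloc[0]        # single element
middle = s.iloc[1:4]     # positions 1, 2 and 3

# Element-wise arithmetic
doubled = s * 2

# Filtering with a boolean mask
large = s[s > 25]

# Missing data: introduce NaN, then fill or drop
with_gaps = s.astype(float)
with_gaps.iloc[[1, 3]] = np.nan
filled = with_gaps.fillna(0)
dropped = with_gaps.dropna()
```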
- DataFrame
- DataFrame Creation: Create a DataFrame from a dictionary where the keys are column names and the values are lists of column data.
- Exploring Data: Display the first few rows, summary statistics, and data types of the DataFrame.
- Indexing and Selection: Select specific columns, rows, and subsets of the DataFrame.
- Adding Columns: Add a new column to the DataFrame based on existing columns.
- Handling Missing Data: Introduce NaN values and demonstrate methods to handle missing data (e.g., fillna, dropna).
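One possible walk-through of the DataFrame exercises follows. The column names (`city`, `cases`, `population`) and all values are hypothetical placeholders, not workshop data.

```python
import numpy as np
import pandas as pd

# DataFrame creation: keys are column names, values are lists of column data
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "cases": [12, 7, 30, 18],
    "population": [1000, 500, 3000, 900],
})

# Exploring data
head = df.head(2)        # first rows
summary = df.describe()  # summary statistics for numeric columns
dtypes = df.dtypes       # data type of each column

# Indexing and selection
cases_only = df["cases"]
subset = df.loc[df["cases"] > 10, ["city", "cases"]]

# Adding a column derived from existing columns
df["cases_per_1000"] = df["cases"] / df["population"] * 1000

# Handling missing data: fill with the column mean, or drop incomplete rows
df_missing = pd.DataFrame({"city": ["A", "B", "C"], "cases": [12.0, np.nan, 30.0]})
filled = df_missing.fillna({"cases": df_missing["cases"].mean()})
dropped = df_missing.dropna()
```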
- Data Manipulation
- Reading Data: Read a CSV file into a Pandas DataFrame.
- Filtering Data: Filter rows based on a condition.
- Sorting Data: Sort the DataFrame by a specific column.
- Grouping Data: Group the DataFrame by a column and compute aggregate statistics.
- Merging DataFrames: Merge two DataFrames on a common column.
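The data manipulation steps can be sketched on a small in-memory CSV. The CSV text, the column names and the `regions` lookup table are assumptions for illustration; with a real file, `pd.read_csv` would take its path instead of a `StringIO` object.

```python
import io
import pandas as pd

# Reading data: a tiny CSV held in memory stands in for a file on disk
csv_text = """postcode,week,cases
4000,1,5
4000,2,8
4101,1,2
4101,2,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Filtering rows on a condition
nonzero = df[df["cases"] > 0]

# Sorting by a column, largest first
sorted_df = df.sort_values("cases", ascending=False)

# Grouping and aggregating
totals = df.groupby("postcode", as_index=False)["cases"].sum()

# Merging two DataFrames on a common column
regions = pd.DataFrame({"postcode": [4000, 4101], "region": ["North", "South"]})
merged = totals.merge(regions, on="postcode", how="left")
```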
- numpy
- Basic operations
- Array Creation: Create a 1D NumPy array of integers from 0 to 9.
- Reshape: Convert the 1D array into a 2D array with 2 rows and 5 columns.
- Slicing: Extract the first row and the second column of the 2D array.
- Arithmetic Operations: Create another 2D array of the same shape and perform element-wise addition, subtraction, multiplication, and division.
- Statistical Operations: Compute the mean, median, and standard deviation of the elements in the 2D array.
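A compact sketch of the NumPy exercises is given below. The second array (`other`, filled with 2s) is an arbitrary choice so that division is well defined.

```python
import numpy as np

# Array creation: integers 0 to 9
a = np.arange(10)

# Reshape into 2 rows and 5 columns
m = a.reshape(2, 5)

# Slicing: first row and second column
first_row = m[0]
second_col = m[:, 1]

# Element-wise arithmetic with another array of the same shape
other = np.full((2, 5), 2)
added = m + other
subtracted = m - other
multiplied = m * other
divided = m / other

# Statistics over all elements
mean = m.mean()
median = np.median(m)
std = m.std()
```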
- Understanding time series data
- Common time series patterns and terminology
- Loading and exploring time series data with Python
- Handling missing values, imputation and interpolation
- Removing duplicates
- Data type conversion and validation
- Download the dengue csv file
- Filter to only relevant columns
- Convert date to datetime format
- Identify postcodes with most complete/missing data
- Create a pivot table
- Visualize case counts data
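The dengue workflow above might be sketched as follows. The real file's layout is not shown here, so the column names (`date`, `postcode`, `cases`, `notes`) and the rows are invented stand-ins.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded dengue CSV
csv_text = """date,postcode,cases,notes
2023-01-02,4000,3,a
2023-01-02,4101,1,b
2023-01-09,4000,5,c
2023-01-16,4101,2,d
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Filter to only the relevant columns
df = raw[["date", "postcode", "cases"]].copy()

# Convert the date column to datetime
df["date"] = pd.to_datetime(df["date"])

# Wide pivot table: one row per date, one column per postcode
wide = df.pivot_table(index="date", columns="postcode", values="cases", aggfunc="sum")

# Count missing dates per postcode, most complete first
missing_per_postcode = wide.isna().sum().sort_values()
```

In a notebook, `wide.plot()` would then draw one line per postcode, assuming matplotlib is installed.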
- Detecting and handling outliers
- Smoothing time series data
- Handling seasonal and trend components
- Apply moving average smoothing and visualize
- Use a for loop to run the code for a few postcodes
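Moving average smoothing over a few postcodes could be looped like this. The wide table, the postcode labels and the 3-week window are assumptions for illustration.

```python
import pandas as pd

# Hypothetical weekly case counts, one column per postcode
dates = pd.date_range("2023-01-02", periods=6, freq="W-MON")
wide = pd.DataFrame(
    {"4000": [3, 5, 4, 8, 6, 7], "4101": [1, 0, 2, 1, 3, 2]},
    index=dates,
)

# Apply a centered 3-week moving average to each selected postcode
smoothed = {}
for postcode in ["4000", "4101"]:
    smoothed[postcode] = wide[postcode].rolling(window=3, center=True).mean()

smoothed_df = pd.DataFrame(smoothed)
```

The first and last rows are NaN because a centered window of 3 has no complete neighbourhood at the edges. Calling `smoothed_df.plot()` next to `wide.plot()` makes the effect of smoothing visible.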
Day 2
- Creating lag features
- Rolling statistics (moving average)
- Fourier transform and other feature extraction
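The three feature ideas above can be tried on a synthetic series. This is a sketch on an invented sine wave with a 12-step cycle; the lag and window sizes are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic series: constant level plus a 12-step seasonal cycle
t = np.arange(48)
cases = pd.Series(10 + 5 * np.sin(2 * np.pi * t / 12))

features = pd.DataFrame({"cases": cases})

# Lag features: the value 1 and 4 steps earlier
features["lag_1"] = cases.shift(1)
features["lag_4"] = cases.shift(4)

# Rolling statistic: trailing 4-step moving average
features["roll_mean_4"] = cases.rolling(window=4).mean()

# Fourier transform: find the strongest cycle length
centered = (cases - cases.mean()).to_numpy()
spectrum = np.abs(np.fft.rfft(centered))
freqs = np.fft.rfftfreq(len(centered), d=1.0)
peak = spectrum[1:].argmax() + 1          # skip the zero-frequency term
dominant_period = 1.0 / freqs[peak]
```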
- Normalization technique
- Standardization technique
- Effects of scaling on time series analysis
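Min-max normalization and z-score standardization differ only in the offset and scale they use. The sketch below uses made-up values.

```python
import pandas as pd

cases = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Normalization (min-max): rescale to the range [0, 1]
normalized = (cases - cases.min()) / (cases.max() - cases.min())

# Standardization (z-score): zero mean, unit standard deviation
standardized = (cases - cases.mean()) / cases.std(ddof=0)
```

For time series, the minimum, maximum, mean and standard deviation are usually computed on the training period only, so that information from the future does not leak into the scaled data.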
- Create a plot showing the number of cases for a selected location, with a lag
- Overlay external weather data
- Decomposing time series into trend, seasonality, and residuals
- Additive vs. multiplicative decomposition
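To show what an additive decomposition does, here is a hand-rolled classical version on a synthetic series. The series, its 7-step period and the variable names are all assumptions; a library routine such as `seasonal_decompose` in statsmodels would normally be used instead.

```python
import numpy as np
import pandas as pd

period = 7
n = 6 * period
t = np.arange(n)
pattern = np.array([3.0, 1.0, -2.0, -4.0, -1.0, 1.0, 2.0])  # sums to zero

# Synthetic additive series: linear trend + repeating seasonal pattern
y = pd.Series(50.0 + 0.5 * t + np.tile(pattern, n // period))

# Trend: centered moving average over one full period
trend = y.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position in the cycle
detrended = y - trend
position = t % period
seasonal_means = detrended.groupby(position).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.to_numpy()[position], index=y.index)

# Residual: whatever trend and seasonality do not explain
residual = y - trend - seasonal
```

A multiplicative decomposition follows the same steps with division in place of subtraction (`y / trend`, then `y / (trend * seasonal)`). It suits series whose seasonal swings grow with the level.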
- Machine learning cheat sheet
- Facebook Prophet
- Downsampling
- Upsampling
- Resampling with aggregation
- Frequency conversion
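The four resampling ideas can be compared on a two-week daily series. The values are invented, and linear interpolation is just one way to fill the gaps created by upsampling.

```python
import pandas as pd

# 14 daily values starting on a Monday
idx = pd.date_range("2023-01-02", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=idx, dtype=float)

# Downsampling with aggregation: daily -> weekly totals (weeks end on Sunday)
weekly_total = daily.resample("W").sum()

# Upsampling: weekly -> daily, new days start as NaN and are then interpolated
upsampled = weekly_total.resample("D").asfreq().interpolate()

# Frequency conversion without aggregation: keep every second day
every_other_day = daily.asfreq("2D")
```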
Day 3
- Split data into training and testing sets
- Time series cross-validation techniques
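With time series, the split has to respect time order, so the test set is the most recent stretch rather than a random sample. The sketch below shows a chronological split and a simple expanding-window cross-validation. The helper `expanding_window_splits` is a hypothetical illustration; scikit-learn's `TimeSeriesSplit` offers a comparable ready-made splitter.

```python
import numpy as np

y = np.arange(20)  # stand-in for 20 time-ordered observations

# Chronological split: first 80% for training, last 20% for testing
split = int(len(y) * 0.8)
train, test = y[:split], y[split:]

def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    for fold in range(n_folds):
        test_end = n_samples - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        yield np.arange(0, test_start), np.arange(test_start, test_end)

folds = list(expanding_window_splits(len(y), n_folds=3, test_size=4))
```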
- Tokenization
- Stopword removal
- Stemming
- Lemmatization
- Text normalization
- Removing special characters and numbers
- Handling case sensitivity
- Removing stopwords and punctuation
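Several of the preprocessing steps above fit into one small function using only the standard library. The stopword list and the example sentence are made up. Stemming and lemmatization are left out because they normally rely on a library such as NLTK (`PorterStemmer`, `WordNetLemmatizer`).

```python
import re

# Tiny illustrative stopword list; real lists are much longer
STOPWORDS = {"the", "a", "an", "in", "of", "and", "is", "are", "at", "to"}

def preprocess(text):
    text = text.lower()                     # handle case sensitivity
    text = re.sub(r"[^a-z\s]", " ", text)   # remove numbers, punctuation, special characters
    tokens = text.split()                   # whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]  # stopword removal

tokens = preprocess("The 3 clinics in town reported 12 new cases of dengue!")
```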
- Geocoding using Google Cloud Platform
- Compare Google Cloud Platform's address details with original dataset
- Converting text to lowercase
- Expanding contractions
- Handling special characters and numbers
- Normalizing whitespace
- Removing non-ASCII characters
- Normalizing text using lemmatization
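Lowercasing, contraction expansion, non-ASCII removal and whitespace cleanup can be chained as below. This is a sketch using only the standard library. The `CONTRACTIONS` table is a small hypothetical sample, and lemmatization is omitted because it needs an external library.

```python
import re
import unicodedata

# Small illustrative contraction table; a real one would be far larger
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "don't": "do not"}

def normalize(text):
    text = text.lower()                                   # convert to lowercase
    for short, full in CONTRACTIONS.items():              # expand contractions
        text = text.replace(short, full)
    # Split accented letters into base letter + accent, then drop non-ASCII
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

cleaned = normalize("  It's a  caf\u00e9,\n we CAN'T  close ")
```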
- Bag of Words
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Word Embeddings (Word2Vec)
- Document Embeddings (Doc2Vec)
- N-grams
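Bag of Words, n-grams and TF-IDF can be computed by hand on a toy corpus to see what the numbers mean. The documents are invented, and the TF-IDF formula used here (term frequency times `log(N / df)`) is the plain textbook variant; libraries such as scikit-learn apply smoothed versions. Word2Vec and Doc2Vec need a trained model and are not shown.

```python
import math
from collections import Counter

# Toy corpus of already-tokenized documents
docs = [
    ["dengue", "cases", "rise"],
    ["dengue", "cases", "fall"],
    ["rain", "cases", "rise"],
]

# Bag of Words: term counts per document
bow = [Counter(doc) for doc in docs]

# N-grams: sliding windows of n consecutive tokens
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams(docs[0], 2)

# TF-IDF: frequent in this document, rare across the corpus
n_docs = len(docs)
doc_freq = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    counts = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }

weights = tfidf(docs[0])
```

A term that appears in every document, such as "cases" here, gets a weight of zero because it does not help tell documents apart.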
- Geocode with OpenStreetMap
- Geocode places with missing postcodes