Lab | Feature Extraction and Introduction to Supervised Learning

Introduction

As data analysts or data scientists, we don't always get the data we need, but the data that we deserve. Often it is up to us to extract meaningful information from our data. We can do this by transforming the data with derived columns, grouping the data and using aggregated information, or cleaning and reformatting the data. We will explore these techniques in this lab.
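As a rough illustration of the three techniques named above (cleaning, derived columns, and grouping with aggregation), here is a minimal sketch in pandas. The `orders` table, its column names, and the 100-unit threshold are made up for this example and are not taken from the lab's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
orders = pd.DataFrame({
    "customer": ["ana", "ben", "ana", "cara", "ben", None],
    "order_date": ["2021-01-05", "2021-01-20", "2021-02-11",
                   "2021-02-15", "2021-03-02", "2021-03-09"],
    "amount": [120.0, 35.5, 80.0, 210.0, 15.0, 50.0],
})

# Cleaning and reformatting: drop rows with no customer, parse dates.
clean = orders.dropna(subset=["customer"]).copy()
clean["order_date"] = pd.to_datetime(clean["order_date"])

# Derived columns: extract the month and flag large orders.
# The threshold of 100 is an arbitrary assumption for this sketch.
clean["month"] = clean["order_date"].dt.month
clean["order_size"] = np.where(clean["amount"] >= 100, "large", "small")

# Grouping and aggregation: one row per customer.
summary = (
    clean.groupby("customer")
    .agg(n_orders=("amount", "count"), total_amount=("amount", "sum"))
    .reset_index()
)

print(summary)
```

The same pattern (clean first, derive columns second, aggregate last) tends to keep each step easy to inspect, though the right order depends on the data at hand.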

Getting Started

Open the main.ipynb file in the your-code directory. Follow the instructions and add your code and explanations as necessary. By the end of this lab, you will have learned how to prepare a dataset for most scikit-learn algorithms.
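To give a sense of what a prepared dataset can look like, the sketch below assumes that most scikit-learn estimators expect a numeric feature table with no missing values, plus a separate target. The `raw` DataFrame and its columns are hypothetical, and dropping incomplete rows is only one of several possible strategies.

```python
import pandas as pd

# Hypothetical raw data; column names are illustrative only.
raw = pd.DataFrame({
    "age": [25, 40, None, 31],
    "city": ["Paris", "Madrid", "Paris", "Berlin"],
    "bought": ["yes", "no", "no", "yes"],
})

# Remove rows with missing values (imputation would be an alternative).
prepared = raw.dropna().copy()

# Target: encode the yes/no label as 1/0.
y = (prepared["bought"] == "yes").astype(int)

# Features: one-hot encode the categorical column so every column is numeric.
X = pd.get_dummies(prepared[["age", "city"]], columns=["city"])

print(X.shape, y.tolist())
```

At this point `X` and `y` could be handed to an estimator's `fit` method; which encoding and missing-value strategy is appropriate depends on the dataset used in main.ipynb.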

Deliverables

  • Pandas-concat-merge-join.ipynb with your responses.
  • main.ipynb with your responses.

Submission

Upon completion, add your deliverables to git. Then commit your changes and push your branch to the remote.

Resources

Joining and Merging in Pandas

pandas.concat

pandas.DataFrame.merge

pandas.DataFrame.join

SQL Joins

pandas.Series.unique

pandas.DataFrame.dropna

pandas.to_datetime

numpy.where

pandas.Series.value_counts

pandas.core.groupby.DataFrameGroupBy.agg