Pandas is the Swiss-Multipurpose Knife for Data Analysis in Python. With Pandas dealing with data-analysis is easy and simple but there are some things you need to get your head around first as Data-Frames and Data-Series.
The tutorial provides a compact introduction to Pandas for beginners for I/O, data visualisation, statistical data analysis and aggregation within Jupiter notebooks.
Copy this repository to your computer
# get this repository
git clone https://github.com/alanderex/pydata-pandas-workshop.git
cd pydata-pandas-workshop
Having Anaconda installed simply create your ENV with
# install working environment with conda
conda env create -n pydata-pandas-workshop -f environment.yml
# environment should be activated now
# if not type: source activate pydata-pandas-workshop
# start juypter lab
jupyter lab
# paste the url displayed in your browser, if it doesn't open anyway:
# http://localhost:8888/lab
Alternatively you can also create a python virtual enviroment and
pip install -r requirements.txt
If you don't want to use anaconda, you can use python3 and
pip install pandas jupyter barnum numpy matplotlib xlsxwriter seaborn bokeh jupyter parquet dask
(at your own risk)
-
CSV
-
Excel
-
JSON
-
Clipboard
-
data
- .info
- .describe
- Ode to NumPy
- Data-Series
- Data-Frames
- Data-Series:
- Slicing
- Access by label
- Index
- Data-Frames:
- Slicing
- Access by label
- Peek into joining data
- Returns a copy / inplace
- Boolean indexing
- add/substract
- multiply
- plot your data directly into your notebook
- Merging
- Multi-Index
- DateTime Index
- Resampling
- Pivoting
- Faster file I/O with Parquet
- Scaling and Distributing with Dask