In this tutorials we want to play with the Jupyter Notebook (http://jupyter.org/) and do some data wrangling with Python, Pandas (http://pandas.pydata.org/) and co., some data analysis and visualisation.
Start in Binder to work directly online (no install required):
Miniconda is an encapsulated versatile virtual python environment installer, that works under the hood of the big Anaconda python distribution. Miniconda is basically a mini version of Anaconda that includes only the conda package manager and its dependencies:
https://conda.io/miniconda.html
Python 3.6, 64-bit (exe installer) https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe
BEWARE:
- To install miniconda SYSTEM-WIDE for ALL users, this does require administrator permissions; every users can then create their own environments with the conda tool.
- Please do NOT make Conda the default python for the system if you don't want it to interfere with other Python installations you might have, eg. Pythons of ArcGIS and QGis etc
Additional install information: https://conda.io/docs/user-guide/install/index.html
How to use the conda command? Open the Anaconda/conda command prompt from the Start menu:
conda create
basically represents a typical Python virtualenv command.
https://conda.io/docs/_downloads/conda-cheatsheet.pdf
Here it becomes obvious how practical virtual environments can be. They help you to keep various Python versions around without messing up your system.
(C:\dev\conda3) conda create --name py36 python=3.6
and then show all environments:
#> conda env list
Now we want to activate that environment and start working with it:
activate py36
(py36) conda install jupyter
latest: http://pandas.pydata.org/pandas-docs/version/0.20.3/index.html (https://pandas.pydata.org/pandas-docs/stable/index.html)
0.19.2 (used by Geopandas): http://pandas.pydata.org/pandas-docs/version/0.19.2/
(py36) conda install pandas=0.19.2
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
Download the full Meetup-Repositiry from https://github.com/allixender/meetup-notes/archive/master.zip and extract it to a LOCAL folder. Go back into the console/commandline prompt and change into that folder. There you should then start the Jupyter notebook.
# change into the directory where you extracted the materials, into the python-jupyter folder
cd C:\Users\Alexander\Documents\meetup-notes\python-jupyter
# make sure you have the corrent conda / python environment activated (indicated by the brackets at the begiining of the command prompt)
# start the jupyter notebook server
(py36) jupyter notebook
You should get a website opening that lists you README files and the 3 folders.
This repository contains a curated collection of Jupyter Notebooks of introductory materials about programming in Python in general, the Pandas data manipulation and visualisation toolkit, and the Geopandas geospatial library.
This repository aims at different degrees of experience with the Python programming language and Python data manipultation and visualisation libraries.
Each folder contains a README.md
(for the the online repository) and README.notebook.md
(readme as readable Notebook)
file that list the contents and some details for each of the 3 big modules here:
- 01 - Introduction to Python and the Jupyter Notebook
- 02 - Introduction to the Pandas library
- 03 - Introduction to the Geopandas library
Open the README.notebook.md
in the base folder and now take directions to the linked working notebooks and lessons from there.
The folder 01-intro (Link to the 01 README.notebook) contains notebooks that are intended to aid both students and teachers in learning and teaching Python programming, respectively.
In more details, the goals of this module are:
- Introduce students as quickly as possible to the basics of Python programming;
- Introduce best practice as early as possible, while remaining accessible to students with no background in programming at all;
- Provide teachers an easy-to-use material about programming in Python to be used in their lectures
pandas is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly.
The folder 02-pandas (Link to the 02 README.notebook) contains a module that gives you some concrete examples for getting started with pandas. The docs are really comprehensive. However, I've often had people tell me that they have some trouble getting started, so these are examples with real-world data, and all the bugs and weirdness that entails.
The folder 03-geo (Link to the 03 README.notebook) contains Jupyter Notebooks that demonstrate the use of the Geopandas library. http://geopandas.org/index.html
GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas enables you to easily do operations in python that would otherwise require a spatial database such as PostGIS. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and descartes and matplotlib for plotting.
We need to install the Geopandas python library and two libraries that we use to read and write from Excel-Spredsheets.
(py36) conda install -c conda-forge geopandas
(py36) conda install openpyxl xlrd
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This work uses some materials from the following repositories: