python-for-gis-progression-path

Progression path for a GIS analyst who wants to become proficient in using Python for GIS: from apprentice to guru

MIT License

This is a work in progress

This is an attempt to provide a structured collection of resources that could help a GIS professional learn how to use Python for spatial data management, mapping, and analysis. The resources are organized by skill level, so readers at every level should be able to learn something new along the way.

The resources will include books, web pages and blog posts, online courses, videos, Q/A from GIS.SE, links to code snippets, and some bedtime readings.

The resources will be applicable to both Esri software users and open-source GIS professionals.


Beginner

You should be able to write short simple scripts in pure Python with no connection to GIS. To learn the basics of Python, you can find a ton of resources online such as Codecademy, Learn Python the Hard Way, Dive into Python, A Whirlwind Tour of Python, and many other books from Python.org and this Free programming books GitHub repo.

Resources

If you would rather skip that route and learn directly how Python can be used for GIS:

Books

Going through these books may be sufficient to learn everything you may ever need if you are an Esri or an open-source GIS user, respectively.

Courses

Videos

Look for videos on the Esri Video web page: search for Python and sort the results by most recent. An example of URL.

Tutorials / web pages

Collections of resources

Skills

GIS specific

At this point, you should be able to:

  • write some simple scripts either using arcpy site-package or ogr/gdal/pyqgis libraries
  • report information about your GIS assets (data format, geometry type, data schema, spatial reference)
  • write code for calling ArcGIS geoprocessing tools (inspecting the arcpy.Result object returned) / ogr geometry methods / PyQGIS tools from Python code
  • perform an operation on multiple datasets in batch mode using arcpy/ogr listing functions
  • read and update attributes & geometry of features using arcpy.da cursors or ogr data source methods
  • create and operate on arcpy.Geometry() objects (accessing their properties and methods) or ogr.Feature()
  • create an ArcGIS toolbox with a simple script tool executing a Python source file
  • report information about map layers (eg. data sources, broken paths, definition queries) within an ArcMap map document (.mxd) using arcpy.mapping module or pyqgis module

Python

At this point, you should be familiar with:

  • variables of different data types (numeric, string, Boolean, date etc.)
  • data structures of different types (list, tuple, dictionary, set)
  • for and while loops, if-elif-else blocks
  • import of external Python modules and packages (eg. import os)
  • functions and how they work (eg. input arguments and return statement)
  • reading/writing text files using the built-in open function
  • reading/writing .csv files using the csv module and unicodecsv module
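As a minimal sketch of several items above (functions with arguments and a return value, a comprehension-style loop, and the csv module), the following Python 3 script writes a small table to a .csv file and reads it back. The file name, the column names, and the sample rows are made up for illustration; on Python 2 the unicodecsv module mentioned above would take the place of the encoding argument.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write a header and data rows to a .csv file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Return the rows of a .csv file as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical inventory: layer name, geometry type, feature count
header = ["name", "geometry", "feature_count"]
rows = [
    ("roads", "Polyline", 1200),
    ("parcels", "Polygon", 350),
    ("hydrants", "Point", 90),
]

csv_path = os.path.join(tempfile.mkdtemp(), "layers.csv")
write_rows(csv_path, header, rows)

layers = read_rows(csv_path)
# Values come back as strings, so convert before comparing numbers
large = [row["name"] for row in layers if int(row["feature_count"]) > 100]
print(large)
```

Note that everything read from a .csv file is a string, which is why the feature count is converted with int() before the comparison.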

Exercises

This section contains examples of tasks that you might need to script at some point. Implementing these tasks in Python code would be a good sign that you have mastered the basics of Python for GIS.

  • get a list of field names of Date type in a file geodatabase feature class
  • copy multiple shapefiles into a file geodatabase at once
  • re-project all rasters in a folder copying the results into a new folder
  • update data sources for layers in a map document and save a new map document
  • write to a .txt or a .csv file information about your GIS assets

Intermediate

Resources

Now, for getting started with Python development, Visual Studio Code with Python extension(s) is arguably the best choice. It's completely free, you can install it on any of your physical or virtual machines, and it has great support for Python development. Among commercial IDEs, Wing IDE or PyCharm would be a great choice.

Tutorials / web pages

Skills

GIS specific

At this point, you should be able to:

  • automate map production using arcpy.mapping with data-driven pages or pyqgis
  • manage .pdf files (eg. re-ordering, merging, splitting) using arcpy or pure Python packages such as pypdf2
  • export ArcMap map document layout to various file formats such as .png and .pdf
  • update text elements content in layout of an ArcMap map document
  • execute SQL queries from Python using arcpy.ArcSDESQLExecute() or GDALDataset::ExecuteSQL()
  • use FieldInfo, FieldMap, and FieldMappings classes from arcpy or ogr.FieldDefn() to manage data schema changes
  • customize ArcGIS script tool behavior using ToolValidator class or build simple QGIS plugins
  • start using Python toolboxes and Python add-ins in ArcGIS when it makes sense
  • debug arcpy-driven code with the help of geoprocessing messages
  • write small unit tests for GIS workflows
  • handle JSON data in Python and arcpy and GeoJSON for ogr
  • read Excel files using xlrd Python package
  • generate simple Excel files from datasets with Python and xlsxwriter package or xlwt
  • use arcpy.da.Walk() and os.walk() to traverse folders with GIS datasets recursively
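To illustrate the last item in the list above, here is a minimal sketch of a recursive dataset search with os.walk(). The helper name find_datasets, the extensions, and the throwaway folder tree are assumptions made so the script runs anywhere; it only sees files on disk, whereas arcpy.da.Walk() can also list datasets stored inside geodatabases.

```python
import os
import tempfile


def find_datasets(root, extensions=(".shp", ".tif")):
    """Yield full paths of files under root whose extension matches."""
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            # str.endswith accepts a tuple of suffixes
            if name.lower().endswith(extensions):
                yield os.path.join(folder, name)


# Build a small throwaway folder tree (hypothetical file names)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "vector", "admin"))
os.makedirs(os.path.join(root, "raster"))
for rel_path in ("vector/roads.shp", "vector/admin/parcels.SHP",
                 "raster/dem.tif", "raster/readme.txt"):
    open(os.path.join(root, *rel_path.split("/")), "w").close()

# Report paths relative to the root, with forward slashes on every platform
found = sorted(os.path.relpath(path, root).replace(os.sep, "/")
               for path in find_datasets(root))
print(found)
```

Lower-casing the file name before the comparison means that parcels.SHP is picked up as well, which matters for data copied from case-insensitive file systems.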

Python

At this point, you should be familiar with:

  • installing Python packages using pip
  • PYTHONPATH environment variable and concept of paths and running Python programs from cmd
  • Python 3 to be able to write code that will be ported later to ArcGIS Pro / QGIS 3.x
  • Python PEP-8 style guide
  • collections module data structures such as defaultdict, namedtuple, Counter
  • list, dictionary, and set comprehensions + set theory operations
  • enumerating sequences using the built-in enumerate function
  • writing own functions and handling the arguments with *args and **kwargs
  • lambda/anonymous and convenience functions
  • accessing DBMS databases using Python
  • working with disk-based databases such as SQLite from Python
  • using non-Latin characters in the source file, handling Unicode, and the source encoding declaration
  • Python exceptions and try/except block
  • Python traceback module
  • tuple unpacking with function calls
  • sending emails with Python
  • accessing ftp sites with Python using ftplib module
  • running Python files with the cmd and a task scheduler
  • zipping folders and files with Python and reading/unpacking archive files (using zipfile module for .zip files and tarfile for .tar and .tar.gz files)
  • sending SMS using Python and Twilio
  • logging your Python programs (using logging module) - handy to use instead of print statements
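A few of the language features listed above can be seen together in one short sketch: collections data structures, comprehensions, enumerate with tuple unpacking, and *args/**kwargs. The Layer record, the layer names, and the describe function are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict, namedtuple

# A lightweight record type; the field names are illustrative
Layer = namedtuple("Layer", ["name", "geometry", "workspace"])

layers = [
    Layer("roads", "Polyline", "transport.gdb"),
    Layer("rails", "Polyline", "transport.gdb"),
    Layer("parcels", "Polygon", "cadastre.gdb"),
    Layer("hydrants", "Point", "utilities.gdb"),
]

# Counter: how many layers there are of each geometry type
geometry_counts = Counter(layer.geometry for layer in layers)

# defaultdict: group layer names by workspace without checking for missing keys
by_workspace = defaultdict(list)
for layer in layers:
    by_workspace[layer.workspace].append(layer.name)

# List, dictionary, and set comprehensions
line_names = [layer.name for layer in layers if layer.geometry == "Polyline"]
name_to_geometry = {layer.name: layer.geometry for layer in layers}
workspaces = {layer.workspace for layer in layers}

# enumerate with tuple unpacking of each namedtuple
numbered = ["{0}: {1} ({2})".format(i, name, geometry)
            for i, (name, geometry, _workspace) in enumerate(layers, start=1)]


def describe(*args, **kwargs):
    """Show how positional and keyword arguments are collected."""
    return len(args), sorted(kwargs)


print(geometry_counts.most_common(1))
print(dict(by_workspace))
print(numbered[0])
print(describe("roads", "rails", overwrite=True, verbose=False))
```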

Advanced

Resources

GIS specific

Python

  • Learn how Python is used in the enterprise by watching the Enterprise Software with Python O'Reilly video course

  • Learn IPython and the concept of reproducible research:

  • Learn about using Python for web development:

  • Watch Python – Beyond the Basics on Pluralsight

  • Learn about nltk Python package to work with human language data (eg. parsing address data)

  • Learn about regex Python package to work with regular expressions in Python (eg. finding addresses in a specific format)

  • Learn about difflib and Levenshtein C extension to do fuzzy string matching (eg. finding the closest address string in the registry for an input address)

  • Learn Selenium Python package to be able to automate web app testing. Read the docs for Python bindings here

  • Learn about numerical computing and data science:

  • Learn about connecting to various DBMS from Python:

    • For Microsoft SQL Server - pymssql
    • For Oracle - cx_Oracle
    • For PostgreSQL - psycopg2 or sqlalchemy
  • Learn about using machine learning with Python:

    • Start using scikit-learn for various GIS-related operations such as data classification and regression as well as scikit-image for image processing (e.g., satellite imagery recognition)
  • Learn about using computer vision (CV) with Python to do image processing:

  • Learn about creating and parsing HTML:

    • Parse and construct HTML pages with Python using BeautifulSoup. Having this skill would be handy when a web page should be searched for some information and loaded into a GIS dataset or when you are building HTML reports
    • Learn how the registrant package reports information about the Esri geodatabase contents
    • Learn about web scraping using Scrapy
  • Learn about creating and parsing XML:

    • Parse existing .xml files using built-in xml.etree.ElementTree class and 3rd party package lxml
  • Learn about source code testing, linting, and refactoring:

    • Learn unittest built-in module and more advanced pytest framework
    • Learn coverage.py module to create code coverage reports
    • Learn Hypothesis for writing more powerful unit tests
    • Learn Python linters such as pylint, flake8, and pyflakes to keep the code tidy
    • Learn about Python style guides such as Google style guide. This will be particularly useful when you start working in a team
    • Learn about the most comprehensive Python linter, wemake-python-styleguide. It is just a flake8 plugin; however, it combines violations from a lot of other flake8 plugins
    • Learn Python formatters such as yapf and autopep8 to automatically reformat the source code to conform to a style. It is best to run autopep8 with aggressive option enabled to reformat the code and then run yapf on the resulting code
    • Learn SonarPython static code analyzer to find code smells and refactoring options. Many of the rules from SonarPython are implemented in wemake-python-styleguide
    • Learn about Python interface files (PEP-484) and how to use them to help your Python IDE to do static code analysis and provide better intellisense
  • Start looking into doing certain things outside of GIS applications using pure Python, for instance, using pandas

  • Learn best practices for organizing configuration and settings for a larger workflow where you need to keep the config values separately from the business logic (eg. using json, ConfigParser or using OOP constructors)

  • Learn about extending Python with C or C++:
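Returning to the fuzzy string matching item in the list above, the following is a minimal sketch that uses only the built-in difflib module (not the Levenshtein C extension). The address registry, the closest_address helper, and the 0.6 cutoff are assumptions chosen for illustration; real address data would likely need normalization first.

```python
import difflib

# Hypothetical address registry
registry = [
    "12 Main Street",
    "120 Main Street",
    "12 Maple Street",
    "7 Station Road",
]


def closest_address(query, candidates, cutoff=0.6):
    """Return the best fuzzy match for query, or None if nothing is close."""
    # Compare lower-cased strings so case differences do not matter
    lowered = {candidate.lower(): candidate for candidate in candidates}
    matches = difflib.get_close_matches(query.lower(), list(lowered),
                                        n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(closest_address("7 Staton Rd", registry))
print(closest_address("qqqq", registry))
```

The cutoff is a similarity ratio between 0 and 1; raising it makes the match stricter and returns None more often.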

Skills

GIS specific

At this point, you should be able to:

  • execute ArcObjects code from Python using comtypes library
  • export the data from tables and feature classes into Excel with custom formatting using xlsxwriter
  • generate .pdf files from scratch that would contain map images, custom charts, and tables using reportlab
  • split, merge, crop, and transform .pdf documents using pypdf2
  • generate .pdf report files using ArcGIS report templates (.rlf) and arcpy
  • generate graphs using arcpy.Graph, arcpy.GraphTemplate with graph template files (.tee), and Make Graph GP tool
  • perform graph theory operations on linear datasets using networkx (eg. point-to-point routing)
  • plot geodata with Matplotlib (both vector and raster)
  • use numpy and pandas for manipulating spatial dataset attribute table
  • use requests and/or arcrest package to access ArcGIS Server site, ArcGIS Online / Portal organizations through the ArcGIS REST API
  • call FME workbenches from Python
  • access readers and writers in FME with fmeobjects
  • read, modify, and write a georeferenced image
  • generate useful information about a point dataset (most isolated points, a pair of two furthest points, etc)
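The last item in the list above can be sketched in pure Python. This brute-force version compares every pair of points, so it is only meant for small datasets; the point ids and coordinates are hypothetical, and the distances are planar, which assumes a projected coordinate system.

```python
import math
from itertools import combinations

# Hypothetical points as (id, x, y) in a projected coordinate system
points = [("A", 0.0, 0.0), ("B", 1.0, 0.0), ("C", 0.0, 1.0),
          ("D", 10.0, 10.0), ("E", 1.0, 1.0)]


def distance(p, q):
    """Planar Euclidean distance between two (id, x, y) tuples."""
    return math.hypot(p[1] - q[1], p[2] - q[2])


def furthest_pair(pts):
    """Return the two points with the greatest distance between them."""
    return max(combinations(pts, 2), key=lambda pair: distance(*pair))


def most_isolated(pts):
    """Return the point whose nearest neighbour is furthest away."""
    def nearest_neighbour_distance(p):
        return min(distance(p, q) for q in pts if q is not p)
    return max(pts, key=nearest_neighbour_distance)


a, b = furthest_pair(points)
print(a[0], b[0], round(distance(a, b), 3))
print(most_isolated(points)[0])
```

For larger datasets a spatial index or a convex hull based approach would be a more reasonable starting point than comparing all pairs.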

Python

At this point, you should be familiar with:

  • building desktop GUI applications using PyQt, PySide, or Kivy (eg. visualize a shapefile's features in an application window)
  • contributing to open-source projects such as arcrest or geopandas by reporting bugs or submitting new functionality
  • creating new conda environments and installing various packages into specific environments
  • refactoring by wrapping the code into functions, modules, and packages
  • OOP basics and creating own classes
  • compiling a simple Python extension module (.pyd) and writing a .pyi interface file to provide the intellisense for your Python IDE
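As a small illustration of the OOP basics mentioned above, here is a sketch of a class with a constructor, an alternative constructor, a property, and a few methods. The Extent class is hypothetical and unrelated to arcpy.Extent; it only shows the language features.

```python
class Extent:
    """Axis-aligned bounding box; a hypothetical class for illustration."""

    def __init__(self, xmin, ymin, xmax, ymax):
        if xmin > xmax or ymin > ymax:
            raise ValueError("min values must not exceed max values")
        self.xmin, self.ymin = xmin, ymin
        self.xmax, self.ymax = xmax, ymax

    @classmethod
    def from_points(cls, points):
        """Build the extent that encloses an iterable of (x, y) pairs."""
        xs, ys = zip(*points)
        return cls(min(xs), min(ys), max(xs), max(ys))

    @property
    def area(self):
        """Area of the box in squared coordinate units."""
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def contains(self, x, y):
        """Return True if the point lies inside or on the boundary."""
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other):
        """Return True if the two extents share at least one point."""
        return not (other.xmin > self.xmax or other.xmax < self.xmin or
                    other.ymin > self.ymax or other.ymax < self.ymin)

    def __repr__(self):
        return "Extent({0}, {1}, {2}, {3})".format(
            self.xmin, self.ymin, self.xmax, self.ymax)


box = Extent.from_points([(2, 3), (5, 1), (4, 6)])
print(box, box.area)
print(box.contains(3, 3), box.intersects(Extent(10, 10, 12, 12)))
```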

Exercises

This section contains examples of tasks that you might need to script at some point. Implementing these tasks in Python code would be a good sign that you have mastered the advanced concepts of Python for GIS.

  • hide/show map grid of data frame in a map layout before exporting the map in a map document using arcpy package and ArcObjects
  • update label's text of a scale bar in a map layout using pure ArcObjects
  • generate a service area (drive-time) polygon for an arbitrary point on a street network stored as a shapefile using networkx
  • find out the fastest spatial join - ArcGIS Spatial Join GP tool, rtree in PostGIS, SQL Server STContains, or shapely Python package
  • create a new .csv file from an existing one by filtering certain rows using pandas
  • classify point dataset features into clusters using scikit-learn to mimic some of the ArcGIS Spatial Statistics tools
  • write a program that will calculate the area of a lake automatically recognized from a satellite image
  • build with the help of PyQt a GUI application for executing SQL queries against file geodatabases
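As a starting point for the service area exercise above, the core idea (finding every node reachable within a travel-time budget) can be sketched without any third-party package. This is not the networkx solution the exercise asks for: it swaps in a hand-written Dijkstra search based on the built-in heapq module, and the street network is a hypothetical dictionary rather than a shapefile. With networkx, a function such as single_source_dijkstra_path_length with a cutoff should play the same role. Turning the reachable nodes into a polygon is left out.

```python
import heapq

# Hypothetical street network: node -> list of (neighbour, travel minutes)
streets = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1), ("D", 4)],
    "C": [("A", 5), ("B", 1), ("E", 6)],
    "D": [("B", 4), ("E", 1)],
    "E": [("C", 6), ("D", 1)],
}


def reachable_within(graph, source, budget):
    """Return {node: travel time} for nodes reachable within budget."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, minutes in graph[node]:
            new_cost = cost + minutes
            known = best.get(neighbour, float("inf"))
            if new_cost <= budget and new_cost < known:
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


print(reachable_within(streets, "A", 6))
```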