Programmatic Geoprocessing Workflows using Pandas DataFrames in ArcGIS Pro Notebooks

Zachary Uhlmann
ABSTRACT: Using examples from my job at a mid-sized heavy civil engineering firm and drawing on previous experience in the natural resource field, we will use Python within a Jupyter Notebook in Pro to create, analyze, document, and ultimately publish data to ArcGIS Online. We will set parameters and document metadata from CSVs using Pandas, one of the essential Python data-analytics libraries. This workflow raises the ceiling of geoprocessing efficiency and customization in our GIS workflows while keeping database information accessible to non-GIS people and those with no programming experience. Hopefully these examples can motivate people to overcome their apprehension and finally begin the switch from Desktop to Pro.

Tutorial Prep

1. Download repository:

  • download as zip file

    download_git
  • unzip the folder and rename the unzipped folder to: notebooks_tutorial
  • NOTE - we will create the Pro project here, so ensure the directory path isn't too long and avoid special characters
    • My full path is this: C:\Users\uhlmann\Documents\urisa_conference_2022\notebooks_tutorial
    • your full path should be: path\to\your\notebooks_tutorial

      CONTENTS (note that we will add the four_corners_morels directory (Pro project) in the next step, and .git will not be present in your downloaded directory)
      git_dir

2. Create new ArcGIS Pro Project

  • Create new ArcGIS Pro project within the directory from step 1.
    new_project
  • IMPORTANT Title the new project exactly(!): four_corners_morels
  • Notice the directory in above screenshot

3. Add Folder Connection and open both Notebooks

  • In ArcGIS Pro with project open, Add Folder Connection to notebooks_tutorial directory
  • Open urisa_p1 and basics Notebooks from within notebooks_tutorial directory

Begin Tutorial

Tips to self-guided tutorial via Notebooks

  • All cells except those that are entirely commented out (i.e. containing no non-commented code) will be run in order. Note that you can run the pure-comment cells with no consequence: nothing will happen... :)
  • Comments
    • Python code is commented out using the # sign
    • commented lines will be rendered in green
    • READ - these are line-by-line instructions and notes to complement the cell-by-cell instructions below
      comments
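A minimal sketch of the comment behavior described above (the variable `x` is just an illustration, not from the tutorial notebooks):

```python
# This entire line is a comment; Python ignores it when the cell runs.
x = 5  # An inline comment after code also starts with the # sign.
# In Notebook cells, these commented lines render in green.
print(x)
```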

Steps Summarized (Note that Notebook will have more verbose instructions)

Open urisa_p1 Notebook and begin...

  • Cell 1: Load Modules. Imports libraries
  • Cell 2: Canvas. Follow instructions to populate map with shapefiles and zoom in
  • Cell 3: Paths. Replace string within single quotes with path/to/your/notebooks_tutorial directory
  • Cell 4: Load Pandas DataFrame: Read csv into DataFrame and inspect
  • Cell 5: Incorrect File Path. Notice this is MY file path, not yours
  • Cell 6: Fix File Path the Hard Way. Follow the comments, then run the cell, or change the path manually and rerun Cell 4
  • Cell 7: Dictionary. Not necessary, but makes later code more readable. I like dictionaries
  • Cell 8: Arguments for Copy FC. Create and pull from DataFrame (df) all arguments to copy Four Corners Fire Perim
  • Cell 9: Create Datasets. Go to basics Notebook (open it) and run all three cells in order
  • Cell 10: Create new FC. Copy selected fire perimeter into new feature class in agency dataset within project gdb. DON'T FORGET to select four corners fire from attribute table prior to running
  • Cell 11: Clean up map. A reminder to remove redundant FC
  • Cell 12: Set values to DataFrame. Record data location of four_corners_fire_perim
  • Cell 13: Arguments for Copy Roads FC. Same thing as fire boundaries, but passing full path of roads dataset location from DataFrame directly to function (Cell 14) as opposed to from Table of Contents (TOC)
  • Cell 14: Copy roads FC to Pro gdb. Run function, copy into database - should appear in map depending on user's Pro settings
  • Cell 15: Set values to DataFrame. New data location and original.
  • Cell 16: Clip contour polygons to fire boundary. NOTE that in this instance we combined assembling the arguments AND running the function (Clip_analysis) into a single cell, as opposed to assigning variables (arguments) in one cell and calling the function in the next. Output clipped contour polygons for use in final map
  • Cell 17: Set values to DataFrame. LOOK(!) Added a couple more metadata attributes that can be saved to csv and later used to populate Item Description (a different tutorial!)
  • Cell 18: Clean up map.
  • Cell 19: Inspect DataFrame prior to saving. Also notice the Abstract we added - just for fun!
  • Cell 20: Save updated csv. Retained original for repeated runs of tutorial, as updated csv now has updated file paths. You can find the updated csv in notebooks_tutorial directory.
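The DataFrame bookkeeping pattern used in Cells 12, 15, 17, and 20 can be sketched roughly as follows. The column names, paths, and abstract text here are hypothetical stand-ins, not necessarily those in four_corners_morels.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for the inventory csv; real column names may differ.
csv_text = """fc_name,source_path,project_path
four_corners_fire_perim,C:/data/fires.shp,
roads,C:/data/roads.shp,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Cell 12/15 pattern: record where the copied feature class now lives.
mask = df["fc_name"] == "four_corners_fire_perim"
df.loc[mask, "project_path"] = "C:/projects/four_corners_morels.gdb/agency/four_corners_fire_perim"

# Cell 17 pattern: add metadata-ish attributes for later Item Description use.
df.loc[mask, "abstract"] = "Fire perimeter copied from agency source."

# Cell 20 pattern: save the updated inventory, leaving the original csv intact.
df.to_csv("four_corners_morels_updated.csv", index=False)
```

Keeping these updates in the DataFrame (rather than only in the Pro project) is what lets non-GIS users open the csv and see where everything lives.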

Conclusions

  • Notebooks are a nice way to organize programmatic workflows
  • Pandas offers a nice alternative to databases when a project is used by non-technical users; the metadata-ish components are contained in the csv
  • As with all code, Notebooks are simply text documents (.ipynb) that can be reused.
  • Code can be incorporated into more complicated object-oriented modules as functions and shared amongst colleagues.
  • For projects with lots of intermediate data, or just lots of data copying and creating, information about data provenance or geoprocessing steps can be added to the DataFrame concurrently for future reference.
  • Covered in the actual presentation at URISA - Pandas is a core geospatial data-analytics library that also originated and developed across disciplines and industries. It's an essential Python library to learn.
  • Not Covered Today:
    • I use the data inventory csv (four_corners_morels.csv) to populate Item Descriptions via the shapefile's xml. I imagine it can be parsed into an XSLT as well.
    • Ditto for cleaning databases periodically and once projects are complete. I delete, move, rename, and copy feature classes via the inventory csv to clean up the geodatabase.
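A minimal sketch of the cleanup idea just mentioned, assuming a hypothetical `status` column in the inventory csv that flags feature classes for deletion. The arcpy call is left as a comment so the snippet runs outside of Pro:

```python
import io
import pandas as pd

# Hypothetical inventory; the real four_corners_morels.csv columns may differ.
csv_text = """fc_name,project_path,status
scratch_clip,C:/proj/fcm.gdb/scratch_clip,delete
four_corners_fire_perim,C:/proj/fcm.gdb/four_corners_fire_perim,keep
"""
inventory = pd.read_csv(io.StringIO(csv_text))

# Collect the paths flagged for removal.
to_delete = inventory.loc[inventory["status"] == "delete", "project_path"].tolist()

for path in to_delete:
    # In an ArcGIS Pro Notebook this would be:
    # arcpy.management.Delete(path)
    print(f"would delete: {path}")
```

The same filter-then-act pattern extends to moving, renaming, or copying feature classes from the inventory.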