/Resources

Inventory of all the educational content that I share on spatial data analytics, geostatistics and machine learning. I hope these resources are helpful, Prof. Michael Pyrcz

MIT License

Prof. Michael J. Pyrcz, @GeostatsGuy, Resources

Howdy Folks, I'm Michael Pyrcz, a Professor at The University of Texas at Austin. I teach and conduct research on:

  • data analytics
  • geostatistics
  • machine learning

I'm appointed in the:

  • Hildebrand Department of Petroleum and Geosystems Engineering, Cockrell School of Engineering
  • Department of Planetary and Earth Sciences, Jackson School of Geosciences
  • Bureau of Economic Geology

I'm also a:

  • principal investigator in the College of Natural Sciences Energy Analytics Freshmen Research Initiative and Inventors' Program, The University of Texas at Austin
  • core faculty in the Machine Learning Laboratory in Computer Sciences, The University of Texas at Austin
  • principal investigator (co-PI with Professor John T. Foster) of the DIRECT industrial consortium

I feel that the role of professor is a role of service, so I post all my lectures and supporting content online. The result is evergreen content that outlasts the semester and reaches beyond campus. I hope this content supports:

  • my students, with ongoing learning content long after they finish my courses
  • working professionals facing the digital transformation who are interested in learning new skills
  • potential students, by breaking down barriers and making our university a welcoming place for all who are interested in learning

Here's an inventory of the online resources that I have made to help people learn about spatial data analytics, geostatistics and machine learning. I produced these resources to support my students during class and after they complete it (an evergreen resource), and I think they are also useful to other students and working professionals interested in these topics.

I hear every day from students, working professionals and potential students who benefit from these products!

Michael Pyrcz, Associate Professor, University of Texas at Austin

Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions

With over 17 years of experience in spatial, subsurface data analytics consulting, research and development, and leadership, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and (geo)scientists' impact in spatial, subsurface resource development.

For more about me, my research group (15 PhDs), my consortium (DiReCT), my publications, my background, my education startup, etc., check out these links:

About Michael

Want to learn more about my story, my publications and my other contributions to open source? Check this out:

  1. My story of how I got started in engineering and ended up as a professor at The University of Texas at Austin: My Story

  2. My research, approach to research and views on building an inclusive and diverse team: My Research

  3. Nothing is possible without awesome graduate students: My Students

  4. I've written a bit, here are the books: My Books

  5. My peer-reviewed publications: My Papers

  6. My other contributions: My Other Contributions

  7. I wrote an open source Python package for spatial data analytics and geostatistics. Much of it is a translation of GSLIB (Deutsch and Journel, 1998) from the original Fortran to Python for 2D geostatistical methods. I did this to support my students in my Spatial Data Analytics and Geostatistics courses. Check it out and consider contributing and becoming a coauthor at GeostatsPy on the PyPI Repository and GitHub.

  • NOTE, since GeostatsPy relies on the Numba package for code acceleration, and Numba is not updated to Python >= 3.9, please use Python < 3.9 with GeostatsPy.
  8. I do quite a bit on social media, here's why I do it: My Social Media Efforts.

  9. Check out my TEDx talk on 'A Professor's Secret Weapon': TED Talk

  10. Check out my Twitter feed for resources, ideas and positivity most days, where I'm the GeostatsGuy: Twitter.

  11. I post a lot of code, demonstration workflows and course material to support anyone that wants to learn: My GitHub

  12. I partnered with Prof. John Foster (UT Austin) and Bazean, a technology-enabled energy investment firm, to start the energy-focussed data science education company, daytum. We are currently offering short courses in Energy Data Science.

Michael Pyrcz, Professor, University of Texas at Austin

Online Resources on Spatial Data Analytics, Geostatistics and Machine Learning

Recorded Lectures

I record all my university lectures and post them on YouTube. You are welcome to join my classes!

  1. Introduction - Howdy, I'm Michael

  2. YouTube Channel GeostatsGuy Lectures

  3. Introduction to Data Analytics, Geostatistics and Machine Learning Undergraduate Lectures (Lec00-Lec21)

  4. Subsurface Modeling Graduate Course (Lec00 - Lec22)

  5. Subsurface Machine Learning Graduate Course (Lec00 - Lec18)

  6. Data Science Basics in Python (Chapter I - III)

  7. Open Source Spatial Data Analytics in Python with GeostatsPy

  8. My TED Talk, A Professor’s Secret Weapon

  9. Introduction to Spatial Continuity

  10. Tutorial: Open Source Spatial Data Analytics in Python with GeostatsPy

  11. Geostatistical Workflows for Unconventional Reservoirs

  12. Geostatistical Workflows for Unconventional Reservoirs at BEG

  13. What Does a Geoscientist Need to Know About Geostatistics? And Why It Would Be Helpful?

  14. Center for Petroleum and Geosystems Engineering Webinar - Big Data Analytics for Petroleum Engineering: Hype or Panacea?

  15. Michael's Unsolicited Advice and Ideas for a Successful and Happy Career in Our Industry

  16. My interview on AAPG's Digging Deeper podcast with the awesome host Vern Stefanic.

GeostatsPy Python Package Workflows

I wrote a Python package called GeostatsPy for spatial data analytics and geostatistics. Here's a set of demonstration workflows in Python Jupyter Notebooks for many of the fundamental workflow steps, from data preparation and statistical inference to spatial prediction with uncertainty. They go along with the recorded lectures from my courses on my YouTube channel.

Here are the workflows:

  1. GeostatsPy: Reimplementation of GSLIB in Python
  2. Data Distributions with GeostatsPy
  3. Feature Ranking with GeostatsPy
  4. Volume Variance Relations with GeostatsPy
  5. Confidence Intervals and Hypothesis Testing with GeostatsPy
  6. Monte Carlo Simulation with GeostatsPy
  7. Bootstrap with GeostatsPy
  8. Data Distributions
  9. Data Distribution Transformations with GeostatsPy
  10. Declustering with GeostatsPy
  11. Ensemble Declustering with GeostatsPy
  12. Inverse Distance Interpolation with GeostatsPy
  13. Indicator Kriging with GeostatsPy
  14. Kriging with GeostatsPy
  15. Multivariate Analysis with GeostatsPy
  16. Overfitting Models with GeostatsPy
  17. Plotting Spatial Data with GeostatsPy
  18. Directional Spatial Continuity with GeostatsPy
  19. Spatial Updating with GeostatsPy
  20. Spatial Trend Modeling with GeostatsPy
  21. Multivariate Feature Ranking with GeostatsPy
  22. Variogram Calculation with GeostatsPy
  23. Variogram Modeling with GeostatsPy
  24. Spatial Bootstrap with GeostatsPy
  25. Spatial Simulation with GeostatsPy
  26. Spatial Indicator Simulation with GeostatsPy
  27. Spatial Simulation Post-processing with GeostatsPy

Interactive Python Workflows to Support Education

I think interactive workflows are excellent tools to support education. For data analytics and machine learning, turning a dial and watching a system or machine change is a great method to gain intuition and experience. I started to put together interactive workflows with ipywidgets and matplotlib. Check them out here:

  1. General Bootstrap
  2. Parametric Distributions
  3. Monte Carlo Simulation
  4. Bootstrap Colored Balls in a Cowboy Hat
  5. Norms
  6. Optimization
  7. Overfit
  8. DIY Central Limit Theorem
  9. Confidence Interval by Bootstrap and Analytical
  10. Sivia's Bayesian Coin
  11. Spurious Correlation
  12. Correlation Coefficient
  13. LASSO Regression
  14. Principal Components Analysis
  15. Ridge Regression
  16. Simple Kriging
  17. String Effect
  18. Stochastic Simulation
  19. Uncertainty with Spatial Aggregation
  20. Kriging String Effect
  21. Uncertainty Model Checking
  22. Variogram Calculation
  23. Variogram Modeling
  24. Combined Variogram Calculation and Modeling
  25. Spectral Clustering
  26. Artificial Neural Networks
  27. Checking Uncertainty Models
  28. Shapley Values

Resources on Statistics and Probability

  1. Probability Theory – my undergraduate lecture
  2. Statistics – undergraduate lecture
  3. Marginal, Joint & Conditional Probability – slides

Parametric Distributions

Parametric distributions are fundamental to inferential and predictive workflows in statistics and data analytics. Sometimes they are required by theory and often they result from nature. Many students struggle with them, so I made simple demonstrations in Microsoft Excel that cover how to make them from scratch and how to work with them:

  1. How to make them in Excel
  2. Poisson distribution in Excel
  3. Gaussian transform in Excel and Python
  4. Log normal distribution in Excel
  5. Interactive parametric distributions in Python
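
To make the Gaussian transform item above more concrete, here is a minimal sketch in plain Python. It is not the code from the linked Excel or Python demonstrations; the function name `normal_score_transform`, the made-up porosity values and the plotting-position convention `(rank + 0.5) / n` are assumptions chosen for illustration.

```python
from statistics import NormalDist


def normal_score_transform(values):
    """Map each value to the standard normal quantile of its rank-based cumulative probability."""
    n = len(values)
    # Indices of the data in ascending order (ties keep their original order, an assumption).
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    scores = [0.0] * n
    for rank, index in enumerate(order):
        # Assumed plotting position: keeps probabilities strictly inside (0, 1).
        cumulative_probability = (rank + 0.5) / n
        scores[index] = standard_normal.inv_cdf(cumulative_probability)
    return scores


porosity = [0.08, 0.21, 0.12, 0.15, 0.30]  # made-up values for illustration
scores = normal_score_transform(porosity)
for value, score in zip(porosity, scores):
    print(f"porosity {value:.2f} -> normal score {score:+.3f}")
```

The transform preserves the rank order of the data, so the back-transform is a lookup from normal score to the original value at the same cumulative probability.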

Hypothesis Testing

Hypothesis testing is all about recognizing the difference that makes a difference. These tests protect us from the belief in small numbers and from our bias to see patterns in random phenomena.

  1. Difference in means in Excel and in Python
  2. Difference in variances in Excel and in Python
  3. Difference in distributions in Excel
  4. Interactive hypothesis testing in Python
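
As a rough illustration of the difference in means item above, here is a sketch of a two-sided permutation test using only the Python standard library. This swaps in a permutation test for simplicity; the linked demonstrations may use a different test (for example a t-test). The function name, sample values, number of permutations and seed are all assumptions.

```python
import random
from statistics import mean


def permutation_test_means(sample_a, sample_b, n_permutations=2000, seed=42):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    count_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        difference = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(difference) >= abs(observed):
            count_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    p_value = (count_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up porosity samples from two hypothetical wells.
well_a = [0.12, 0.14, 0.15, 0.13, 0.16, 0.14]
well_b = [0.20, 0.22, 0.19, 0.21, 0.23, 0.20]
observed, p_value = permutation_test_means(well_a, well_b)
print(f"difference in means = {observed:.4f}, p-value = {p_value:.4f}")
```

A small p-value suggests the observed difference would be unusual if both samples came from the same population.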

Demos of Bayesian Statistics

Bayesian approaches are powerful. They integrate prior belief with new observations, provide explicit uncertainty models and more intuitive credible intervals for uncertainty in model parameters. Here are some accessible demonstrations to get you started thinking like a Bayesian statistician.

  1. The Coin Problem from Sivia (1996) in Excel
  2. Bayesian updating with Gaussian in Excel
  3. Probability given a positive test in Excel
  4. Sivia's Bayesian Coin in Interactive Python
  5. Bayesian Regression in Python
  6. Naive Bayes Regression and Classification in Python
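
For a feel of how a coin problem like the one listed above works, here is a minimal grid-based sketch of Bayesian updating in plain Python. It assumes a uniform prior on the probability of heads and a binomial likelihood; the function name, the grid size and the toss counts are assumptions for illustration, not values from the linked demonstrations.

```python
def coin_posterior(n_heads, n_tosses, n_grid=101):
    """Posterior for the coin's probability of heads on an evenly spaced grid, uniform prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    n_tails = n_tosses - n_heads
    # Uniform prior is constant, so the unnormalized posterior is just the likelihood.
    # The binomial coefficient is also constant and cancels when normalizing.
    unnormalized = [h ** n_heads * (1.0 - h) ** n_tails for h in grid]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return grid, posterior


# Made-up experiment: 7 heads observed in 20 tosses.
grid, posterior = coin_posterior(n_heads=7, n_tosses=20)
most_likely = grid[posterior.index(max(posterior))]
print(f"most probable value of P(heads) on the grid: {most_likely:.2f}")
```

Rerunning with more tosses at the same proportion of heads narrows the posterior, which is the behavior the interactive coin demonstration lets you explore by hand.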

Other

  1. Bootstrap in Excel, in Python and in R
  2. Spatial Bootstrap in Python
  3. Linear regression in Excel and in R
  4. Loss functions in Excel
  5. Multivariate Analysis
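
The bootstrap items in the list above can be sketched in a few lines of plain Python. This is a simple percentile bootstrap for uncertainty in the mean, written for illustration only; the function name, sample values, number of realizations and seed are assumptions, and the linked Excel, Python and R demonstrations may differ in detail.

```python
import random
from statistics import mean


def bootstrap_mean_interval(data, n_realizations=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for the mean, sampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each realization resamples n values with replacement and records the mean.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_realizations))
    lower_index = round((alpha / 2.0) * (n_realizations - 1))
    upper_index = round((1.0 - alpha / 2.0) * (n_realizations - 1))
    return means[lower_index], means[upper_index]


# Made-up porosity sample for illustration.
porosity = [0.12, 0.14, 0.15, 0.13, 0.16, 0.14, 0.20, 0.11, 0.18, 0.17]
lower, upper = bootstrap_mean_interval(porosity)
print(f"sample mean = {mean(porosity):.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

This version assumes the data are independent and representative; spatially correlated or clustered data call for the spatial bootstrap and declustering workflows listed elsewhere in this inventory.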

Heterogeneity

Our subsurface systems are heterogeneous and heterogeneity matters in many subsurface prediction problems. Here are some accessible demonstrations to help you get started quantifying heterogeneity.

  1. Making an example well in Excel
  2. Lorenz coefficient in Excel
  3. Hurst coefficient in R
  4. Ripley Cross K in R
  5. Ripley K-function in Python
  6. Lorenz coefficient in Python
  7. Lorenz coefficient functions in Python
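
As a companion to the Lorenz coefficient items above, here is a minimal sketch of one common way to compute it from layer permeability, porosity and thickness. The function name and the layer values are assumptions for illustration, and this is not the code from the linked demonstrations.

```python
def lorenz_coefficient(permeability, porosity, thickness):
    """Lorenz coefficient from cumulative flow capacity versus cumulative storage capacity.

    Assumes every layer has porosity > 0 and thickness > 0.
    """
    # Order layers from highest to lowest permeability-to-porosity ratio.
    layers = sorted(
        zip(permeability, porosity, thickness),
        key=lambda layer: layer[0] / layer[1],
        reverse=True,
    )
    total_flow = sum(k * h for k, _, h in layers)
    total_storage = sum(phi * h for _, phi, h in layers)
    area = 0.0
    flow = 0.0
    storage = 0.0
    for k, phi, h in layers:
        next_flow = flow + k * h / total_flow
        next_storage = storage + phi * h / total_storage
        # Trapezoid rule for the area under the flow versus storage curve.
        area += 0.5 * (flow + next_flow) * (next_storage - storage)
        flow, storage = next_flow, next_storage
    # Twice the area between the curve and the 45 degree homogeneous line.
    return 2.0 * (area - 0.5)


# Made-up layers: permeability (mD), porosity (fraction), thickness (m).
coefficient = lorenz_coefficient(
    permeability=[500.0, 50.0, 5.0, 200.0],
    porosity=[0.25, 0.15, 0.10, 0.20],
    thickness=[1.0, 1.0, 1.0, 1.0],
)
print(f"Lorenz coefficient = {coefficient:.3f}")
```

A value near 0 indicates a homogeneous system, and values approaching 1 indicate that a small fraction of the storage carries most of the flow.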

Machine Learning

I have a new Subsurface Machine Learning Course that builds from fundamental probability to artificial neural networks. The recorded lectures are available here:

You are welcome to follow along! The demonstration workflows from the lectures are here:

  1. Feature Imputation in Python
  2. Feature Ranking in Python
  3. Feature Transformations in Python
  4. Feature Uncertainty in Python
  5. Dimensional Reduction in Python and in R
  6. Clustering in Python
  7. Principal Components Analysis in Python
  8. Multidimensional Scaling and Random Projection in Python
  9. Linear Regression in Python
  10. Ridge Regression in Python
  11. LASSO Regression in Python
  12. Isotonic Regression in Python
  13. Bayesian Regression in Python
  14. Polynomial Regression in Python
  15. Naive Bayes Regression and Classification in Python
  16. Time Series Analysis
  17. k Nearest Neighbour
  18. Decision tree in Python, Python Advanced and in R
  19. Gradient Boosting in Python and Advanced Gradient Boosting in Python
  20. Support Vector Machines in Python
  21. Neural Networks in Python
  22. Convolution Operators in Python
  23. Convolutional Neural Networks in Python
  24. Convolutional Neural Networks Classifier in Python
  25. Generative Adversarial Networks in Python
  26. Conditional Generative Adversarial Network in Python
  27. Course Conclusion
  28. scikit-learn Overview

Geostatistics

  1. GeostatsPy: Reimplementation of GSLIB in Python
  2. Introduction to Data Analytics, Geostatistics and Machine Learning Undergraduate Lectures (Lec00-Lec21)
  3. What Does a Geoscientist Need to Know About Geostatistics? And Why It Would Be Helpful? and PPT
  4. Exercises, hands-on and demonstrations PPT Inventory
  5. Functions that reimplement or call GSLIB exes in Python
  6. Demo of the functions in Python
  7. Declustering in Python and with PyGSLIB Package
  8. Declustering and Debiasing in Excel
  9. Variogram calculation in Excel and in R
  10. Full variogram Calculation and Modeling in Excel and in PyGSLIB Package

Supplemental Slides

  1. Facies criteria in PPT
  2. Value of quantification in PPT
  3. Stationarity in PPT
  4. Uncertainty in PPT
  5. Suggested books in PPT
  6. Simple kriging in Excel and in R
  7. Uncertainty Away from Data in Excel
  8. Convolution methods in Python
  9. LU Simulation in Python
  10. Sequential Gaussian simulation in Excel and in R
  11. Truncated Gaussian simulation in Excel
  12. Spatial uncertainty in Excel
  13. Volume-variance relations in Excel
  14. Working with realizations in R
  15. Lecture on value in industry in PPT

I hope these resources are useful.

Want to Work Together?

I hope that this is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

  • Want to invite me to visit your company for training, mentoring, project review, workflow design and consulting? I'd be happy to drop by and work with you!

  • Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

  • I can be reached at mpyrcz@austin.utexas.edu.

I'm always happy to discuss,

Michael

Michael Pyrcz, Ph.D., P.Eng., Associate Professor, The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin

More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn