IntroToDataWrangling

A brief course intended to introduce non-programmers to Python and data wrangling. It also demonstrates network optimization, PDF creation in Python, and a simple Monte Carlo simulation.

Primary language: Jupyter Notebook

Intro to Data Wrangling

This Intro to Data Wrangling course in Python was originally designed for BYU Global Supply Chain Management (GSCM) students who want to learn the very basics of Python for data wrangling, taught in a context that may be familiar to them. The file is structured, hopefully, so that someone with no programming experience at all can follow along, make sense of it, and pick up some useful skills. Content and definitions are not official; where the author could not come up with a cohesive definition, help came from OpenAI's ChatGPT or GitHub Copilot. The dataset used is a modified version of this open-source dataset, and its use here may not match its original intended use. Numbers used in calculations (such as profit per flight or the cost to rent space) are almost certainly inaccurate and will not match real-world figures; they were chosen to teach the principles in a setting close to GSCM contexts and can easily be replaced with more realistic values.

A special thanks to all of my instructors and professors in the IS program, including Professor Keith, Professor Hilton, Professor Cutler, Professor Wells, Professor Reese, Professor Anderson, and Professor Schuetzler, as well as Professor Hathaway in the GSCM program, who helped give me the idea for this course while I was working as his teaching assistant. Any feedback is greatly appreciated.

Using the Course

The main course materials are in the StarLift.ipynb file, which references files in the attached folders for instructive content and imported data. The intended path is to work through the first file, StarLift.ipynb (estimated completion time: 4-6 hours), and then, for those interested in going further, StarLift Continued.ipynb. The second file is less instructive and is expected to be beyond a beginner's ability to replicate right away, though with everything learned in StarLift.ipynb, plus some time and Googling, the learner should be able to understand every line and why it is there. StarLift Continued.ipynb also demonstrates using more complex simulations to solve problems, even though the simulation used (Monte Carlo) is not entirely suited to the context.

To use the course, download all of the files and open the folder in your IDE of choice. The course was created in VS Code and should run there easily as long as Python is installed. Some adjustments were required to get the same code working in Google Colab; if you find that easier than using an IDE installed on your computer, the Colab versions are available at these links: StarLift.ipynb and StarLift Continued.ipynb. Other Jupyter environments were not tested with these files, so if you decide to run them there, please leave feedback on how they run and what could be changed.

Getting Started

If you are brand new to programming, consider watching this video to get Python up and running on your device. Otherwise, either clone this repository with git or download the code as a ZIP file. To use Google Colab instead, see the links above.