Tools relating to the Xylem Global Innovation Challenge on Urban Flood Prediction
In this Challenge, we aim to use predictive modelling to help Portland, Oregon residents predict and take pre-emptive action against floods.
You can access the Floodopedia live app here
This is an open-source project and we strive to continually improve the functionality of Floodopedia. Feel free to make a pull request or raise issues!
Setup Instructions
Dependencies Setup
- Create a Python virtual environment (venv) by running the following command in your Command Prompt
C:\>python -m venv C:\path\to\myenv
- On your shell, run
pip install -r requirements.txt
- [OPTIONAL] Download, install and run ngrok here if you want to make your locally run Flask servers accessible on the Internet.
Running Floodopedia on localhost
- On your shell, run
python app.py
- Go to localhost:5001 in your web browser. Alternatively, you can specify a different IP address or port number in the
app.run(host='new-IP-address', port=<new-port-number>)
call in app.py and visit new-IP-address:<new-port-number> instead.
- Press Ctrl+C to terminate the server
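The host and port override described above can be sketched as follows. This is a minimal illustration, not the project's actual app.py: the environment variable names FLOODOPEDIA_HOST and FLOODOPEDIA_PORT and the helpers server_settings and main are hypothetical. Only the app.run(host=..., port=...) call and the default port 5001 come from the instructions above.

```python
import os


def server_settings(env=None):
    """Build the keyword arguments for Flask's app.run().

    FLOODOPEDIA_HOST and FLOODOPEDIA_PORT are hypothetical variable names
    chosen for this sketch; they are not defined by the project.
    """
    env = os.environ if env is None else env
    host = env.get("FLOODOPEDIA_HOST", "127.0.0.1")
    port = int(env.get("FLOODOPEDIA_PORT", "5001"))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return {"host": host, "port": port}


def main():
    # Imported here so server_settings() can be used without Flask installed.
    from flask import Flask

    app = Flask(__name__)  # stand-in for the app object created in app.py
    app.run(**server_settings())
```

With this sketch, calling main() after setting FLOODOPEDIA_PORT=8080 would serve on port 8080 without editing the source file.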
Running Floodopedia on the Internet (Production server)
- On your shell, run
python app.py
- Open the ngrok shell and run
ngrok http 5001
or ngrok http <your-port-number> if you changed the port
- You should see a temporary link on your shell and you can access Floodopedia via the displayed link.
- Press Ctrl+C to terminate the server
Prediction Model
About the prediction model
- The predictors (input features) used by the model are Gage Height, Turbidity and Discharge.
- Because flooding is an extreme and rare event, the available USGS data correlated only weakly (<0.20) with flooding. Of the available variables, Gage Height, Turbidity and Discharge had the strongest correlations with flooding.
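One way to measure how strongly a reading tracks flooding is the Pearson correlation between the reading and a 0/1 flood label. The source does not say which correlation measure was used, so treat that choice as an assumption. The helper pearson and the toy numbers below are made up for illustration and are not USGS data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: six hypothetical gage height readings and a flood flag.
gage_height = [2.1, 2.4, 2.0, 9.8, 2.3, 2.2]
flooded = [0, 0, 0, 1, 0, 0]

r = pearson(gage_height, flooded)
print(f"correlation between gage height and flooding: {r:.2f}")
```

Running the same calculation for every candidate variable and ranking the results is one plausible way to arrive at a shortlist such as Gage Height, Turbidity and Discharge.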
Modifying the Prediction Model
- If you wish to populate the dataset with newer data, you can pull raw values from the USGS Fanno Creek Website
- Modify flood_prediction_model.py to your liking and serialize your modified model as a pickle file by running
pickle.dump(<model-variable-name>, open('<your-filename>.pkl','wb'))
- In app.py, de-serialize <your-filename>.pkl by calling
pickle.load(open('<your-filename>.pkl', 'rb'))
You can then run the .predict() method of your model on a dataset
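The serialize and de-serialize round trip described above can be sketched with the standard library alone. ThresholdModel, save_model, load_model, the file name example_model.pkl and all numbers are hypothetical stand-ins. The real model lives in flood_prediction_model.py and may look quite different.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained model with a .predict() method."""

    def __init__(self, gage_height_threshold):
        self.gage_height_threshold = gage_height_threshold

    def predict(self, rows):
        # Each row is (gage_height, turbidity, discharge).
        # Returns 1 (flood) when gage height exceeds the threshold, else 0.
        return [1 if row[0] > self.gage_height_threshold else 0 for row in rows]


def save_model(model, path):
    # A context manager closes the file handle once the dump is written.
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    # Only unpickle files you created yourself: pickle can run arbitrary code.
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "example_model.pkl"
    save_model(ThresholdModel(gage_height_threshold=10.0), model_path)
    restored = load_model(model_path)
    predictions = restored.predict([(15.7, 3.2, 120.0), (4.1, 1.0, 30.0)])

print(predictions)
```

Wrapping the open() calls in with blocks is a small deviation from the one-liners above. It behaves the same way but makes sure the file is closed promptly.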
Server and Data
About the server
- Floodopedia runs on Python's Flask library and uses a REST API to exchange requests and responses between and within webpages.
- Floodopedia is designed so that each page refresh fetches the latest data (if any) from the USGS Fanno Creek website.
Web-scraping and data
- BeautifulSoup4 is used to scrape HTML text data from the USGS Fanno Creek site. Floodopedia deliberately avoids using a webdriver, which sidesteps dependency issues on different machines and means no external installation is needed.
- The variable formatted_description holds data in the form 'Most recent instantaneous value: 15.7 05-28-2021 02:00 PDT'
- Regular expressions are used to format the scraped data.
re.findall(r'[\d\.\d]+', formatted_description)[0]
returns the raw value (e.g. 15.7), while
(re.findall(r'((0[1-9]|1[0-2])\-(0[1-9]|1\d|2\d|3[01])\-(19|20)\d{2}\s\s([0-1]?[0-9]|2[0-3]):[0-5][0-9]\s([P][D][T])\s)$', formatted_description))[0][0]
returns the date, time and timezone (e.g. 05-28-2021 02:00 PDT). Note that the second pattern expects two whitespace characters between the date and the time, one whitespace character after the timezone, and a timezone of exactly PDT.
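The parsing step can also be sketched as a small function. The patterns below are a simplified, more tolerant variant and not the expressions quoted above: they accept any run of whitespace between the fields and any three-letter timezone abbreviation. The names VALUE_RE, STAMP_RE and parse_description are hypothetical, and the sketch assumes the description has the single-space layout shown in the example string.

```python
import re
from datetime import datetime

# Simplified patterns for this sketch (not the project's originals).
VALUE_RE = re.compile(r"value:\s*(-?\d+(?:\.\d+)?)")
STAMP_RE = re.compile(r"(\d{2}-\d{2}-\d{4})\s+(\d{1,2}:\d{2})\s+([A-Z]{3})\s*$")


def parse_description(formatted_description):
    """Split a scraped description into (value, naive datetime, timezone name)."""
    value_match = VALUE_RE.search(formatted_description)
    stamp_match = STAMP_RE.search(formatted_description)
    if value_match is None or stamp_match is None:
        raise ValueError(f"unrecognised description: {formatted_description!r}")
    date_part, time_part, tz_name = stamp_match.groups()
    # strptime rejects impossible dates and times, so the regex can stay loose.
    observed = datetime.strptime(f"{date_part} {time_part}", "%m-%d-%Y %H:%M")
    return float(value_match.group(1)), observed, tz_name


reading, observed_at, tz_name = parse_description(
    "Most recent instantaneous value: 15.7 05-28-2021 02:00 PDT"
)
print(reading, observed_at.isoformat(), tz_name)
```

Anchoring the value pattern on the text "value:" avoids relying on the reading being the first number in the string, and returning a datetime makes the timestamp easier to compare or sort than a raw substring.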