Utility Outage Management Systems have been rolling out new smart grid technologies to supplement traditional methods for detecting and reporting on power outages. However, the new technologies will not be completely rolled out until 2030. While this new technology is implemented, supplemental efforts will be needed to help identify and provide social context on power outages.
This project is an effort to spatially and temporally understand social media activity surrounding power outage events. We assess Twitter to be the most viable means of analyzing power outage events using social media. We couple sentiment analysis with spatial and temporal analysis in an interactive dashboard to provide insight into power outage events. Our methods, findings, and recommendations are detailed below.
We scraped and examined 18,000 tweets using the GetOldTweets3 API. We pulled these tweets from January, July and October 2019 by searching for tweets containing the phrase "power outage".
We applied standard Exploratory Data Analysis (EDA). UTF-8 characters and emojis were removed, leaving only the text of tweets and several other metadata fields. See the EDA and Data Collection notebooks for further details. The sampled data was not temporally consistent due to the limitations of the API.
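The text-cleaning step described above could look roughly like the following. This is a minimal sketch, not the code from the EDA notebooks: it assumes that "UTF-8 characters and emojis" means every non-ASCII character, and the function name `clean_tweet_text` is hypothetical.

```python
import re

# Assumption: cleaning means dropping every non-ASCII character
# (which includes emojis) and collapsing the whitespace left behind.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")
WHITESPACE = re.compile(r"\s+")


def clean_tweet_text(text: str) -> str:
    """Return the tweet text with non-ASCII characters removed."""
    ascii_only = NON_ASCII.sub(" ", text)
    return WHITESPACE.sub(" ", ascii_only).strip()


# Illustrative tweet text, not taken from the dataset.
example = "Power outage again \U0001F621 in caf\u00e9 district\u2026"
cleaned = clean_tweet_text(example)
print(cleaned)
```

Replacing each removed run with a space, rather than deleting it outright, keeps neighboring words from being fused together when an emoji sat between them.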
The most prominent feature engineering issue was the lack of true geocoordinates, which hampered our ability to create a viable proof of concept. We generated synthetic geometries inside the WMA (approximately 5,500 square miles) and joined these geometries to the 18,000 tweets. This issue can be mitigated with premium Twitter API access.
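One common way to generate synthetic points inside a boundary is rejection sampling: draw random points in the polygon's bounding box and keep only those that fall inside it. The sketch below illustrates that idea in plain Python. It is not the project's actual workflow (the data dictionary indicates the geometries were generated with QGIS), and the `study_area` coordinates are a hypothetical placeholder, not the real WMA boundary.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, count, seed=None):
    """Rejection-sample `count` (longitude, latitude) points inside the polygon."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical placeholder boundary (a rough diamond), NOT the real WMA geometry.
study_area = [(-77.12, 38.90), (-77.04, 38.995), (-76.91, 38.89), (-77.04, 38.79)]
synthetic_points = random_points_in_polygon(study_area, count=20, seed=42)
```

Each resulting pair maps onto the `xcoord` (longitude) and `ycoord` (latitude) columns, and could be assigned to tweets row by row. Because the points are random, any spatial pattern in the dashboard built on them is illustrative only.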
- Purchase Premium Twitter API (Access starts at $100/month, up to $2,500/month)
- Scrape Twitter with given parameters for outages every X minutes
- Connect the Tableau dashboard to the dynamic data, once pulled
- Display the data dynamically on local Tableau instance or Tableau server:
- Tableau is dynamic, easy to use, and ubiquitous in terms of software solutions.
- Automate this process with the Premium API and an Extract, Transform, Load (ETL) process:
- Premium access will allow the script to run more frequently (near real-time).
- Write the data to a database instead of a CSV.
- Point the Tableau connection to that database instead of a CSV.
- Publish the Tableau dashboard to the web.
- Identify true spatial clusters within the data. This could provide insight into repeatedly problematic areas, ultimately serving customers better.
- Refine the sentiment analysis: consider a larger query list of terms related to power outages, and investigate the cosine similarity of these terms using the Word2Vec Python library.
- Compare the data from this analysis to actual utility grid-outage data and investigate discrepancies.
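The recommendation above to write scraped data to a database instead of a CSV could be sketched as follows. This is only an illustration under stated assumptions: the source does not name a database, so SQLite is used here because it ships with Python; the `tweets` table, the `load_tweets` helper, and the sample rows are hypothetical; and only a subset of the data dictionary columns is stored.

```python
import sqlite3


def load_tweets(conn, rows):
    """Insert tweet rows, skipping tweets whose Id is already stored."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            id INTEGER PRIMARY KEY,
            username TEXT,
            text TEXT,
            date TEXT,
            sentiment REAL
        )
        """
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, username, text, date, sentiment) "
            "VALUES (:Id, :Username, :Text, :Date, :Sentiment)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


# Hypothetical sample rows keyed by the data dictionary column names.
batch = [
    {"Id": 1, "Username": "user_a", "Text": "power outage downtown",
     "Date": "2019-01-09 18:30:00", "Sentiment": -0.4},
    {"Id": 2, "Username": "user_b", "Text": "power outage is over",
     "Date": "2019-01-09 21:05:00", "Sentiment": 0.3},
]

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
stored = load_tweets(conn, batch)
stored_again = load_tweets(conn, batch)  # re-running the same batch adds nothing
```

Keying the table on the tweet `Id` means a scrape that runs every few minutes can overlap with the previous one without creating duplicate rows.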
-
- twitter-eric.ipynb
- twitter-soueidan.ipynb
-
- EDA-Client-Project-5.ipynb
- EDA-for-Client-Project.ipynb
-
- feature_engineering.ipynb
-
- sentiment_analysis.ipynb
-
- Image 5-15-20 at 5.26 PM.jpeg
-
- Analyzing Power Outages with Social Media.pdf
-
-
clean
- clean.csv
-
unclean
- initial test scrape
- outage-5k-tweets.csv
- power outage-5k-tweets copy.csv
- power outage-5k-tweets.csv
- power outage-10k-tweets.csv
- january
- january-early_after_tweets.csv
- january-end_before_tweets.csv
- january-mid_during_tweets.csv
- july
- july-early_after_tweets.csv
- july-end_before_tweets.csv
- july-mid_during_tweets.csv
- ne bomb cyclone data top tweets
- ne_bomb_cyclone_boston_after_tweets.csv
- ne_bomb_cyclone_boston_before_tweets copy.csv
- ne_bomb_cyclone_boston_before_tweets.csv
- ne_bomb_cyclone_boston_during_tweets.csv
- ne bomb cyclone data w/o location
- ne_bomb_cyclone_after_tweets.csv
- ne_bomb_cyclone_before_tweets.csv
- ne_bomb_cyclone_during_tweets.csv
- initial test scrape
-
merged
- merged_data_unclean.csv
- tweets_geom_unclean.csv
- tweets_sa_unclean.csv
- clean.ipynb
-
spatial_data
- 20_random_pts.csv
- 20190109_tableau.csv
- DC_Quadrants-shp.shp
- random_pts.csv
- tl_2017_us_cbsa.shp
- wma_gpkg.gpkg
-
Data | Meaning | Type |
---|---|---|
Event | Name of data pull (primary) | object |
Stage | Name of data pull (secondary) | object |
Query Date | Date from which query works backward | object |
Query Term | Search term used for the query | object |
Id | Unique ID assigned per tweet | int64 |
Username | Twitter handle | object |
Text | Tweet text | object |
Date | Datetime stamp tweet was posted | object |
Hashtags | Hashtags associated with tweet | object |
Location | Hidden geodata from tweet (location ID) | float64 |
_wkt_geom | Geometry Generated by QGIS software | int64 |
id | Unique Spatial Index | int64 |
xcoord | Randomly Generated Longitude | float64 |
ycoord | Randomly Generated Latitude | float64 |
Sentiment | Sentiment Score | float64 |