Twitter text preprocessing and bidirectional LSTM for multivariate categorization
Author: Andrew Larkin
Affiliation: Oregon State University, College of Public Health and Human Sciences
Date Created: September 23rd, 2018
Summary
This project downloads text from Twitter posts related to urban nature, or "greenspace", preprocesses the text for classifier development, trains a multivariate classifier for greenspace-related pathways of action, and evaluates model performance. The GitHub repository is organized into the following folders:
- Documentation - background information, project purpose, project description, and data sources.
- Data Collection - scripts used to download data from Twitter and store it in a SQL database, along with results.
- Data Preprocessing - NLP scripts to screen, clean, and standardize tweet text and metadata, along with results.
- Model Training - TensorFlow scripts for training models, including hyperparameter tuning.
- Model Evaluation - TensorFlow scripts for model evaluation, along with performance results.
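
As a rough illustration of the kind of screening and cleaning done in the Data Preprocessing step, the sketch below lowercases a tweet, strips URLs, mentions, and punctuation, and then applies a simple keyword screen. It is not taken from the repository's scripts: the function names `clean_tweet` and `is_greenspace_related`, the `KEYWORDS` list, and the specific cleaning rules are all assumptions made for this example.

```python
import re

# Hypothetical patterns and keyword list, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
KEYWORDS = ("park", "greenspace", "garden", "forest", "trail")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)  # drop remaining punctuation and symbols
    return " ".join(text.split())       # collapse repeated whitespace


def is_greenspace_related(cleaned_text):
    """Return True if any assumed keyword appears as a whole token."""
    tokens = set(cleaned_text.split())
    return any(keyword in tokens for keyword in KEYWORDS)


if __name__ == "__main__":
    raw = "Loving the #Park today! https://t.co/abc @friend"
    cleaned = clean_tweet(raw)
    print(cleaned, is_greenspace_related(cleaned))
```

A whole-token keyword match is used here only to keep the example small; the project's actual screening criteria are described in the Documentation and Data Preprocessing folders.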
References
- Andrew Larkin, Perry Hystad, Integrating Geospatial Data and Social Media in Bidirectional Long-Short Term Memory Models to Capture Human Nature Interactions, The Computer Journal, bxaa094, https://doi.org/10.1093/comjnl/bxaa094