Given a dataset of tweets that mention physical activities, along with the time each tweet was posted, perform a binary classification task that assigns each tweet to one of two categories: 0 (the tweeter did not engage in any physical activity) or 1 (the tweeter engaged in one or more physical activities).
- Pre-processing:
  - Regex
  - TF-IDF
  - Tokenization
  - Stop-word removal
- Machine learning:
  - Support Vector Machine
  - Random Forest
  - XGBoost
  - Ensemble methods (Stacking)
- Deep learning (transformers, BERT, and ELECTRA)
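
The pre-processing steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the stop-word list, the regular expressions, the example tweets, and the function names `preprocess` and `tfidf` are all assumptions made for the example. The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, without any vector normalization.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "i", "to", "and", "of", "in", "at", "my", "for", "is", "this"}

# Hypothetical cleaning rules: drop URLs and @mentions, keep word-like tokens.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9']+")


def preprocess(tweet):
    """Lowercase, strip URLs and @mentions, tokenize, drop stop words."""
    text = URL_OR_MENTION.sub(" ", tweet.lower())
    return [tok for tok in TOKEN.findall(text) if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * (ln((1 + N) / (1 + df)) + 1).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up example tweets, for illustration only.
tweets = [
    "Just ran 5 miles at the park! https://example.com",
    "@friend I wish I had time to run today",
]
tokens = [preprocess(t) for t in tweets]
vectors = tfidf(tokens)
```

The resulting sparse vectors are the kind of input a classifier such as a Support Vector Machine or Random Forest would be trained on; in practice a library vectorizer would likely replace the hand-written `tfidf`.
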
Given a dataset of reviews on the effectiveness and adverse effects of psychiatric medications (Zoloft, Lexapro, Cymbalta, and Effexor XR), train a conditional random field and a deep learning model that extract adverse events and signs and symptoms from the sentences.
- Machine learning:
  - Conditional random field (pycrf)
- Deep learning:
  - Bidirectional Long Short-Term Memory (TensorFlow)
  - Word embeddings (GloVe and BioWordVec)
- MetaMap 2018
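
As a rough illustration of how a sentence is typically prepared for a conditional random field, the sketch below builds per-token feature dictionaries and BIO tags. It is an assumption-laden example rather than the project's pipeline: the feature names, the `ADE` label, the example sentence, and the helpers `token_features` and `bio_tags` are all hypothetical, and no CRF library is called.

```python
def token_features(tokens, i):
    """Feature dict for the token at position i, with one-token context."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def bio_tags(n_tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) to BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for j in range(start + 1, end):
            tags[j] = "I-" + label
    return tags


# Made-up example sentence; "ADE" is a hypothetical label name.
tokens = "Zoloft gave me severe headaches".split()
X = [token_features(tokens, i) for i in range(len(tokens))]
y = bio_tags(len(tokens), [(3, 5, "ADE")])
```

A sequence of feature dictionaries paired with a sequence of tags, like `X` and `y` here, is the usual training format for CRF toolkits; a BiLSTM would instead map each token to a word-embedding vector such as GloVe or BioWordVec.
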