This repository contains the solution for the HKUST-UBS 'China Startup Prediction' project from the team of Andrew and Zhiyun.
To set up the environment, run:
conda create -n startup_prediction python=3.6
source activate startup_prediction
pip install -r requirements.txt
To train and evaluate a model, run:
cd src/model
python <INSERT FILENAME>
For example, to train and evaluate the LightGBM model, run:
cd src/model
python multi_lightgbm.py
Some of the extracted features are:
- Company overview
  - includes sector, management team and location
- Funding event
  - includes normalized funding amount, funding round code and number of investors
- Investor
  - includes top investors
- News
  - includes positive and negative sentiment of news
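The README does not say exactly how each feature is computed. As a minimal sketch, assuming a log transform for the "normalized funding amount", one-hot encoding for the round code, a membership count against a top-investor list, and per-article sentiment scores in [-1, 1], the funding, investor and news features could be assembled as follows. `ROUND_CODES`, `FundingEvent`, `funding_features` and `sentiment_features` are hypothetical names, not the repository's code.

```python
import math
from typing import Dict, List, NamedTuple, Set

# Hypothetical round-code vocabulary; the repository's actual list is not given.
ROUND_CODES = ["angel", "seed", "series_a", "series_b", "series_c"]


class FundingEvent(NamedTuple):
    amount: float          # raw funding amount (currency and units assumed)
    round_code: str        # e.g. "series_a"
    investors: List[str]   # names of participating investors


def funding_features(event: FundingEvent, top_investors: Set[str]) -> Dict[str, float]:
    feats: Dict[str, float] = {}
    # log1p compresses heavy-tailed amounts; one possible normalization (assumption)
    feats["log_amount"] = math.log1p(event.amount)
    # one-hot encoding of the funding round code
    for code in ROUND_CODES:
        feats[f"round_{code}"] = 1.0 if event.round_code == code else 0.0
    feats["num_investors"] = float(len(event.investors))
    # how many of this event's investors appear in the (hypothetical) top-investor set
    feats["num_top_investors"] = float(sum(1 for name in event.investors if name in top_investors))
    return feats


def sentiment_features(scores: List[float]) -> Dict[str, float]:
    # scores: hypothetical per-article sentiment values in [-1, 1]
    total = max(len(scores), 1)  # avoid division by zero when there is no news
    pos = sum(1 for s in scores if s > 0)
    neg = sum(1 for s in scores if s < 0)
    return {"news_pos_ratio": pos / total, "news_neg_ratio": neg / total}


event = FundingEvent(amount=5_000_000.0, round_code="series_a", investors=["Investor A", "Investor B"])
print(funding_features(event, top_investors={"Investor A"}))
print(sentiment_features([0.6, -0.3, 0.1]))
```

Keeping each feature group in its own function makes it easy to concatenate them into one vector per company and time step before feeding either model.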
The models implemented are as follows:
- LightGBM
  - as the baseline model
- Temporal Convolutional Network (TCN)
  - with dilated causal convolutions and skip connections
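The TCN ingredients named above can be illustrated with a plain NumPy sketch: a single-channel dilated causal convolution, stacked into residual blocks whose branch outputs are summed as skip connections. This is a simplified illustration under assumptions (one channel, kernel size 2, dilations 1, 2, 4, no weight normalization or dropout); the function names are hypothetical and this is not the repository's TCN implementation.

```python
import numpy as np


def causal_dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """y[t] = sum_i w[i] * x[t - i * dilation], treating x[t] = 0 for t < 0."""
    k = len(w)
    pad = (k - 1) * dilation
    # pad on the left only, so y[t] never depends on future inputs x[t+1:]
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            y[t] += w[i] * xp[pad + t - i * dilation]
    return y


def tcn_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray, dilation: int):
    """Two dilated causal convolutions with ReLU, plus a residual connection.
    Returns (block output, branch output used as the skip signal)."""
    h = np.maximum(causal_dilated_conv1d(x, w1, dilation), 0.0)
    h = np.maximum(causal_dilated_conv1d(h, w2, dilation), 0.0)
    return x + h, h


def tcn(x: np.ndarray, kernels, dilations) -> np.ndarray:
    """Stack of blocks with growing dilation; the skip outputs of all blocks are summed."""
    skip_sum = np.zeros(len(x))
    h = x
    for (w1, w2), d in zip(kernels, dilations):
        h, skip = tcn_block(h, w1, w2, d)
        skip_sum += skip
    return skip_sum


def receptive_field(kernel_size: int, dilations) -> int:
    """Each block has two convolutions, each extending the history by (k - 1) * d steps."""
    return 1 + sum(2 * (kernel_size - 1) * d for d in dilations)


rng = np.random.default_rng(0)
dilations = [1, 2, 4]
kernels = [(rng.normal(size=2), rng.normal(size=2)) for _ in dilations]
out = tcn(rng.normal(size=32), kernels, dilations)
print(out.shape, receptive_field(2, dilations))  # (32,) 15
```

Left-only padding keeps every output causal, and doubling the dilation per block grows the receptive field exponentially with depth while the number of weights grows only linearly.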
- Code is separated into three main modules: `data_loader`, `model` and `data`.
- Subsections in `docs` specifically address the criteria of the evaluation rubric.
- For Design (quality of background research), please refer to `resources.md` and `model_description.md`.
- For Code (organization of code), please refer to `preprocessing_file_organization.jpeg` and `file_description.md`.
- For
The following precision, recall and F1-score results were computed on the test set:
- LightGBM
  - precision: 0.7652
  - recall: 0.5787
  - F1-score: 0.6076
- Temporal Convolutional Network (TCN)
  - precision: 0.8495
  - recall: 0.8539
  - F1-score: 0.8517
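F1 is the harmonic mean of precision and recall. For the TCN row, the reported F1 matches the harmonic mean of the reported precision and recall. For the LightGBM row it does not (the harmonic mean of 0.7652 and 0.5787 is about 0.659). One possible explanation, which is only an assumption (the file name `multi_lightgbm.py` hints at a multi-class setup), is macro averaging: the mean of per-class F1 scores generally differs from the F1 of the averaged precision and recall. The sketch below shows both computations; `f1_from_pr`, `macro_scores` and the per-class counts are hypothetical.

```python
from typing import List, Tuple


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_scores(counts: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """counts: per-class (tp, fp, fn). Returns macro-averaged (precision, recall, F1),
    where macro F1 is the mean of the per-class F1 scores."""
    ps, rs, fs = [], [], []
    for tp, fp, fn in counts:
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f1_from_pr(p, r))
    n = len(counts)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


print(round(f1_from_pr(0.8495, 0.8539), 4))  # 0.8517, the TCN F1 above

# Hypothetical two-class counts: macro F1 is lower than the F1 of macro P and R.
p, r, f = macro_scores([(8, 2, 0), (1, 0, 9)])
print(p, r, f, f1_from_pr(p, r))
```

When comparing models, it is worth checking that both report the same averaging scheme, since mixing them can change the F1 gap noticeably.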
Possible next steps:
- Experiment with an additional graph propagation layer (similar to a Graph Convolutional Network [1]) added between TCN blocks for feature propagation
- Fine-tune the TCN
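The graph propagation idea can be sketched with the layer-wise rule from [1], H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, I adds self-loops and D is the diagonal degree matrix of A + I. Assuming a hypothetical company graph (for example, companies linked by shared investors) and TCN hidden states shaped (companies, time steps, channels), one propagation step applied at every time step might look like this. `gcn_propagate` and `propagate_sequence` are illustrative names, not part of the repository.

```python
import numpy as np


def gcn_propagate(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: adj (N, N), h (N, C_in), w (C_in, C_out) -> (N, C_out)."""
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)                    # add self-loops
    deg = a_tilde.sum(axis=1)                    # node degrees including self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_hat = d_inv_sqrt @ a_tilde @ d_inv_sqrt    # symmetric normalization
    return np.maximum(a_hat @ h @ w, 0.0)        # ReLU activation


def propagate_sequence(adj: np.ndarray, h_seq: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply the same propagation at each time step of h_seq (N, T, C_in) -> (N, T, C_out)."""
    return np.stack([gcn_propagate(adj, h_seq[:, t, :], w) for t in range(h_seq.shape[1])], axis=1)


# Hypothetical graph: companies 0 and 1 share an investor, company 2 is isolated.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
h_seq = np.random.default_rng(0).normal(size=(3, 5, 4))
w = np.random.default_rng(1).normal(size=(4, 8))
print(propagate_sequence(adj, h_seq, w).shape)  # (3, 5, 8)
```

Because the propagation mixes features only across companies within the same time step, inserting it between TCN blocks would keep the temporal convolutions causal.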
[1] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. ICLR, 2017.