Bigcon

Bigcontest code

Primary language: Jupyter Notebook · License: MIT

Loan Application Prediction Analysis Using App Usability Data

Overview

image

Results

  • The results for the forecast after June can be found at the following Google Drive link: Google Drive

  • test.csv == 데이터분석분야_퓨처스부분_이용재와아이들_평가데이터.csv

  • cluster_user.csv

  • To generate the final result file (test.csv) with the same application id and product id as the test set for submission, run the following Jupyter notebook: 9_select_the_submit_data.ipynb

Notice

You MUST create the following folders in advance.

  • data
  • prepro_data
  • DL_dataset
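The three required folders can be created with a short snippet (folder names taken from the list above):

```python
import os

# create the folders the notebooks expect to exist
for folder in ["data", "prepro_data", "DL_dataset"]:
    os.makedirs(folder, exist_ok=True)
```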

The data folder must contain the loan_result.csv, log_data.csv, and user_spec.csv files. Google Drive
When you run 1_preprocessing_real.ipynb, the preprocessed data is stored in the prepro_data folder as the following two files:

  • full_data.csv
  • submit_test.csv

full_data.csv is the training dataset covering the period before June, and submit_test.csv is the test dataset for June onwards.
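The June cutoff can be sketched as a simple date-based split; this is a minimal illustration with hypothetical column names (the repo's actual columns may differ):

```python
import pandas as pd

# stand-in for the merged raw data; column names are assumptions
df = pd.DataFrame({
    "application_id": [1, 2, 3, 4],
    "insert_time": pd.to_datetime(
        ["2022-03-01", "2022-05-15", "2022-06-02", "2022-06-20"]),
    "is_applied": [0, 1, None, None],
})

# rows before June form the training set; June onwards is the submission test set
full_data = df[df["insert_time"].dt.month < 6]
submit_test = df[df["insert_time"].dt.month >= 6]
print(len(full_data), len(submit_test))  # -> 2 2
```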

Next, when 2_Preprocessing_2.ipynb and 3_Preprocessing_3.ipynb are run, the data on the user's behavior is merged into loan_result.csv. Ray is used here for parallel processing. The resulting files are again stored in the prepro_data folder as full_data.csv and submit_test.csv.

Third, by running 4_Preprocessing_4.ipynb, continuous variables are converted into categorical variables and stored in the dataset folder. The stored datasets are as follows.

  • full_data.csv
  • submit_test.csv
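Converting a continuous variable into a categorical one is commonly done with quantile binning; a minimal sketch, where the column name, number of bins, and labels are assumptions rather than the repo's actual choices:

```python
import pandas as pd

# hypothetical continuous column; real features and bin counts may differ
df = pd.DataFrame({"credit_score": [300, 520, 640, 710, 880]})

# quantile-based binning into 4 roughly equal-sized categories
df["credit_score_cat"] = pd.qcut(
    df["credit_score"], q=4,
    labels=["low", "mid_low", "mid_high", "high"])
print(df["credit_score_cat"].tolist())
```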

Finally, a dataset for deep learning is built by executing 6_DL_models_inputs.ipynb. The following files are stored in the DL_dataset folder.

  • fold_0.csv
  • fold_1.csv
  • fold_2.csv
  • fold_3.csv
  • fold_4.csv
  • train.csv
  • test.csv
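The fold files above suggest a 5-fold split of the training data; a minimal sketch of such a split (the repo's exact fold assignment, e.g. whether it is stratified, is not specified, so this random split is an assumption):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# stand-in for full_data.csv; real columns differ
df = pd.DataFrame({"x": range(10), "is_applied": [0, 1] * 5})

# assign each row to one of 5 folds at random
rng = np.random.default_rng(42)
fold = rng.permutation(len(df)) % 5

out_dir = tempfile.mkdtemp()  # stands in for the DL_dataset folder
for k in range(5):
    df[fold == k].to_csv(os.path.join(out_dir, f"fold_{k}.csv"), index=False)
```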

All deep learning results are likewise stored in the DL_dataset folder.

Get Started

  1. Install Python 3.8 (e.g., create a conda environment: conda create -n bigcon python=3.8).
  2. Download the data (it must first be placed in the ./data folder).
  3. Install the required packages:
  • pip install -r requirements.txt
  • pip3 install autogluon (for GPU mode, see the link)
  • pip install ray (used for preprocessing)
  4. For preprocessing, run the five Jupyter notebooks as follows.
  • 1_preprocessing_real.ipynb
  • 2_Preprocessing_2.ipynb
  • 3_Preprocessing_3.ipynb
  • 4_Preprocessing_4.ipynb
  • 6_DL_models_inputs.ipynb
  5. To run the ML model, run the following Jupyter notebook. The weights of all models can be downloaded from the following Google Drive link: Google Drive
  • 5_test_modeling-ACC-ALL.ipynb
  6. Train the deep learning models. We provide the experiment scripts of all benchmarks under the folder ./runfile. The deep learning weights can be downloaded from checkpoints.zip. You can reproduce the experiment results by:
bash ./runfile/big_1.sh
  7. For the machine learning and deep learning models, run the voting ensemble 7_ML_DL_model_output.ipynb. All results are stored in the submit folder.

  8. Run 8_Clustering.ipynb for clustering results. All results are stored in the submit folder.


Models

  • We construct the final model as an ensemble of machine learning models and deep learning models.
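The combination step can be sketched as soft voting over predicted probabilities; the probabilities and the 0.5 threshold below are illustrative assumptions, not values from the repo:

```python
import numpy as np

# hypothetical predicted probabilities from the ML and DL models
ml_prob = np.array([0.2, 0.8, 0.6])
dl_prob = np.array([0.4, 0.7, 0.3])

# soft voting: average the probabilities, then threshold
ensemble_prob = (ml_prob + dl_prob) / 2
pred = (ensemble_prob >= 0.5).astype(int)
print(pred.tolist())  # -> [0, 1, 0]
```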

image

image

Clustering

  • Clustering was performed on the embedding vectors extracted from the deep learning model, and an evaluation index based on the cumulative probability distribution was devised to find the optimal clusters in the high-dimensional embedding space.
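A minimal sketch of selecting a cluster count from embeddings: here synthetic vectors stand in for the model's embeddings, and the silhouette score is used as a stand-in metric, since the repo's own CDF-based evaluation index is not reproduced here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic stand-in for the deep learning embedding vectors:
# two well-separated groups of 8-dimensional points
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (50, 8)),
                 rng.normal(3, 0.1, (50, 8))])

# pick the cluster count that maximizes the (stand-in) score
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(emb)
    score = silhouette_score(emb, labels)
    if score > best_score:
        best_k, best_score = k, score
print(best_k)  # -> 2
```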

image

Contact

If you have any questions or want to use the code, please contact yoontae@unist.ac.kr.