/level2_klue_nlp-level2-nlp-05

level2_klue_nlp-level2-nlp-05 created by GitHub Classroom

Primary language: Jupyter Notebook. License: MIT.

πŸ… KLUE Competition - Relation Extraciton


πŸ“ Competition Description

Relation extraction is the task of predicting the attributes of, and the relations between, words (entities) in a sentence.

In this competition, we train a model that infers the relation between two words within a sentence, given the sentence and information about the words. This lets the model learn concepts by identifying the attributes of words and the relations between them.



💾 Dataset Description

Dataset              train   test
Number of sentences  32470   7765
Ratio (%)            80      20

Columns

  • id (string) : unique sentence ID

  • sentence (string) : the given sentence

  • subject_entity (dictionary) : subject entity

  • object_entity (dictionary) : object entity

  • label (string) : relation between the subject and the object, one of 30 labels

  • source (string) : source of the sentence

    • wikipedia (Wikipedia)

    • wikitree (Wikitree)

    • policy_briefing (policy press releases?)
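As an illustration of the columns above, the following sketch reads one row and recovers the two entity dictionaries. It assumes the CSV stores each entity as a stringified Python dictionary with the keys `word`, `start_idx`, `end_idx`, and `type`; those key names, the `parse_entity` helper, and the example row are assumptions for illustration and are not taken from this repository.

```python
import ast


def parse_entity(raw):
    """Return an entity as a dict, whether it arrives as a dict or as its string form."""
    # Assumption: entities are serialized as Python dict literals in the CSV.
    if isinstance(raw, dict):
        return raw
    return ast.literal_eval(raw)


# Hypothetical row, made up for this example (not real competition data).
row = {
    "id": "0",
    "sentence": "Seoul is the capital of South Korea.",
    "subject_entity": "{'word': 'Seoul', 'start_idx': 0, 'end_idx': 4, 'type': 'LOC'}",
    "object_entity": "{'word': 'South Korea', 'start_idx': 24, 'end_idx': 34, 'type': 'LOC'}",
    "label": "no_relation",
    "source": "wikipedia",
}

subject = parse_entity(row["subject_entity"])
obj = parse_entity(row["object_entity"])

# Assumption: end_idx is inclusive, so the slice needs end_idx + 1.
object_span = row["sentence"][obj["start_idx"]:obj["end_idx"] + 1]

print(subject["word"], "->", obj["word"], ":", row["label"])
# Seoul -> South Korea : no_relation
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals, which is safer for data read from a file.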



🗄 Folder Structure

├──📁config
│   ├── base_config.yaml
│   └── custom_config.yaml
│
├──📁data_loaders
│   ├── data_loader.py  → loads the dataset
│   └── preprocessing.py
│
├──📁dataset
│   ├──📁dev
│   │   └── dev.csv → dev (validation) data
│   ├──📁predict
│   │   ├── predict.csv → data to run predictions on
│   │   └── sample_submission.csv → sample data
│   ├──📁pretrain
│   │   ├── all_data.csv → train + test data
│   │   └── train.csv
│   ├──📁test
│   │   └── test.csv → data used for the final evaluation after training
│   └──📁train
│       ├── train.csv → training data
│       └── gpt_autmentation, roberta_augmentation, pororo_augmentation.csv
│
├──📁model
│   ├── auxiliary.py
│   ├── entity_roberta.py
│   ├── loss.py
│   ├── lstm.py
│   ├── metric.py
│   ├── model.py
│   ├── rbert.py
│   └── recent.py
│
├──📁prediction
│   ├── sample_submission.csv
│   ├── submission.csv
│   └── submission_18-14-46.csv → on inference, 'day-hour-minute.csv' is appended to the file name
│
├──📁step_saved_model → path where the model is saved according to save_steps
│   └──📁klue-roberta-large → model used
│       └──📁18-14-42       → day-hour-minute of the run
│           └──📁checkpoint-500 → saved checkpoint-step
│
├──📁trainer
│   └── trainer.py
│
├──📁utils
│   └── util.py
│
├── dict_label_to_num.pkl
├── dict_num_to_label.pkl
├── inference.py → inference code
│
├── main.py → entry point that runs train.py and inference.py
│   e.g. to train → python main.py -mt
│        to run inference → python main.py -mi
│
├── tapt_pretrain.py → TAPT (task-adaptive pretraining) code
├── train.py → training code
├── train_ray.py → hyperparameter search code
└── train_raybohb.py
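The timestamped names in the tree (`submission_18-14-46.csv`, `18-14-42`) can be produced along the lines of the sketch below. The helper names and the exact `strftime` pattern are assumptions inferred from the day-hour-minute convention described above; the repository's own code may build these paths differently.

```python
from datetime import datetime
from pathlib import Path


def run_stamp(now=None):
    """Format a run time as day-hour-minute, e.g. '18-14-46'."""
    now = now or datetime.now()
    return now.strftime("%d-%H-%M")


def submission_path(out_dir="prediction", now=None):
    """Hypothetical helper: path of a timestamped submission file."""
    return Path(out_dir) / f"submission_{run_stamp(now)}.csv"


def checkpoint_dir(model_name, step, root="step_saved_model", now=None):
    """Hypothetical helper: checkpoint folder for one run and one save step."""
    return Path(root) / model_name / run_stamp(now) / f"checkpoint-{step}"


# Fixed time so the example is reproducible; the year and month are arbitrary.
example_time = datetime(2023, 5, 18, 14, 46)
print(submission_path(now=example_time).as_posix())
# prediction/submission_18-14-46.csv
print(checkpoint_dir("klue-roberta-large", 500, now=example_time).as_posix())
# step_saved_model/klue-roberta-large/18-14-46/checkpoint-500
```

Because the stamp omits the month and year, two runs at the same day, hour, and minute of different months would collide; a longer pattern avoids that if it matters.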




βš™οΈ Set up

1. Requirements

$ pip install -r requirements.txt

2. Prepare Dataset - train data split

train : dev : test = 8 : 1 : 1
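A minimal sketch of an 8 : 1 : 1 split is shown below. It uses a plain seeded shuffle; the function name, the seed, and the absence of label stratification are assumptions for illustration, so the split actually used in this repository may differ.

```python
import random


def split_8_1_1(rows, seed=42):
    """Shuffle rows reproducibly and split them into train/dev/test at 8:1:1."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(rows) * 8 // 10
    n_dev = len(rows) // 10

    train = rows[:n_train]
    dev = rows[n_train:n_train + n_dev]
    test = rows[n_train + n_dev:]  # any remainder goes to test
    return train, dev, test


train, dev, test = split_8_1_1(range(1000))
print(len(train), len(dev), len(test))
# 800 100 100
```

With 30 relation labels that are unlikely to be evenly distributed, a stratified split (keeping label proportions equal across the three parts) may be preferable to the simple shuffle shown here.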



💻 How to Run

How to Train

$ python main.py -mt

How to Run Inference

$ python main.py -mi

How to Run TAPT Pretraining

$ python main.py -mtp
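The three commands above are consistent with a single `-m` option whose value (`t`, `i`, or `tp`) is written directly after the flag. The sketch below shows how such a dispatcher could be written with `argparse`; the `--mode` long name, the `MODES` table, and the placeholder functions are assumptions for illustration and do not reproduce the actual contents of `main.py`.

```python
import argparse


# Placeholder entry points; in the real project these would call
# train.py, inference.py and tapt_pretrain.py respectively.
def run_train():
    return "train"


def run_inference():
    return "inference"


def run_tapt_pretrain():
    return "tapt_pretrain"


# Hypothetical mapping from the value after -m to the function to run.
MODES = {"t": run_train, "i": run_inference, "tp": run_tapt_pretrain}


def build_parser():
    parser = argparse.ArgumentParser(description="Run training, inference, or TAPT pretraining.")
    parser.add_argument(
        "-m", "--mode",
        choices=sorted(MODES),
        required=True,
        help="t = train, i = inference, tp = TAPT pretraining",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return MODES[args.mode]()


if __name__ == "__main__":
    print(main())
```

`argparse` accepts a short option with its value attached, so `-mt` is parsed the same way as `-m t`, and `-mtp` as `-m tp`.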