📈 퍼스트펭귄 (First Penguin)

KOSPI keyword extraction and KOSPI index prediction using a keyword extraction task


Table of Contents

  0. Archive
  1. Team
  2. Process
  3. Data
  4. Model
  5. Demo
  6. How to Use

0. Archive

📹 Presentation video
📄 Presentation slides


1. Team

Members

고우진_T4006 김상윤_T4036 현승엽_T4231

Contribution

Member : Contribution
고우진 (PM) : Literature survey, embedding model implementation and training, price prediction model implementation and training
김상윤 : Data construction and processing, embedding model implementation and training, demo development, batch serving setup
현승엽 : Literature survey, data EDA, search volume data collection, embedding model implementation and training


2. Process



3. Data

Namuwiki Text : taken from the dump file uploaded to Hugging Face

Seed keyword : keywords closely related to KOSPI, selected from the economic keywords provided by Statistics Korea, from papers, and from Google searches

넀이버 κ²€μƒ‰λŸ‰ : 넀이버 Developers λ°μ΄ν„°λž© API μ΄μš©ν•˜μ—¬ μˆ˜μ§‘

KOSPI index : the KOSPI series (ticker: ^KS11) provided by Yahoo Finance, collected with the yfinance library
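As an illustration of the search volume collection step, the sketch below builds request bodies for the Naver DataLab search trend endpoint. It is not taken from this repository: the function names, the one-keyword-per-group layout, and the sample keywords are assumptions made for the example. The HTTP call itself (a POST carrying the `X-Naver-Client-Id` and `X-Naver-Client-Secret` headers) is left out so the snippet runs without credentials.

```python
import json

# Search trend endpoint of the Naver DataLab API.
DATALAB_URL = "https://openapi.naver.com/v1/datalab/search"
# DataLab accepts at most 5 keyword groups per request.
MAX_GROUPS_PER_REQUEST = 5


def build_headers(client_id, client_secret):
    """Return the HTTP headers DataLab expects (credentials come from the caller)."""
    return {
        "X-Naver-Client-Id": client_id,
        "X-Naver-Client-Secret": client_secret,
        "Content-Type": "application/json",
    }


def build_datalab_payloads(keywords, start_date, end_date, time_unit="date"):
    """Return JSON request bodies, one per batch of up to 5 keywords.

    Assumption: every keyword becomes its own group, named after itself.
    Dates use the yyyy-mm-dd format; time_unit is "date", "week", or "month".
    """
    payloads = []
    for i in range(0, len(keywords), MAX_GROUPS_PER_REQUEST):
        batch = keywords[i:i + MAX_GROUPS_PER_REQUEST]
        body = {
            "startDate": start_date,
            "endDate": end_date,
            "timeUnit": time_unit,
            "keywordGroups": [
                {"groupName": kw, "keywords": [kw]} for kw in batch
            ],
        }
        payloads.append(json.dumps(body, ensure_ascii=False))
    return payloads


if __name__ == "__main__":
    # Hypothetical seed keywords, used only to show the batching.
    sample = ["금리", "환율", "물가", "수출", "반도체", "유가"]
    for payload in build_datalab_payloads(sample, "2016-01-01", "2022-12-31"):
        print(payload)
```

DataLab reports relative ratios rather than absolute counts, so values returned by different batches may not be directly comparable without some extra normalization.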



4. Model

For Text Embedding

KLUE RoBERTa large (Link)

A language model obtained by pre-training RoBERTa on Korean data (KLUE)

KPF-BERT (Link)

A model trained on roughly 40 million news articles, covering about 20 years, compiled by the Korea Press Foundation

KB-ALBERT (Link)

κ΅¬κΈ€μ˜ ALBERT에 경제/금육 도메인에 νŠΉν™”λœ λŒ€λŸ‰μ˜ ν•œκ΅­μ–΄ 데이터λ₯Ό ν•™μŠ΅μ‹œν‚¨ λͺ¨λΈ


For Predicting KOSPI index

LSTM
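The README names only the model type, so the input format is not documented here. A typical way to prepare a price or search volume series for an LSTM, assumed in this sketch, is to scale it to the [0, 1] range and cut it into fixed-length sliding windows, each paired with the value that follows it. The helper names and the window size are hypothetical.

```python
def min_max_scale(series):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lowest, highest = min(series), max(series)
    span = highest - lowest
    if span == 0:
        return [0.0 for _ in series]
    return [(value - lowest) / span for value in series]


def make_windows(series, window_size):
    """Split a series into (inputs, targets) for next-step prediction.

    inputs[i] holds window_size consecutive values and targets[i] is the
    value immediately after that window.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


if __name__ == "__main__":
    # Made-up index values, not real KOSPI data.
    closing = [2400.0, 2410.0, 2395.0, 2420.0, 2430.0]
    scaled = min_max_scale(closing)
    windows, next_values = make_windows(scaled, window_size=3)
    print(len(windows), len(next_values))
```

Each window would then be fed to the LSTM as one sequence, with the paired target as the value to predict.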



5. Demo

Service architecture

🖥️ Web example (Streamlit)


6. How to Use

File Directory

├── codes
│   ├── corr_given_time.py
│   ├── get_anual.py
│   └── inference_price.py
├── dags
│   └── operator_dag.py
├── data
│   ├── 2016keyword.csv
│   ├── 2017keyword.csv
│   ├── 2018keyword.csv
│   ├── 2019keyword.csv
│   ├── 2020keyword.csv
│   ├── 2021keyword.csv
│   ├── 2022keyword.csv
│   ├── ensemble_tomorrow_price.txt
│   ├── final_candi_list.csv
│   ├── final_candi_search_volume.json
│   └── predict_past.csv
├── pages
│   ├── get_keywords.py
│   └── price_inference.py
├── .gitignore
├── README.md
├── main.py
└── requirements.txt

Virtual environment

# Create the virtual environment
python3 -m venv $ENV_NAME
# Activate the virtual environment
source $ENV_NAME/bin/activate
# Install the libraries
pip3 install --upgrade pip
pip3 install -r requirements.txt
# Deactivate the virtual environment
deactivate

Streamlit

streamlit run main.py

Airflow

# Set the Airflow base directory using an absolute path
export AIRFLOW_HOME=~/nlp02
# Initialize the Airflow DB -> creates the default files
airflow db init
airflow users create --username admin --password 1234 --firstname boocam --lastname kim --role Admin --email xxx@naver.com
airflow webserver --port 8080

# Run the scheduler (in a separate terminal)
export AIRFLOW_HOME=~/nlp02
airflow scheduler