text-mining-codes-in-projects-and-lectures

Text mining code (POS tagging, embedding, parsing, modeling, ...)

01_2017spring_Pattern Recognition lecture team project

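Code used for a team project in Prof. Heung-Il Suk's Pattern Recognition course (Department of Brain and Cognitive Engineering, Korea University), spring semester 2017.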
code
- 01_make_inputdata.py:
  - Preprocesses the input data (tokenization, lemmatization, stopword removal)
  - Builds the datasets (TF-IDF features, per-document topic distributions from LDA, Doc2Vec embeddings)
- 02_model.py: fits classification models (Naïve Bayes, Decision Tree, simple ensemble) and compares their performance
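The exact preprocessing in 01_make_inputdata.py is not shown here. The following is a minimal, dependency-free sketch of how the preprocessing steps and TF-IDF features fit together. It assumes simple regex tokenization, and it uses a tiny hypothetical lemma table and stopword list as stand-ins for real resources such as NLTK's WordNet lemmatizer and stopword corpus. Weighting is plain tf × log(N/df). The LDA and Doc2Vec features are not covered.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny resources; a real run would use full ones
# (e.g. NLTK's stopword corpus and WordNet lemmatizer).
STOPWORDS = {"the", "a", "an", "as", "of", "and", "in", "to", "is", "are"}
LEMMAS = {"markets": "market", "rose": "rise", "rates": "rate", "patients": "patient"}


def preprocess(text):
    """Lowercase, tokenize on letter runs, lemmatize via lookup, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    lemmas = [LEMMAS.get(tok, tok) for tok in tokens]
    return [tok for tok in lemmas if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * log(N / df). Smoothed idf, sublinear tf and
    L2 normalization are common variants not used here.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


# Toy corpus for illustration only.
corpus = [
    "The markets rose as rates fell.",
    "Patients in the trial are stable.",
]
docs = [preprocess(text) for text in corpus]
vectors = tfidf(docs)
print(docs)  # [['market', 'rise', 'rate', 'fell'], ['patient', 'trial', 'stable']]
print(vectors[0])
```

A term that appears in every document gets idf log(1) = 0, so it contributes nothing to the vector; this is the intended down-weighting of uninformative words.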
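The README does not say which library or settings 02_model.py uses. As a minimal, dependency-free sketch, the code below implements multinomial Naive Bayes with add-alpha (Laplace) smoothing on token lists, plus a majority-vote combiner. The combiner assumes that the "simple ensemble" combines classifiers by voting, which the source does not confirm. The class name, function names and toy data are hypothetical. The decision tree is omitted; a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used for it.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(docs)) for c in self.classes}
        token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            # P(t | c) = (count(t, c) + alpha) / (total tokens in c + alpha * |V|)
            denom = sum(token_counts[c].values()) + self.alpha * v
            self.log_lik[c] = {
                t: math.log((token_counts[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += self.log_lik[c][tok]
            scores[c] = score
        return max(self.classes, key=lambda c: scores[c])

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


def majority_vote(predictions_per_model):
    """Combine per-model prediction lists by majority vote.

    Ties go to the tied label that appears first, i.e. from the earliest model.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Toy, already-preprocessed training data for illustration only.
train_docs = [
    ["market", "rate", "rise"],
    ["stock", "market", "fell"],
    ["patient", "trial", "dose"],
    ["patient", "symptom", "trial"],
]
train_labels = ["econ", "econ", "med", "med"]
nb = MultinomialNB().fit(train_docs, train_labels)
preds = nb.predict([["market", "rate"], ["trial", "dose"]])
print(preds)  # ['econ', 'med']
print(majority_vote([["a", "b", "a"], ["a", "a", "b"], ["b", "b", "b"]]))  # ['a', 'b', 'b']
```

Scores are summed in log space so that long documents do not underflow to zero probability.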
data
- input data:
  - economic_news_article
  - ohsumed (uploaded as a compressed archive because of its size)
  - reuter
- TOTAL: output folder; running the code saves its output here
document
- [Team 5]_Final_Report.pdf: the final report submitted as the project deliverable