Applies the TextTiling method for topic segmentation to Korean text data.
Improves on TextTiling by embedding sentences with pretrained SBERT models and using the cosine similarity between embeddings as the similarity score.
To reproduce the results, run only korean_texttiling.ipynb.
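As a rough illustration of the embedding-based similarity step described above, the sketch below scores each gap between consecutive sentences by cosine similarity, turns the scores into TextTiling-style depth scores, and marks deep valleys as topic boundaries. It is not the code from korean_texttiling.ipynb: the function names are hypothetical, the toy 2-D vectors stand in for real SBERT embeddings, and the mean-depth cutoff is an assumption chosen for simplicity.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gap_scores(embeddings):
    """Similarity score at each gap between consecutive sentence embeddings."""
    return [cosine_similarity(embeddings[i], embeddings[i + 1])
            for i in range(len(embeddings) - 1)]


def depth_scores(scores):
    """TextTiling-style depth: how far each gap sits below the peaks on either side."""
    depths = []
    for i, score in enumerate(scores):
        # Climb leftwards while the similarity keeps rising.
        left_peak = score
        for j in range(i - 1, -1, -1):
            if scores[j] < left_peak:
                break
            left_peak = scores[j]
        # Climb rightwards while the similarity keeps rising.
        right_peak = score
        for j in range(i + 1, len(scores)):
            if scores[j] < right_peak:
                break
            right_peak = scores[j]
        depths.append(left_peak + right_peak - 2 * score)
    return depths


def find_boundaries(depths):
    """Gap indices whose depth exceeds the mean depth (this cutoff is an assumption)."""
    if not depths:
        return []
    cutoff = sum(depths) / len(depths)
    return [i for i, d in enumerate(depths) if d > cutoff]


# Toy 2-D vectors standing in for SBERT sentence embeddings (placeholders only).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = gap_scores(toy_embeddings)
boundaries = find_boundaries(depth_scores(scores))
print(boundaries)  # [1]: a topic boundary between the second and third sentences
```

Gap index i refers to the gap between sentence i and sentence i + 1. With real data, the toy vectors would be replaced by the embeddings produced by the pretrained SBERT model.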
The Korean text data in the texts folder comes from AIHUB.
It was preprocessed with fetch_data.ipynb.
Link: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=543
Modified version of NLTK's TextTiling source code, adapted for Korean text and for the improvements described above. Link: https://www.nltk.org/_modules/nltk/tokenize/texttiling.html
NLTK's segmentation source code, used for the evaluation metric. Link: https://www.nltk.org/_modules/nltk/metrics/segmentation.html
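For readers unfamiliar with segmentation metrics, here is a minimal, dependency-free sketch of WindowDiff, one of the metrics in NLTK's segmentation module. Which metric the experiments actually report is not stated here, so treat this purely as an illustration. The function name window_diff and the example strings are hypothetical; a segmentation is written as a string in which "1" marks a boundary after that unit.

```python
def window_diff(reference, hypothesis, k, boundary="1"):
    """Fraction of length-k windows where the two segmentations
    disagree on the number of boundaries (0.0 means identical)."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must have the same length")
    if not 0 < k <= len(reference):
        raise ValueError("window size k must be between 1 and the segmentation length")
    n_windows = len(reference) - k + 1
    errors = 0
    for i in range(n_windows):
        ref_count = reference[i:i + k].count(boundary)
        hyp_count = hypothesis[i:i + k].count(boundary)
        if ref_count != hyp_count:
            errors += 1
    return errors / n_windows


# Hypothetical reference and predicted segmentations of 12 sentences.
reference = "000100000010"
predicted = "000010000100"
print(window_diff(reference, predicted, k=3))  # 0.3
```

Lower values are better. Near misses are penalised less than completely misplaced boundaries because only the windows that straddle the error disagree.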
Used to load the data from AIHUB and preprocess it.
Used to run the experiments.