nlp_project

Objectives

Apply the TextTiling method for topic segmentation to Korean text data.

Improve the TextTiling method by embedding sentences with pretrained SBERT models and using cosine similarity as the similarity score.
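The idea can be illustrated with a minimal sketch. It assumes sentence embeddings are already available (in this project they would come from a pretrained SBERT model; here they are hand-written toy vectors), and the function names `gap_scores` and `depth_scores` as well as the `block_size` parameter are hypothetical, chosen for illustration only. It is not the code in this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def gap_scores(embeddings, block_size=2):
    """Similarity score for every gap between adjacent sentences.

    scores[i] compares the block of sentences ending at sentence i with
    the block starting at sentence i + 1 (block_size is an assumption).
    """
    scores = []
    for gap in range(1, len(embeddings)):
        left = embeddings[max(0, gap - block_size):gap]
        right = embeddings[gap:gap + block_size]
        scores.append(cosine(mean_vector(left), mean_vector(right)))
    return scores


def depth_scores(scores):
    """Depth of each similarity valley relative to its neighbouring peaks."""
    depths = []
    for i, score in enumerate(scores):
        left_peak = score
        for j in range(i - 1, -1, -1):  # climb left while scores keep rising
            if scores[j] < left_peak:
                break
            left_peak = scores[j]
        right_peak = score
        for j in range(i + 1, len(scores)):  # climb right while scores keep rising
            if scores[j] < right_peak:
                break
            right_peak = scores[j]
        depths.append((left_peak - score) + (right_peak - score))
    return depths


# Toy embeddings (NOT real SBERT output): sentences 0-2 share one topic,
# sentences 3-5 share another.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [0.1, 1.0],
]
scores = gap_scores(toy_embeddings)
depths = depth_scores(scores)
# The deepest valley marks the most likely topic boundary;
# gap index i means "boundary before sentence i + 1".
boundary_before_sentence = depths.index(max(depths)) + 1
```

With these toy vectors the deepest valley falls on the gap between sentences 2 and 3, so `boundary_before_sentence` is 3. In practice a cutoff on the depth scores, rather than a single maximum, would select several boundaries.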

Run only korean_texttiling.ipynb to reproduce the results.

nltk's TextTiling source code, modified to handle Korean text and to add the improvement. Link: https://www.nltk.org/_modules/nltk/tokenize/texttiling.html

nltk's segmentation source code, used for the evaluation metrics. Link: https://www.nltk.org/_modules/nltk/metrics/segmentation.html
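As an illustration of how such a segmentation metric works, here is a minimal pure-Python sketch of WindowDiff, written to follow the definition used in nltk's segmentation module. The function name `window_diff` and the example strings are assumptions for illustration; segmentations are encoded as strings in which "1" marks a boundary.

```python
def window_diff(reference, hypothesis, k, boundary="1"):
    """Fraction of length-k windows in which the two segmentations
    contain a different number of boundaries (0.0 means identical)."""
    if len(reference) != len(hypothesis):
        raise ValueError("Segmentations must have the same length")
    if k > len(reference):
        raise ValueError("Window size k must not exceed the segmentation length")
    windows = len(reference) - k + 1
    mismatches = 0
    for i in range(windows):
        ref_count = reference[i:i + k].count(boundary)
        hyp_count = hypothesis[i:i + k].count(boundary)
        if ref_count != hyp_count:
            mismatches += 1
    return mismatches / windows


# Hypothetical example: two segmentations of 12 units with shifted boundaries.
reference = "000100000010"
hypothesis = "000010000100"
error = window_diff(reference, hypothesis, k=3)  # 3 of 10 windows disagree
```

Lower values are better. The nltk module linked above provides `windowdiff` and `pk` functions that operate on the same string encoding.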

Used to load the data from AIHUB and preprocess it.

Used to run the experiments.