TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages (1997)](https://www.aclweb.org/anthology/J97-1003.pdf) |
|
A HIDDEN MARKOV MODEL APPROACH TO TEXT SEGMENTATION AND EVENT TRACKING(1998) |
|
Statistical Models for Text Segmentation(1999) |
|
Advances in Domain Independent Linear Text Segmentation(2000) |
- C99 algorithm |
Latent Semantic Analysis for Text Segmentation(2001) |
- Uses LSA |
A Statistical Model for Domain-Independent Text Segmentation(2001) |
|
Minimum Cut Model for Spoken Lecture Segmentation(2006) |
|
Bayesian Unsupervised Topic Segmentation(2008) |
|
Hierarchical Text Segmentation from Multi-Scale Lexical Cohesion(2009) |
|
Linear Text Segmentation using Affinity Propagation(2011) |
|
TopicTiling: A Text Segmentation Algorithm based on LDA(2012) |
|
Domain-Independent Unsupervised Text Segmentation for Data Management(2014) |
|
Text Segmentation based on Semantic Word Embeddings(2015) |
|
Unsupervised Text Segmentation Using Semantic Relatedness Graphs(2016) |
|
합성곱 신경망을 이용한 On-Line 주제 분리 (On-Line Topic Segmentation Using Convolutional Neural Networks)(2016) |
|
Text Segmentation as a Supervised Learning Task(2018) |
- Builds a wiki dataset for text segmentation - Solves as a supervised problem a task that was previously tackled with unsupervised, probabilistic methods |
Attention-based Neural Text Segmentation(2018) |
|
Scientific Literature Summarization Using Document Structure and Hierarchical Attention Model(2019) |
|
SECTOR: A Neural Model for Coherent Topic Segmentation and Classification(2019) |
|
LANGUAGE MODEL PRE-TRAINING FOR HIERARCHICAL DOCUMENT REPRESENTATIONS(2019) |
- Runs its experiments on text segmentation |
BeamSeg: A Joint Model for Multi-Document Segmentation and Topic Identification(2019) |
|
BTS: 한국어 BERT를 사용한 텍스트 세그멘테이션 (BTS: Text Segmentation Using Korean BERT)(2019) |
|
Context-Aware Latent Dirichlet Allocation for Topic Segmentation(2020) |
|
Chapter Captor: Text Segmentation in Novels(2020) |
1. Builds a text segmentation dataset from novels in Project Gutenberg 2. Local methods: * Weighted Overlap Cut (WOC): unsupervised; starts from the idea that the frequently occurring words differ from chapter to chapter, compares two sentences, and places the break point where word density (overlap) is minimized * BERT for Break Prediction (BBP): supervised; compares two sentences and treats it as a classification problem whether they are continuous (same chapter) or not (break point) 3. Global method using optimization: keeping segment lengths roughly uniform gives good segmentation results * Solved recursively with dynamic programming |
Books of Hours: the First Liturgical Corpus for Text Segmentation(2020) |
|
A Joint Model for Document Segmentation and Segment Labeling(2020) |
|
Discourse as a Function of Event: Profiling Discourse Structure in News Articles around the Main Event(2020) |
|
Improving BERT with Focal Loss for Paragraph Segmentation of Novels(2020) |
|
Topical Change Detection in Documents via Embeddings of Long Sequences(2020) |
|
Text Segmentation by Cross Segment Attention(2020) |
|
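
The Chapter Captor notes above mention a global step that prefers evenly sized segments and is solved with dynamic programming. The following is a minimal sketch of that general idea, not the paper's actual formulation: the function name `segment_with_length_prior`, the `alpha` weight, and the absolute-deviation length penalty are all assumptions made for illustration. It takes one local break score per sentence (for example from an overlap measure or a classifier) and picks the break points that maximize total score minus a penalty for uneven segment lengths.

```python
def segment_with_length_prior(scores, num_segments, alpha=1.0):
    """Choose num_segments - 1 break points with dynamic programming.

    scores[j] is a hypothetical local score for placing a break right
    before sentence j (scores[0] is ignored). alpha weights an assumed
    penalty on segments whose length deviates from the uniform length.
    Returns the sorted list of break positions.
    """
    n = len(scores)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(scores)")
    ideal = n / num_segments

    def length_penalty(length):
        # Assumed penalty: relative deviation from the uniform length.
        return alpha * abs(length - ideal) / ideal

    neg_inf = float("-inf")
    # best[k][i]: best value for splitting the first i sentences into k segments.
    best = [[neg_inf] * (n + 1) for _ in range(num_segments + 1)]
    # back[k][i]: start index of the last segment in that best split.
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]

    for i in range(1, n + 1):
        best[1][i] = -length_penalty(i)

    for k in range(2, num_segments + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cand = best[k - 1][j] + scores[j] - length_penalty(i - j)
                if cand > best[k][i]:
                    best[k][i] = cand
                    back[k][i] = j

    # Walk the back-pointers from the end to recover the break points.
    breaks = []
    i = n
    for k in range(num_segments, 1, -1):
        i = back[k][i]
        breaks.append(i)
    return breaks[::-1]
```

With `alpha=0` the sketch simply keeps the highest-scoring breaks; raising `alpha` pulls the breaks toward equal-length segments even when a local score disagrees.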