
Segment-based document clustering pipeline

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

SBDC: Segment-based Document Clustering


Dependencies

  • NumPy, 1.7+
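
The version requirement above can be checked before running the pipeline. The following is a minimal sketch, not part of SBDC itself: the names `MINIMUM_NUMPY`, `parse_version` and `meets_minimum` are hypothetical helpers introduced only for illustration, and the check assumes that only the major and minor components of the version string matter.

```python
import re

# Assumption: taken from the dependency list above (NumPy 1.7+).
MINIMUM_NUMPY = (1, 7)


def parse_version(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MINIMUM_NUMPY):
    """Return True if `version` is at least `minimum` (integer tuple comparison)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("NumPy is not installed")
    else:
        status = "OK" if meets_minimum(numpy.__version__) else "too old"
        print("NumPy %s: %s" % (numpy.__version__, status))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10" sorts before "1.7", while as tuples `(1, 10)` is correctly treated as newer than `(1, 7)`.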

References

  • Andrea Tagarelli and George Karypis. A Segment-based Approach To Clustering Multi-Topic Documents. In "Knowledge and Information Systems", Vol. 34 (2013), No. 3, pp. 563-595

  • Thorsten Brants, Francine Chen and Ioannis Tsochantaridis. Topic-based Document Segmentation with Probabilistic Latent Semantic Analysis. In "Proceedings of the Eleventh International Conference on Information and Knowledge Management", pp. 211-218, ACM (2002)