Cluster analysis based on RIA news headlines

The choice of input data matters a great deal. Tweets, for example, give poor results unless they are preprocessed first.
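
As an illustration of the kind of preprocessing tweets might need before clustering, here is a minimal sketch using only the Python standard library. The function name `clean_tweet` and the specific cleaning rules (dropping URLs and @-mentions, keeping hashtag words, removing punctuation, lowercasing) are assumptions for this example, not the steps used in the repository.

```python
import re

# Hypothetical cleaning rules; adjust them to the data at hand.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user mentions
NON_WORD_RE = re.compile(r"[^\w\s]")            # punctuation and symbols


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet with URLs, mentions and punctuation removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")       # keep the hashtag word, drop the marker
    text = NON_WORD_RE.sub(" ", text)   # \w is Unicode-aware, so Cyrillic is kept
    return " ".join(text.lower().split())


print(clean_tweet("RT @user: Breaking NEWS!!! https://t.co/abc #Moscow"))
```

Real tweets would likely also need stop-word removal and lemmatization, which this sketch leaves out.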
The final summarization step does not work well: it tries to compose a single summary out of different news items instead of highlighting the essence of the collection. Instead of a summarizer, it is better to label the resulting categories manually and then train a classifier on them.
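
The label-then-classify idea can be sketched as follows. The note above does not say which classifier to use, so this example assumes a small multinomial Naive Bayes model written with the standard library only. The class name, the headlines and the category labels are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesHeadlineClassifier:
    """Multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, headlines, labels):
        self.doc_counts = Counter(labels)        # headlines per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        for text, label in zip(headlines, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, headline):
        # Ignore tokens never seen during training.
        tokens = [w for w in headline.lower().split() if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)             # class prior
            for w in tokens:
                score += math.log((counts[w] + 1) / denom)    # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical headlines, each tagged with the hand-assigned name of its cluster.
labeled = [
    ("central bank raises key interest rate", "economy"),
    ("ruble falls against the dollar", "economy"),
    ("oil prices rise on supply fears", "economy"),
    ("national team wins hockey final", "sport"),
    ("football club signs new striker", "sport"),
    ("tennis champion reaches the final", "sport"),
]
texts, tags = zip(*labeled)
classifier = NaiveBayesHeadlineClassifier().fit(texts, tags)
print(classifier.predict("bank cuts interest rate"))
```

In practice, each cluster would be given a name once by hand, every headline in it would inherit that name as its label, and new headlines would then be assigned to a category by the trained classifier rather than summarized.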

shitkov/cluster_analysis