Telegram Data Clustering contest solution by Mindful Squirrel

Primary language: Jupyter Notebook. License: Apache License 2.0.

TGNews

Demo

Install

Prerequisites: CMake, Boost

$ sudo apt-get install cmake libboost-all-dev build-essential

If you received the code as a zip archive, skip ahead to building the binary.

To download the code and models:

$ git clone https://github.com/IlyaGusev/tgcontest
$ cd tgcontest
$ git submodule init
$ git submodule update
$ bash download_models.sh

To build the binary (from the "tgcontest" dir):

$ mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=Release ..
$ make

To download the datasets (from the "tgcontest" dir, not "build"):

$ bash download_data.sh

To run on a sample (from the "tgcontest" dir):

$ ./build/tgnews top data --ndocs 10000

Training

Models

Data

Markup

Misc

Links

TODO:

  • Framework for complex NN
  • Proper clustering markup
  • Error analysis for categories classifiers
  • Alternatives for PageRank
  • "Ugly" titles