VisARTM

VisARTM is intended to become a successor of tm_navigator, a tool for visualizing and assessing Topic Models primarily built using BigARTM - fast and scalable library for Topic Modelling.

Installation and setup

VisARTM uses Python 3. While VisARTM is likely to work with Python 2, it is not guaranteed.

pip install -r requirements.txt before using VisARTM. VisARTM requiers fairly recent flask and flask_sqlalchemy.

Data format in VisARTM

All files required by VisARTM should be provided in .csv format. See columns and sample values for each input file below.

Files related to dataset

document.csv

id	abstract	content
0	document-0	abstact-0
1	document-1	abstact-1

term.csv

id	text
0	milk
1	Python

document_similarity.csv

document_l_id	document_r_id	similarity
0	1	0.5
0	2	0.2

term_similarity.csv

term_l_id	term_r_id	similarity
0	1	0.5
0	2	0.6

document_term.csv

document_id	term_id	count
0	0	100
0	1	0

Files related to topic model

topic.csv

id	title	probability	is_background
0	Topic 0	0.95	1
1	Topic 1	0.2	0

topic_similarity.csv

topic_l_id	topic_r_id	similarity
0	1	0.22
0	2	0.6

document_topic.csv

document_id	topic_id	prob_dt	prob_td
0	0	0.22	0.6
0	1	0.61	0.3

topic_term.csv

topic_id	term_id	prob_wt	prob_tw
0	0	0.22	0.6
0	1	0.4	0.2

Loading data into VisARTM

To generate some random data and see its visualization use ./setup_sample.py. This script generates some random data, writes everything to data subfolder and adds generated data to VisARTM database.

Generating VisARTM-compatible models from BigARTM models would be supported in the future.

To load your custom model into VisARTM do the following:

Put data files in appropriate format into a folder.
Call clear() and create() to ensure that project database is cleared from everything.
Call following Python functions from manage.py:

add_dataset('Your Dataset Name', 'path_to_dataset') - this creates dataset-related entries in the database and loads data.
add_topic_model('Your Topic Model name', 'data', created_dataset_id) where created_dataset_id is the id of added dataset returned from previous point

Good job! Now you're all set. Do python3 serve.py to see the loaded model and begin assessment.

kirillbobyrev/VisARTM