[feature] decouple visualisation UI's topic numbering from their labels
ed9w2in6 opened this issue · 2 comments
We have a whole family of issues that are just about the numbering of topics during visualisation:
- numbering confusions
- rename topic feature requests
- bug due to incorrect indexing
They can all be resolved just by decoupling the numbering from the labels, which also removes the need for the sort_topics and start_index options in the Python API.
I am not going into details on how to implement this or on the specification of outcomes, but here are some ideas:
Outline
Python API side
We currently generate topic numbers in topic_top_term_df in _prepare.py. We use enumerate with start_index to generate the numbering; start_index is supplied by the user to the prepare method and smuggled through the _topic_info method.
Line 276 in 16800f3
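To make the coupling concrete, here is a simplified Python sketch (not the actual code in _prepare.py) of how an enumerate-plus-start_index scheme ties the displayed number to the label, next to a decoupled variant where the label comes from a user-supplied list. Both function names are hypothetical.

```python
def coupled_labels(n_topics, start_index=1):
    """Hypothetical mimic of the current scheme: the label is derived
    from the topic's position plus start_index."""
    return [(num, f"Topic{num}") for num in range(start_index, start_index + n_topics)]


def decoupled_labels(topic_names):
    """Hypothetical decoupled scheme: the number is only a stable 1-based
    position, and the label is whatever the user supplied."""
    return list(enumerate(topic_names, start=1))


# Changing start_index silently changes every label as well:
print(coupled_labels(3, start_index=0))  # [(0, 'Topic0'), (1, 'Topic1'), (2, 'Topic2')]
print(coupled_labels(3, start_index=1))  # [(1, 'Topic1'), (2, 'Topic2'), (3, 'Topic3')]

# With decoupling, labels no longer depend on any numbering option:
print(decoupled_labels(["sports", "politics"]))  # [(1, 'sports'), (2, 'politics')]
```

In the decoupled variant the label carries no numbering, so renaming a topic cannot shift any index.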
Sorting is orthogonal to this logic, hence we can safely ignore it when changing this code:
Lines 413 to 416 in 16800f3
The number generated from enumerate will ultimately be used to name the topic, stored as Category:
Line 265 in 16800f3
I believe we should allow the user to supply a list of strings as topic names.
If we change this, we need to change this too:
Lines 443 to 449 in 16800f3
and make sure none of them is named "Default", since we use it as the default:
Lines 237 to 242 in 16800f3
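If users may pass their own list of strings, a check along these lines could run before the names are written into Category. This is only a sketch: validate_topic_names is a hypothetical helper, and the length, type and uniqueness rules are assumptions added on top of the "no Default" rule stated above.

```python
def validate_topic_names(topic_names, n_topics, reserved="Default"):
    """Hypothetical check for user-supplied topic names (not pyLDAvis API)."""
    names = list(topic_names)
    if len(names) != n_topics:
        raise ValueError(f"expected {n_topics} topic names, got {len(names)}")
    if not all(isinstance(name, str) for name in names):
        raise TypeError("topic names must be strings")
    if reserved in names:
        # "Default" is already used as a Category value by the existing code
        raise ValueError(f"{reserved!r} is reserved and cannot be a topic name")
    if len(set(names)) != len(names):
        # assumption: duplicate names would make topics indistinguishable
        raise ValueError("topic names must be unique")
    return names


print(validate_topic_names(["sports", "politics"], 2))  # ['sports', 'politics']
```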
And that is for the topic_info data only; we would have to do the same for mdsData and token_table too.
Clearly, a better way is to side-step all of this: supply the desired list of names and store it in the PreparedData namedtuple.
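A minimal sketch of that side-step, assuming the names simply ride along as one extra optional field. PreparedDataSketch is a hypothetical, trimmed-down stand-in: the real PreparedData has other fields that are not reproduced here, and topic_names is the proposed addition.

```python
from collections import namedtuple

# Hypothetical stand-in for PreparedData; only topic_names is new.
PreparedDataSketch = namedtuple(
    "PreparedDataSketch",
    ["topic_info", "token_table", "topic_names"],
    defaults=[None],  # applies to the last field, so topic_names is optional
)

data = PreparedDataSketch(topic_info=[], token_table=[])
print(data.topic_names)  # None

# Names are attached without touching topic_info or token_table.
named = data._replace(topic_names=["sports", "politics"])
print(named.topic_names)  # ['sports', 'politics']
```

Defaulting the new field to None keeps existing construction calls valid, while the Category values inside topic_info, mdsData and token_table stay untouched.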
Solution: side-step it on the JS visualisation side
Currently, our visualisation logic makes the hard assumption that Category is of the form "TopicN", where N is a number:
pyLDAvis/pyLDAvis/js/ldavis.js
Lines 697 to 701 in 16800f3
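The hard assumption can be mimicked in Python (the actual JS in ldavis.js is not reproduced here): if the topic number is recovered by parsing "TopicN", any custom name breaks it. The decoupled alternative keeps the internal "TopicN" Category and only looks up a display label. Both functions are hypothetical illustrations, and the "Topic N" fallback format is an assumption.

```python
import re


def topic_number_from_category(category):
    """Mimic of the current assumption: the number is parsed out of "TopicN"."""
    match = re.fullmatch(r"Topic(\d+)", category)
    if match is None:
        raise ValueError(f"not of the form 'TopicN': {category!r}")
    return int(match.group(1))


def display_label(topic_number, topic_names=None):
    """Decoupled alternative: Category stays "TopicN"; only the shown text changes.
    topic_names is indexed by the 1-based topic number here (an assumption)."""
    if topic_names is None:
        return f"Topic {topic_number}"
    return topic_names[topic_number - 1]


print(topic_number_from_category("Topic3"))          # 3
print(display_label(2))                              # Topic 2
print(display_label(2, ["sports", "politics"]))      # politics
```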
Therefore, again, the path of lowest friction is to side-step it by changing only the visualisation logic:
- RHS Table title
pyLDAvis/pyLDAvis/js/ldavis.js
Lines 982 to 987 in 16800f3
- circle label
pyLDAvis/pyLDAvis/js/ldavis.js
Lines 388 to 393 in 16800f3
Of these, the second (circle label) is optional. So only 3 changes in total!
Summary of changes needed
- new parameter for topic names
- store it in PreparedData
- change the RHS table title, and optionally the circle labels too
Are you creating a matching pull request?
@msusol Yes, still WIP though. Ideally, cleaning up the code base would be better, but I do not plan to do that.
As mentioned above, my plan is just a quick hack:
- add a new parameter to prepare, defaulting to None, with some logic to generate dummy topic names if it is None
- store it in PreparedData
- change the visualisation accordingly:
  - RHS table title
  - the circle labels too, if it looks good
- allow selecting a topic by its name too, if not too difficult
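The plan above could look roughly like this on the Python side. resolve_topic_names and select_topic are hypothetical helpers, and the "Topic N" dummy-name format and 1-based numbering are assumptions, not existing pyLDAvis behaviour.

```python
def resolve_topic_names(n_topics, topic_names=None, start_index=1):
    """Return one display name per topic, generating dummy names if None."""
    if topic_names is None:
        # Dummy names mirror today's numbering so default output looks the same.
        return [f"Topic {start_index + i}" for i in range(n_topics)]
    names = list(topic_names)
    if len(names) != n_topics:
        raise ValueError(f"expected {n_topics} names, got {len(names)}")
    return names


def select_topic(query, topic_names):
    """Return the 1-based topic number for either a number or a topic name."""
    if isinstance(query, int):
        if not 1 <= query <= len(topic_names):
            raise IndexError(f"no topic number {query}")
        return query
    try:
        return topic_names.index(query) + 1
    except ValueError:
        raise KeyError(f"no topic named {query!r}") from None


names = resolve_topic_names(2, ["sports", "politics"])
print(resolve_topic_names(3))           # ['Topic 1', 'Topic 2', 'Topic 3']
print(select_topic("politics", names))  # 2
print(select_topic(1, names))           # 1
```

Keeping selection keyed on the position means renaming topics never changes which topic a number refers to.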