cpsievert/LDAvis

question: how are the topic numbers put into the circles?


Hello! I see that createJSON orders the topics in order of decreasing frequency to put the numbers in the circles. But does that correspond to the ordering of the columns in theta and/or the rows in phi? I ask because I am "zooming in" on topics recursively and running lda again to show subtopics (after pigeonhole-ing each document into only one topic). Right now, my code assumes the numbering of the circles corresponds to the ordering of the columns in theta, but I believe that's not true. Thanks! :)

Hi Sydney,

If you load the visualization and then you want to inspect a specific topic from the visualization back in R (by looking at either theta or phi, for example), you'll need to use the topic.order element of the output of createJSON().

For example, using the TwentyNewsgroups data (which is packaged with LDAvis), suppose you run:

data(TwentyNewsgroups, package="LDAvis")
json <- with(TwentyNewsgroups,
             createJSON(phi, theta, doc.length, vocab, term.frequency))
serVis(json) # press ESC or Ctrl-C to kill

And then you want to inspect Topic 28, whose five most probable terms are "space, nasa, earth, orbit, gov" (visible in LDAvis by setting lambda = 1 and looking at the uppermost 5 bars in the barchart).

You need to figure out which original row of phi (or column of theta) corresponds to Topic 28 in the visualization. This info is stored as the topic.order element returned by createJSON():

# Wrong way to inspect Topic 28 from the vis:
TwentyNewsgroups$vocab[order(TwentyNewsgroups$phi[28, ], decreasing = TRUE)][1:5]
# [1] "writes" "edu"    "can"    "just"   "get"   

# Right way: look up the original row of phi via topic.order first
new.order <- RJSONIO::fromJSON(json)$topic.order
TwentyNewsgroups$vocab[order(TwentyNewsgroups$phi[new.order[28], ], decreasing = TRUE)][1:5]
# [1] "space" "nasa"  "earth" "orbit" "gov"  

To inspect/postprocess multiple topics back in R, it might be worthwhile to manually re-order all the columns of theta and the rows of phi so that they match the topic numbers from the visualization:

TwentyNewsgroups$phi <- TwentyNewsgroups$phi[new.order, ]
TwentyNewsgroups$theta <- TwentyNewsgroups$theta[, new.order]
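
For the "one topic per document" use case from the question, the same re-ordering can be applied before picking each document's dominant topic. The following is only a minimal sketch on toy data: the theta matrix and the new.order vector are made-up values standing in for the real model output and the real topic.order, and dominant.vis is a hypothetical name.

```r
# Hypothetical toy inputs: 4 documents x 3 topics, columns in original model order.
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.2, 0.2, 0.6,
                  0.5, 0.1, 0.4),
                nrow = 4, byrow = TRUE)

# Hypothetical topic.order: visualization topic k is original topic new.order[k].
new.order <- c(3, 1, 2)

# Re-order the columns so that column k is visualization topic k.
theta.vis <- theta[, new.order]

# Assign each document to its most probable topic, in visualization numbering.
dominant.vis <- max.col(theta.vis, ties.method = "first")

# Group document indices by visualization topic, e.g. to refit a model per topic.
docs.by.topic <- split(seq_len(nrow(theta)), dominant.vis)
```

With the real data, new.order would come from the topic.order element as shown above, and docs.by.topic would give the document subsets to pass to the next round of model fitting.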

Hope this makes sense,
kenny

Hi Kenny, thank you very much for the detailed response! I saw these two lines in the createJSON.R file

topic.frequency <- colSums(theta * doc.length) 
topic.proportion <- topic.frequency/sum(topic.frequency) 

And basically mapped the column number of theta to its topic number in the visualization. Thanks for the help! I really like LDAvis and the movies example is really good.
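
In case it helps someone else, here is a rough sketch of that mapping on toy data. It assumes the visualization numbers topics by decreasing topic.proportion (which is how I read createJSON with re-ordering enabled); the matrices and the names vis.to.orig and orig.to.vis are made up for illustration.

```r
# Hypothetical toy inputs: 4 documents x 3 topics, plus token counts per document.
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.2, 0.2, 0.6,
                  0.5, 0.1, 0.4),
                nrow = 4, byrow = TRUE)
doc.length <- c(100, 200, 50, 150)

# The two lines from createJSON.R: expected token count per topic, then its share.
topic.frequency <- colSums(theta * doc.length)
topic.proportion <- topic.frequency / sum(topic.frequency)

# Assumption: visualization topic k is the original topic with the k-th largest share.
vis.to.orig <- order(topic.proportion, decreasing = TRUE)

# Inverse permutation: visualization number for each original column of theta.
orig.to.vis <- order(vis.to.orig)
```

Here vis.to.orig plays the role of topic.order, and orig.to.vis answers "which circle is column j of theta?". Reading topic.order from the createJSON() output, as Kenny describes, is the safer route since it does not depend on my reading of the internals.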

Will you update the LDAvis version on CRAN (on GitHub it's 0.3.3, on CRAN it's 0.3.2)? I think the 0.3.2 version does not support the reorder.topics parameter.
