piskvorky/gensim

EnsembleLDA with pyLDAvis visualisation

daniau23 opened this issue

Problem description

I am trying to print the topics produced by Gensim's ensemble approach (EnsembleLda) and to visualise the results with pyLDAvis.

Code

from gensim.models import EnsembleLda
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis

# bow_corpus (bag-of-words corpus) and id2word (gensim Dictionary) are built earlier
lda_model_six = EnsembleLda(
    corpus=bow_corpus,
    num_models=3,
    random_state=42,
    distance_workers=2,
    num_topics=6,
    chunksize=100,
    passes=300,
    iterations=500,
    eval_every=None,
)

# Print topics
lda_model_six.print_topics(num_words=50)

# Visualise topics
pyLDAvis.enable_notebook()
vis_data_six = gensimvis.prepare(
    lda_model_six, bow_corpus,
    id2word, sort_topics=False,
)
pyLDAvis.save_html(vis_data_six, 'topic_visuals_bigrams/ensembles/six_topics.html')
pyLDAvis.display(vis_data_six)

Expected result

  1. The output of lda_model_six.print_topics(num_words=50) was expected to look similar to this:

[(0,
'0.033*"fly" + 0.027*"airline" + 0.023*"service" + 0.019*"british_airways" + 0.013*"economy" + 0.012*"no" + 0.010*"seat" + 0.010*"price" + 0.010*"or" + 0.010*"pay" + 0.010*"food" + 0.010*"route" + 0.009*"time" + 0.009*"bad" + 0.009*"year" + 0.008*"well" + 0.008*"class" + 0.007*"experience" + 0.007*"london_heathrow" + 0.007*"like" + 0.007*"aircraft" + 0.007*"offer" + 0.007*"london" + 0.007*"customer" + 0.007*"british_airway" + 0.007*"premium" + 0.006*"long_haul" + 0.006*"carrier" + 0.006*"business_class" + 0.006*"cabin" + 0.006*"old" + 0.006*"poor" + 0.005*"don" + 0.005*"good" + 0.005*"lgw" + 0.005*"new" + 0.005*"think" + 0.005*"expect" + 0.005*"staff" + 0.005*"use" + 0.005*"charge" + 0.005*"business" + 0.005*"feel" + 0.005*"free" + 0.005*"travel" + 0.004*"trip" + 0.004*"fare" + 0.004*"ticket" + 0.004*"ve" + 0.004*"far"'),

But instead I got this:
[(0,
'0.025*"81" + 0.025*"67" + 0.023*"110" + 0.020*"31" + 0.019*"87" + 0.017*"95" + 0.015*"122" + 0.015*"16" + 0.012*"75" + 0.012*"160" + 0.011*"1" + 0.010*"123" + 0.010*"700" + 0.010*"106" + 0.010*"398" + 0.010*"73" + 0.009*"738" + 0.009*"88" + 0.009*"45" + 0.009*"29" + 0.009*"108" + 0.008*"856" + 0.008*"102" + 0.008*"282" + 0.008*"94" + 0.008*"9" + 0.008*"30" + 0.008*"444" + 0.007*"6" + 0.007*"11" + 0.007*"154" + 0.007*"316" + 0.007*"127" + 0.007*"288" + 0.007*"866" + 0.006*"269" + 0.006*"352" + 0.006*"44" + 0.006*"509" + 0.006*"147" + 0.006*"717" + 0.005*"84" + 0.005*"79" + 0.005*"32" + 0.005*"612" + 0.005*"634" + 0.005*"650" + 0.005*"38" + 0.005*"161" + 0.005*"404"'),

  2. The output of pyLDAvis.display(vis_data_six) was expected to look similar to this:
    [Screenshot 2023-11-01 145252: expected pyLDAvis visualisation]

But instead an error is raised: AttributeError: 'EnsembleLda' object has no attribute 'num_topics'

Python package versions
scikit-learn==1.3.0
spacy==3.6.0
pandas==2.0.3
numpy==1.25.1
scipy==1.11.1
matplotlib==3.7.2
gensim==4.3.0
nltk==3.8.1
pyldavis==3.4.1