encoding problem with Chinese
Hi, I'm trying to cluster text files written in Chinese.
The tcm and vocab were created from UTF-8 text.
However, plotting the LDA model didn't work.
Does the servr package support UTF-8?
If it does, the problem is probably in my code, I guess...
I would be very grateful if you could look at my code and the captured files:
library(text2vec)
library(magrittr)  # for the %>% pipe

# tokenize the documents (mylist holds the texts, files.list their ids)
it = itoken(mylist, ids = files.list, progressbar = FALSE)

# build the vocabulary and prune rare / overly common terms
v = create_vocabulary(it) %>%
  prune_vocabulary(term_count_min = 10, doc_proportion_max = 0.2)
vectorizer = vocab_vectorizer(v)

# document-term matrix in the lda_c format expected by LDA
dtm = create_dtm(it, vectorizer, type = "lda_c")

# fit a 10-topic LDA model
lda_model =
  LDA$new(n_topics = 10, vocabulary = v,
          doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr =
  lda_model$fit_transform(dtm, n_iter = 60, convergence_tol = 0.01,
                          check_convergence_every_n = 10)

# visualize with LDAvis, served locally via servr
library(LDAvis)
library(servr)
lda_model$plot()
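As an aside, forcing the input documents to UTF-8 before tokenizing is one way to rule out the input side; a minimal sketch, assuming mylist is a list of character vectors:

# Hedged sketch: normalize the input texts to UTF-8 before itoken(), so any
# remaining mojibake can only come from the files written later by LDAvis.
mylist <- lapply(mylist, enc2utf8)
stopifnot(all(unlist(lapply(mylist, validEnc))))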
Not sure if you have contacted the author of LDAvis, @cpsievert, but before you bother him, please try to update R and all your R packages (update.packages(ask = FALSE)). If the problem still persists after you have updated them, try:
devtools::install_github('rstudio/htmltools')
If this still does not fix the issue, please provide the output of:
devtools::session_info('servr')
devtools::session_info('htmltools')
Note that every time before you install anything and retry, you should restart R.
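Collected as one runnable sequence (a sketch of the steps above, with the restarts noted as comments):

# Step 1: update all installed packages, then restart R
update.packages(ask = FALSE)
# Step 2: install the development htmltools, then restart R again
devtools::install_github('rstudio/htmltools')
# Step 3: if the problem persists, report this output in the issue
devtools::session_info('servr')
devtools::session_info('htmltools')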
I figured out that the JSON file had been written as ANSI. I re-encoded it as UTF-8 and it worked. Thanks :)
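For future readers, a minimal sketch of that workaround, assuming the JSON LDAvis wrote is at the hypothetical path "lda.json" and is in the native Windows codepage (GBK/CP936 on a Chinese-locale system):

# Hedged sketch: re-encode the LDAvis JSON from the assumed native ANSI
# codepage (GBK here) to UTF-8. "lda.json" is a hypothetical path; use the
# file that your plot() call actually produced.
raw_txt  <- readLines("lda.json", warn = FALSE)
utf8_txt <- iconv(raw_txt, from = "GBK", to = "UTF-8")
writeLines(utf8_txt, "lda.json", useBytes = TRUE)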
Perfect. Thanks for posting back!