encoding problem with Chinese
Hi, I'm trying to cluster text files written in Chinese.
The tcm and vocab were created from UTF-8 text.
However, plotting the LDA model didn't work.
Does the servr package support UTF-8?
If it does, the problem is probably in my code, I guess...
I would be very grateful if you could look at my code and the captured files:
library(text2vec)
library(magrittr)  # for the %>% pipe

# tokenize the documents (mylist holds the texts, files.list their ids)
it = itoken(mylist, ids = files.list, progressbar = FALSE)

# build the vocabulary and prune rare / overly common terms
v = create_vocabulary(it) %>%
  prune_vocabulary(term_count_min = 10, doc_proportion_max = 0.2)
vectorizer = vocab_vectorizer(v)

# document-term matrix in the lda_c format expected by LDA
dtm = create_dtm(it, vectorizer, type = "lda_c")

# fit a 10-topic LDA model
lda_model =
  LDA$new(n_topics = 10, vocabulary = v,
          doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr =
  lda_model$fit_transform(dtm, n_iter = 60, convergence_tol = 0.01,
                          check_convergence_every_n = 10)

# visualize with LDAvis, served locally via servr
library(LDAvis)
library(servr)
lda_model$plot()
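As an aside, forcing the input documents to UTF-8 before tokenizing is one way to rule out the input side; a minimal sketch, assuming mylist is a list of character vectors:

# Hedged sketch: normalize the input texts to UTF-8 before itoken(), so any
# remaining mojibake can only come from the files written later by LDAvis.
mylist <- lapply(mylist, enc2utf8)
stopifnot(all(unlist(lapply(mylist, validEnc))))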
Not sure if you have contacted the author of LDAvis, @cpsievert, but before you bother him, please try to update R and all your R packages (update.packages(ask = FALSE)). If the problem still persists after you have updated them, try:
devtools::install_github('rstudio/htmltools')
If this still does not fix the issue, please provide the output of:
devtools::session_info('servr')
devtools::session_info('htmltools')
Note that every time before you install anything and retry, you should restart R.
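Collected as one runnable sequence (a sketch of the steps above, with the restarts noted as comments):

# Step 1: update all installed packages, then restart R
update.packages(ask = FALSE)
# Step 2: install the development htmltools, then restart R again
devtools::install_github('rstudio/htmltools')
# Step 3: if the problem persists, report this output in the issue
devtools::session_info('servr')
devtools::session_info('htmltools')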
I figured out that the JSON file had been written as ANSI. I re-encoded it as UTF-8 and it worked. Thanks :)
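For future readers, a minimal sketch of that workaround, assuming the JSON LDAvis wrote is at the hypothetical path "lda.json" and is in the native Windows codepage (GBK/CP936 on a Chinese-locale system):

# Hedged sketch: re-encode the LDAvis JSON from the assumed native ANSI
# codepage (GBK here) to UTF-8. "lda.json" is a hypothetical path; use the
# file that your plot() call actually produced.
raw_txt  <- readLines("lda.json", warn = FALSE)
utf8_txt <- iconv(raw_txt, from = "GBK", to = "UTF-8")
writeLines(utf8_txt, "lda.json", useBytes = TRUE)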
Perfect. Thanks for posting back!