kwartler/text_mining

match.matrix function, pages 187-188


In the match.matrix function there are three closing curly brackets "}" but only two opening curly brackets "{", so the function does not run.

Since there is an if-statement without an opening {, I tried putting the missing bracket there, to no avail:

if (attr(original.matrix, "weighting")[2] == "tfidf") {
...
matrix <- fixed
}

Where does the missing bracket have to go?
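Before hunting for where a missing bracket belongs, it can help to let R's parser confirm whether a transcribed function is balanced at all. The sketch below runs base R's parse() on the code as text; the helper name braces_ok and the two sample snippets are hypothetical, not part of the book's code.

```r
# Hypothetical helper: TRUE if `code` parses cleanly, FALSE otherwise.
# An unmatched "{" or "}" makes parse() signal a syntax error.
braces_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) {
    message("Parse error: ", conditionMessage(e))
    FALSE
  })
}

# Balanced: the one-line if has no braces, the function body has one pair.
good <- "f <- function(x) {
  w <- 0
  if (x == 1) w <- 1e-9
  w
}"

# Unbalanced: one extra closing bracket (three '}' but only two '{').
bad <- "f <- function(x) {
  w <- 0
  if (x == 1) w <- 1e-9
  }
  w
}"

braces_ok(good)  # TRUE
braces_ok(bad)   # FALSE, plus the parser's "unexpected '}'" message
```

The parser reports where it first hit the stray bracket, which narrows down which "}" is extra.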

I checked the original code from my proposal, and it didn't have this issue. I will look at the hard copy next. In the meantime, here is working code:

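library(tm)  # provides VCorpus, VectorSource, DocumentTermMatrix, weightTf

match.matrix <- function(text.col, original.matrix=NULL, weighting=weightTf) {
  control <- list(weighting=weighting)
  # Coerce to character and re-encode as UTF-8; unconvertible bytes become hex codes
  training.col <- 
    sapply(as.vector(text.col,mode="character"),iconv,to="UTF-8",sub="byte")
  corpus <- VCorpus(VectorSource(training.col))
  matrix <- DocumentTermMatrix(corpus,control=control)
  
  # If a training matrix is supplied, align the new matrix to its vocabulary
  if (!is.null(original.matrix)) {
    # Terms in the training matrix that do not appear in the new text
    terms <- 
      colnames(original.matrix[,which(!colnames(original.matrix) %in% colnames(matrix))])
    # Fill value for those missing terms: 0, or a minuscule value for tf-idf
    weight <- 0
    if (attr(original.matrix,"weighting")[2] =="tf-idf") weight <- 0.000000001
    amat <- matrix(weight,nrow=nrow(matrix),ncol=length(terms))
    colnames(amat) <- terms
    rownames(amat) <- rownames(matrix)
    
    # Keep only terms the training matrix knows, append the filler columns,
    # and rebuild a DocumentTermMatrix
    fixed <- as.DocumentTermMatrix(
      cbind(matrix[,which(colnames(matrix) %in% colnames(original.matrix))],amat),
      weighting=weighting)
    
    matrix <- fixed
  }
  
  # Sort columns alphabetically so the column order is predictable
  matrix <- matrix[,sort(colnames(matrix))]
  gc()
  return(matrix)
}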
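The idea behind match.matrix is to force a new document-term matrix onto the training matrix's vocabulary: drop terms the training data never saw, add constant columns (0, or a minuscule value for tf-idf) for training terms missing from the new text, and sort the columns. The sketch below illustrates only that alignment step on ordinary dense base-R matrices rather than tm's sparse DocumentTermMatrix objects; align_columns, its fill argument, and the toy matrices are hypothetical.

```r
# Hypothetical helper mirroring the column alignment in match.matrix,
# using dense base-R matrices instead of tm DocumentTermMatrix objects.
align_columns <- function(new, reference, fill = 0) {
  ref_terms <- colnames(reference)
  # Keep only columns of `new` whose terms also appear in `reference`.
  kept <- new[, colnames(new) %in% ref_terms, drop = FALSE]
  # Terms in `reference` that `new` lacks get a constant fill column.
  missing <- setdiff(ref_terms, colnames(new))
  filler <- matrix(fill, nrow = nrow(new), ncol = length(missing),
                   dimnames = list(rownames(new), missing))
  out <- cbind(kept, filler)
  # Sort columns alphabetically, as match.matrix does.
  out[, sort(colnames(out)), drop = FALSE]
}

reference <- matrix(c(1, 0, 2, 1, 0, 3), nrow = 2,
                    dimnames = list(c("d1", "d2"), c("apple", "banana", "cherry")))
new <- matrix(c(2, 1, 1, 0), nrow = 2,
              dimnames = list(c("n1", "n2"), c("banana", "durian")))

aligned <- align_columns(new, reference)
aligned  # columns apple, banana, cherry; "durian" is dropped
```

Passing fill = 1e-9 would mimic the tf-idf case in match.matrix.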

The line you are referring to sets the weight parameter to a minuscule number, but only when the logical condition evaluates to TRUE. It is written as:

if (attr(original.matrix,"weighting")[2] =="tf-idf") weight <- 0.000000001

In your example, you opened a bracket on this line:

if (attr(original.matrix, "weighting")[2] == "tfidf") {

However, the following if statements all behave identically in R:

# Establish a logical condition
chk<-1

# No brackets w/line break
if(chk==1) 
  print('hello world')

# No brackets, no line break
if(chk==1) print('hello world')

# Brackets with no line break 
if(chk==1){ print('hello world')}

# Brackets with line breaks 
if(chk==1) { 
  print('hello world')
}
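One caveat to that equivalence: without brackets, if controls only the single expression that follows it. That is why the one-line weight assignment works unbracketed, while a second statement meant to be conditional would need { and }. A minimal base-R illustration, with hypothetical variables x and y:

```r
chk <- 2
x <- 0

# Without brackets, only the first expression is conditional.
if (chk == 1) x <- x + 1
x <- x + 10          # runs regardless of chk

# With brackets, both assignments are conditional.
y <- 0
if (chk == 1) {
  y <- y + 1
  y <- y + 10
}

c(x = x, y = y)      # x is 10, y is still 0
```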

So I think your line should look something like this:

if (attr(original.matrix,"weighting")[2] =="tf-idf") {weight <- 0.000000001}
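To check that the bracketed and unbracketed versions of that line pick the same weight, the sketch below mocks original.matrix with a plain list carrying a weighting attribute. The attribute values are an assumption modeled on what tm attaches to a tf-idf weighted matrix; only the second element, the short label compared against "tf-idf", matters. Because the comparison is an exact string match, a label spelled differently (for example "tfidf") leaves weight at 0.

```r
# Mock standing in for original.matrix; the attribute values are an
# assumption modeled on a tf-idf weighted tm DocumentTermMatrix.
original.matrix <- structure(list(),
  weighting = c("term frequency - inverse document frequency", "tf-idf"))

# Unbracketed form, as in match.matrix
weight <- 0
if (attr(original.matrix, "weighting")[2] == "tf-idf") weight <- 0.000000001
w_unbraced <- weight

# Bracketed form
weight <- 0
if (attr(original.matrix, "weighting")[2] == "tf-idf") {weight <- 0.000000001}
w_braced <- weight

identical(w_unbraced, w_braced)  # TRUE

# A differently spelled label does not match, so weight stays 0.
weight <- 0
if (attr(original.matrix, "weighting")[2] == "tfidf") weight <- 0.000000001
weight  # 0
```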

Admittedly, it's inconsistent coding on my part to omit the { and } on that line. I will close the issue, as I believe it has been resolved, but please reopen it if I misdiagnosed the problem.