att/rcloud.solr

index_all should skip encrypted notebooks and use b64-encoded "raw" source

Opened this issue · 4 comments

Below is the notebook @s-u was most recently using for indexing, because we forgot that you had implemented index_all.

We need to skip encrypted notebooks, and we also need to specify raw=TRUE when getting the notebook so that it stays b64 encoded.

## force synchronous updates
rcloud.support:::setConf("solr.post.method", "sync")

index <- function (id) {
    group <- rcloud.get.notebook.cryptgroup(id)
    if (is.null(group)) { # don't index private/encrypted notebooks
        star.count <- rcloud.notebook.star.count(id)
        rcloud.solr::update_solr(rcloud.get.notebook(id, raw=TRUE), star.count)
    }
}

all.nb <- rcloud.config.all.notebooks.multiple.users(rcloud.get.users())
u <- unique(unlist(all.nb))
length(u)

# for (id in u) tryCatch(index(id), error=function(e) warning(id," failed: ", as.character(e)))
for (i in seq_along(u)) {
    id <- u[i]
    # report a failure inline in red and carry on with the next notebook
    tryCatch(index(id), error=function(e) rcloud.html.out("<font color=red>failed: ",id," </font>",paste(as.character(e), collapse=' '),"<br>"))
    rcloud.html.out(paste0(i,", "))  # progress counter
    Sys.sleep(0.1)                   # brief pause between updates
}

For encrypted notebooks I've been using rcloud.support:::is.notebook.encrypted, and for private ones rcloud.support::rcloud.is.notebook.visible. Is it better to use the cryptgroup?

They do the same thing - @s-u is just using a lower-level API in his notebook. And you are correct that we should also check if the notebook is visible - good catch!

I didn't see any reference to those in index_all.R, which is why I filed the issue. Please ignore if I missed it somehow.

I included his code just for reference, not that it should be followed exactly.

index_all used to be a wrapper around update_solr, which does check for encrypted/private notebooks. However, I'm changing it to handle batches, so it'll need its own checks. So definitely worth a reminder!
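
For reference, here is a rough sketch of what a pre-filter for the batch path could look like. This is not the actual index_all code: filter_indexable is a hypothetical helper, and the two predicates are passed in as arguments so the example runs outside RCloud. Inside RCloud they would presumably wrap rcloud.support:::is.notebook.encrypted and rcloud.support::rcloud.is.notebook.visible, but the exact signatures are an assumption here.

```r
# Hypothetical helper (not part of rcloud.solr): keep only the notebook IDs
# that are safe to index. `is_encrypted` and `is_visible` are caller-supplied
# predicates taking a single notebook ID and returning TRUE/FALSE.
filter_indexable <- function(ids, is_encrypted, is_visible) {
  keep <- vapply(ids, function(id) {
    # If either check throws, treat the notebook as "do not index" so that
    # one bad notebook cannot abort the whole batch.
    tryCatch(!isTRUE(is_encrypted(id)) && isTRUE(is_visible(id)),
             error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
  ids[keep]
}

# Stub data and predicates for a stand-alone demonstration (all made up).
encrypted_ids <- c("nb2")
hidden_ids    <- c("nb3")
demo_ids      <- c("nb1", "nb2", "nb3", "nb4")

indexable <- filter_indexable(
  demo_ids,
  is_encrypted = function(id) id %in% encrypted_ids,
  is_visible   = function(id) !(id %in% hidden_ids)
)
print(indexable)
```

The surviving IDs would then be fetched with raw=TRUE and handed to the batch update. Failing closed on errors is a design choice in this sketch: a notebook whose status cannot be determined is skipped rather than indexed.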

Ah, got it. Makes sense!