pharmaverse/tidytlg

Massive slowdown for large tables and listings

Closed this issue · 1 comments

Hi @kpagacz,

As discussed, I've analyzed the slowdown for large tables, and the culprit for tables comes down to the handling of inserting the blank row when the "newrows" column is 1 (ie on).

The code that does this is here:

tidytlg/R/gentlg_single.R

Lines 702 to 712 in caadd73

### New rows
if (!is.null(newrows)) {
newrows <- newrows[order(newrows)]
newrows <- newrows + formatindex - 1
ht2 <- ht[formatindex, ]
ht2[1, ] <- ""
for (i in seq_along(newrows)) {
line <- length(newrows[newrows < newrows[i]]) + newrows[i]
ht <- huxtable::add_rows(ht, ht2, after = line)
}
}

Specifically, for large tables, add_rows is being called potentially dozens or hundreds of times, separately, and each time, the entire new huxtable needs to have its row spans checked (even though we don't have any row spans, huxtable objects support them), which ends up taking 90+% of total runtime for large tables. In particular, the arrange_spans function in huxtable.

The workaround is to never let that newrows column be non-zero by inserting blank lines ourselves before the conversion to a huxtable object:

So my code for the workaround looks something like the (untested, as my code is happening in a slightly different context, so may be typos):

chunkfact <- cumsum(df$newrows)

chunks <- split(df, chunkfact)

## figure out which ones are data columns and which are  tbl file metacolumns
datcols <- names(df)[!names(df) %in% c("row_type", "anbr", "indentme", "roworder", "newrows")
emptyrow <- df[NA,]
emptyrow[,datcols] <- ""
emptyrow$row_type = "EMPTY"
emptyrow[,c("indentme", "roworder", "newrows")] <- 0

chunklst <- lapply(chunks, function(chi) {
      emptyi <- emptyrow
      emptyi$anbr <- chi$anbr[1]
     rbind(chi, emptyi)
})

fixedf <- do.call(rbind, chunklst)
fixedf$roworder <- seq_len(nrow(fixedf))

Large listings are also doing the same thing somewhere else (as illustrated by the same 90%+ of time taken in arrange_spans) but I don't think its in the same place, I'll leave tracking that one down to you :)

Thanks, I'll dig into this on Monday.