ModelOriented/iBreakDown

Code review for v 1.0.0

pbiecek opened this issue · 3 comments

What to check:

  • is documentation sufficient to understand function's parameters
  • is the description in DESCRIPTION up to date and meaningful
  • are examples easy to understand and follow
  • are descriptions in vignettes easy to understand and follow
  • is the README.md file easy to understand, consistent with other descriptions
  • is the R code easy to understand and follow
  • R code should be modular (long functions are bad)
  • are function names and variable names consistent

Comments for Thursday

  • Link to D3 example in README doesn't work

  • it would be good to change names of files in R folder, they should be consistent wit names of functions inside them

  • unmeaningful name of the variable - tmp. How about path or something similar? I also think that this code should be a separate function, e.g. make_path() or create_path()

    iBreakDown/R/break_down.R

    Lines 113 to 127 in 1096fa4

    tmp <- data.frame(diff = diffs_1d,
    ind1 = 1:p)
    # how variables shall be ordered in the BD plot?
    if (is.null(order)) {
    # sort impacts and look for most importants elements
    tmp <- tmp[order(tmp$diff, decreasing = TRUE),]
    } else {
    if (is.numeric(order)) {
    tmp <- tmp[order,]
    }
    if (is.character(order)) {
    rownames(tmp) <- names(average_yhats)
    tmp <- tmp[order,]
    }
    }

    The same for
    # how variables shall be ordered in the BD plot?
    if (is.null(order)) {
    # sort impacts and look for most importants elements
    tmp <- tmp[order(tmp$adiff_norm, decreasing = TRUE),]
    } else {
    if (is.numeric(order)) {
    tmp <- tmp[order,]
    }
    if (is.character(order)) {
    rownames(tmp) <- names(average_yhats)
    tmp <- tmp[order,]
    }
    }

- how about moving parameter interaction_preference after label? The order of parameters common with function local_attributions would overlap.

local_interactions.default <- function(x, data, predict_function = predict,
new_observation,
keep_distributions = FALSE,
interaction_preference = 1,
order = NULL,
label = class(x)[1], ...) {

local_attributions.default <- function(x, data, predict_function = predict,
new_observation,
keep_distributions = FALSE,
order = NULL,
label = class(x)[1], ...) {

  • codes that are repeated in local_attributions and local_interactions. It would be good to make code DRY.

    # here one can add model and data and new observation
    # just in case only some variables are specified
    # this will work only for data.frames
    if ("data.frame" %in% class(data)) {
    common_variables <- intersect(colnames(new_observation), colnames(data))
    new_observation <- new_observation[, common_variables, drop = FALSE]
    data <- data[,common_variables, drop = FALSE]
    }
    p <- ncol(data)

    # just in case only some variables are specified
    # this will work only for data.frames
    if ("data.frame" %in% class(data)) {
    common_variables <- intersect(colnames(new_observation), colnames(data))
    new_observation <- new_observation[, common_variables, drop = FALSE]
    data <- data[,common_variables, drop = FALSE]
    }
    p <- ncol(data)

  • unmeaningful variable name - tmp, maybe differences_1d or differences_summary?

    # impact summary for 1d variables
    tmp <- data.frame(diff = diffs_1d,
    adiff = abs(diffs_1d),
    diff_norm = diffs_1d,
    adiff_norm = abs(diffs_1d),
    ind1 = 1:p,
    ind2 = NA)
    rownames(tmp) <- gsub(rownames(tmp), pattern = ".yhats", replacement = "")

    The same for:
    tmp2 <- data.frame(diff = diffs_2d,
    adiff = abs(diffs_2d) * interaction_preference,
    diff_norm = diffs_2d_norm,
    adiff_norm = abs(diffs_2d_norm) * interaction_preference,
    ind1 = inds$ind1,
    ind2 = inds$ind2)
    tmp <- rbind(tmp, tmp2)

  • these codes differ a little, would it be possible to make them identical and make a separate function?

    iBreakDown/R/break_down.R

    Lines 168 to 182 in 1096fa4

    # extract values
    selected_values <- sapply(1:nrow(selected), function(i) {
    nice_pair(new_observation, selected$ind1[i], NA )
    })
    # prepare values
    variable_name <- c("intercept", colnames(current_data)[selected$ind1], "")
    variable_value <- c("1", selected_values, "")
    variable <- c("intercept",
    paste0(colnames(current_data)[selected$ind1], " = ", selected_values) ,
    "prediction")
    cummulative <- do.call(rbind, c(list(baseline_yhat), yhats_mean, list(target_yhat)))
    contribution <- rbind(0,apply(cummulative, 2, diff))
    contribution[1,] <- cummulative[1,]
    contribution[nrow(contribution),] <- cummulative[nrow(contribution),]

    # extract values
    selected_values <- sapply(1:nrow(selected), function(i) {
    nice_pair(new_observation, selected$ind1[i], selected$ind2[i] )
    })
    # prepare values
    variable_name <- c("intercept", rownames(selected), "")
    variable_value <- c("1", selected_values, "")
    variable <- c("intercept",
    paste(rownames(selected), "=", selected_values) ,
    "prediction")
    cummulative <- c(baseline_yhat, yhats_mean, target_yhat)
    contribution <- c(0, diff(cummulative))
    contribution[1] <- cummulative[1]
    contribution[length(contribution)] <- cummulative[length(contribution)]

  • How about adding a second class data.frame for break_down objects?

- names of parameters in plotD3() and plot() are inconsistent

iBreakDown/R/plotD3.R

Lines 55 to 59 in 1096fa4

plotD3.break_down <- function(x, ...,
max_features = 4,
min_max = NA,
vcolors = c("-1" = "#a3142f", "0" = "#a3142f", "1" = "#0f6333", "X" = "#0f6333"),
digits = 3, rounding_function = round) {

iBreakDown/R/plot.R

Lines 83 to 85 in 1096fa4

plot.break_down <- function(x, ..., add_contributions = TRUE, baseline = NA,
vcolors = c("-1" = "#f05a71", "0" = "#371ea3", "1" = "#8bdcbe", "X" = "#371ea3"),
digits = 3, rounding_function = round, plot_distributions = FALSE) {

Most of these issues get fixed.
Thanks.
Code is refactored and simplified, but the part for interactions is not a production quality code.

Once the methodology behind interactions will be established we need to rewrite local_interactions