wmacnair/SampleQC

comments, coding style, bioc review...

Closed this issue · 4 comments

points flagged with [bc] are Bioc guidelines that will show as warnings during BiocCheck (= you shall not pass). points flagged with [!!] are just me, but I think they are really important. points flagged with [style] are hadley and/or bioc coding recommendations.

general comments:

  • [bc] avoid using 1:n and replace by seq_len(n), or for 1:length(x) use seq_along(x)
  • [bc] avoid using sapply and use vapply instead- sapply will return a list or matrix, depending on whether output lengths match. If you expect a specific output use vapply, otherwise lapply.
  • could use @importFrom package fun1 fun2 ... rather than placing each in a separate line
  • [style] use spaces around logical operators (e.g., if (method == "x"))
  • [bc & style & !!] keep lines below 80 characters.
  • [bc & style & !!] do not use F/T but FALSE/TRUE instead.
  • [!!] consider placing functions with separate documentation and a lot of code in separate scripts... makes it easier to manage, develop & read I'd say.

SampleQC_cells.R:

  • could use n_cores = length(K_list) in the function definition rather then if (missing(...)) ..., to make visible to the user what it defaults to.
  • [!!] for arguments that should take a specific character string as value, use arg = c("a", "b") in the function definition, and arg <- match.arg(arg) in the function. This 1) shows the choices in the usage of the function, 2) gives a very intuitive error message if violated. Also, arg %in% ... does not assert the argument is of length 1, and if (method == ...) will fail if it is not.
  • [!!] use rowMins/Means/Anys/Maxs/AnyNAs... over apply(x, 1, min/mean/...)

SampleQC_utils.R

make_qc_dt.SingleCellExperiment

  • [bc] use is(sce, "SingleCellExperiment") instead of class
  • I see log10(total_feats) but no +1; this will return non-finite values if there are 0s.
  • can use grep("^mt-", ignore.case = TRUE) rather than str_detect twice (or maybe that function also has a similar argument). I currently have data that has ^MT-, so ignoring the caps is the safest way to check all cases.
  • it is uncommon to defined a generic function(x) and then have the methods use different inputs function(df, ...), function(sce, ...) - consider using the same function definition for all methods. This would also require documentation each parameter once.

Done basically everything, with some caveats (recorded here mainly as a reminder for myself):

  • For log10(total_feats), I don't do +1, but I do already have if (any(total_counts == 0)) {stop(...)}
  • I haven't split up the scripts into subscripts per function, although I may do at some point. Right now, I don't see much gain from it, as the worst offender (SampleQC_cells.R) is mainly internal functions anyway, so wouldn't actually reduce much in size.

Great!! Just a comment re 2- I agree mostly, I also keep 100s of lines of code in a script called e.g. plotting-utils.R or xxx-utils.R. But I have a personal preference to keep exported functions with documentation is a separate script. Mainly because i) just because I like it ;) and ii) when I browse somebody’s repo/package I get annoyed when I cannot find a function’s code. Surely, the repo can be searched with some fancy github command, but I like having exported, documented functions separate :)

Ah ok, I think I understand your point better now. So I could have one file with all the internal functions, and another with the exported ones. Will consider it, thanks.