comments, coding style, bioc review...
Closed this issue · 4 comments
points flagged with [bc] are Bioc guidelines that will show as warnings during BiocCheck
(= you shall not pass). points flagged with [!!] are just me, but I think they are really important. points flagged with [style] are hadley and/or bioc coding recommendations.
general comments:
- [bc] avoid using
1:n
and replace byseq_len(n)
, or for1:length(x)
useseq_along(x)
- [bc] avoid using
sapply
and usevapply
instead-sapply
will return a list or matrix, depending on whether output lengths match. If you expect a specific output usevapply
, otherwiselapply
. - could use
@importFrom package fun1 fun2 ...
rather than placing each in a separate line - [style] use spaces around logical operators (e.g.,
if (method == "x")
) - [bc & style & !!] keep lines below 80 characters.
- [bc & style & !!] do not use
F/T
butFALSE/TRUE
instead. - [!!] consider placing functions with separate documentation and a lot of code in separate scripts... makes it easier to manage, develop & read I'd say.
SampleQC_cells.R:
- could use
n_cores = length(K_list)
in the function definition rather thenif (missing(...)) ...
, to make visible to the user what it defaults to. - [!!] for arguments that should take a specific character string as value, use
arg = c("a", "b")
in the function definition, andarg <- match.arg(arg)
in the function. This 1) shows the choices in the usage of the function, 2) gives a very intuitive error message if violated. Also,arg %in% ...
does not assert the argument is of length 1, andif (method == ...)
will fail if it is not. - [!!] use
rowMins/Means/Anys/Maxs/AnyNAs...
overapply(x, 1, min/mean/...)
SampleQC_utils.R
make_qc_dt.SingleCellExperiment
- [bc] use
is(sce, "SingleCellExperiment")
instead ofclass
- I see
log10(total_feats)
but no +1; this will return non-finite values if there are 0s. - can use grep("^mt-", ignore.case = TRUE) rather than str_detect twice (or maybe that function also has a similar argument). I currently have data that has ^MT-, so ignoring the caps is the safest way to check all cases.
- it is uncommon to defined a generic
function(x)
and then have the methods use different inputsfunction(df, ...), function(sce, ...)
- consider using the same function definition for all methods. This would also require documentation each parameter once.
Done basically everything, with some caveats (recorded here mainly as a reminder for myself):
- For
log10(total_feats)
, I don't do+1
, but I do already haveif (any(total_counts == 0)) {stop(...)}
- I haven't split up the scripts into subscripts per function, although I may do at some point. Right now, I don't see much gain from it, as the worst offender (SampleQC_cells.R) is mainly internal functions anyway, so wouldn't actually reduce much in size.
Great!! Just a comment re 2- I agree mostly, I also keep 100s of lines of code in a script called e.g. plotting-utils.R or xxx-utils.R. But I have a personal preference to keep exported functions with documentation is a separate script. Mainly because i) just because I like it ;) and ii) when I browse somebody’s repo/package I get annoyed when I cannot find a function’s code. Surely, the repo can be searched with some fancy github command, but I like having exported, documented functions separate :)
Ah ok, I think I understand your point better now. So I could have one file with all the internal functions, and another with the exported ones. Will consider it, thanks.