matloff/partools

about naming


The points I raise here may be just personal taste, and changing names could be quite cumbersome, but I think it is better to discuss this sooner rather than later.

I find some names in the package a little confusing:

  • ca as the core of Software Alchemy. I expected sa, since chunk averaging is seldom mentioned. As for averaging, won't we sometimes need something different, such as getting the 10 largest values from all the data? That's a typical Hadoop example, but Software Alchemy can handle it as well.
    On the other hand, I find that the name Software Alchemy itself doesn't tell the user how it compares to Divide and Combine. Maybe you could also call it scatter compute.
  • None of the function names separate words, with either camel case or underscores.
    I have to mentally parse filesplitrand; the r inside it is especially easy to overlook.
    calm is difficult to read as ca lm.
    I suggest using underscores and a common prefix, as stringr does, so that all functions look like file_xx, dis_, sa_xx, or even just f_xx, d_xx.

I just found that the vignettes already mention that sometimes you need more than averaging. This confirms my impression that the ca name is not the best choice. I also find that scatter inherently carries some sense of random shuffling, so it's a good word for this case.

Agree that the names could be improved.

I suggest using underscores and a common prefix, as stringr does, so that all functions look like file_xx, dis_, sa_xx, or even just f_xx, d_xx.

To be clear, does this mean ca, cabase, calm, caglm, caprcomp become ca_, ca_base, ca_lm, ca_glm, ca_prcomp, etc.?

Yes. I didn't include a 'ca' example because I don't think ca is the best representation of software alchemy. "Software alchemy" is not easy to understand or relate to, either.

Changing from 'ca' to 'sa' is a good idea. We can do that easily without breaking users' old partools code by simple assignments, e.g. salm <- calm.
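
The aliasing idea can be sketched roughly as below. This uses a stand-in function rather than the real partools code: calm_standin and sa_lm are hypothetical names chosen only for illustration, and in an actual package the new name would also need to be exported.

```r
# Stand-in for an existing exported function (hypothetical; NOT the real calm).
calm_standin <- function(x) mean(x)

# The new-style name is simply a second binding to the same function object,
# so code that still calls the old name keeps working unchanged.
sa_lm <- calm_standin

# Both names give the same result.
old_result <- calm_standin(c(1, 2, 3))
new_result <- sa_lm(c(1, 2, 3))
same_result <- identical(old_result, new_result)
```

Because both names refer to the same function, nothing has to be maintained twice; the old name could later be deprecated gradually if desired.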

I agree that the lack of separators like '_' may be difficult for a non-native speaker of English at first, but I would be reluctant to break users' existing code.

Software alchemy is really for means, including proportions, and is not appropriate for something like fetching the top 10 values of a variable. However, one can use partools in other ways. Actually, I was just the other day thinking about writing a convenience function for that.
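
To illustrate the distinction, here is a minimal base-R sketch that does not use partools at all. The chunking via split() is only a stand-in for distributing data across workers, and top_k is a hypothetical helper, not an existing or planned partools function.

```r
set.seed(1)
x <- rnorm(1000)

# Split the data into 4 equal-sized chunks (stand-in for distributed pieces).
chunks <- split(x, rep(1:4, each = 250))

# Chunk averaging: compute the estimate on each chunk, then average the results.
chunk_means <- sapply(chunks, mean)
ca_estimate <- mean(chunk_means)

# Top 10: averaging per-chunk answers would be meaningless here. Instead, take
# the top 10 within each chunk, pool those candidates, and take the top 10 again.
top_k <- function(v, k = 10) {
  sort(v, decreasing = TRUE)[seq_len(min(k, length(v)))]
}
candidates <- unlist(lapply(chunks, top_k), use.names = FALSE)
top10 <- top_k(candidates)
```

For a simple mean with equal-sized chunks, the chunk-averaged value matches the full-data mean; for other estimators it is generally an approximation. The top-10 case needs a different combine step (pool, then select), which is why it falls outside averaging.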

As to Divide and Combine, see my 2016 JSS paper, which is referenced both in the man page and the vignette.