In this final project, you will work on the bag of little bootstraps algorithm.
There is a package at https://github.com/ucdavis-sta141c-sq-2020/blblm. In this package, I have implemented the bag of little bootstraps for the linear regression model.
You can install it with
devtools::install_github("ucdavis-sta141c-sq-2020/blblm")
and test it out.
Your job is to improve my package in various ways. For example:
- The current implementation uses only one CPU. Make it possible to use more than one CPU. Note that you should let users decide whether they want to use parallelization.
- Allow users to specify a list of dataset files rather than loading the whole dataset in the main process and then distributing it to the workers. Each file would then be loaded in the workers to minimize memory usage.
- The functions are written in pure R. It is possible, for example, to convert the function lm1 to C++ code. You might need to look at how RcppArmadillo's fastLm.R and fastLm.cpp do it. (Spoiler: it is not easy, but if you insist, here are some slides about it: https://scholar.princeton.edu/sites/default/files/q-aps/files/slides_day4_am.pdf)
- Write tests and documentation.
- More models? Logistic regression? GLM?
- You should also write a few pages of R Markdown documentation to explain your work. One recommended way is to put the documentation in a vignette. (If you want to use tidyverse in the vignettes, run usethis::use_package("tidyverse", type = "suggest") to add tidyverse to the Suggests field of DESCRIPTION.)
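
To make the first suggestion concrete, here is a minimal sketch of opt-in parallelization using only base R and the bundled parallel package. It is not the code in blblm: the names fit_subsample and blb_coef and the workers argument are hypothetical, and the sketch only averages coefficients. It assumes the model has at least two coefficients.

```r
library(parallel)  # ships with R; no extra installation needed

# Bootstrap one subsample B times. Each replicate draws multinomial
# frequencies that sum to the full-data size n and refits a weighted lm().
# Returns a (number of coefficients) x B matrix.
fit_subsample <- function(sub, formula, n, B) {
  boot_once <- function() {
    freqs <- as.vector(rmultinom(1, n, rep(1, nrow(sub))))
    environment(formula) <- environment()  # lets lm() find `freqs`
    coef(lm(formula, data = sub, weights = freqs))
  }
  replicate(B, boot_once())
}

# Hypothetical front end: workers = 1 (the default) runs sequentially,
# workers > 1 starts a socket cluster, so parallelization is the user's choice.
blb_coef <- function(formula, data, m = 4, B = 100, workers = 1) {
  n <- nrow(data)
  # Randomly assign each row to one of m roughly equal subsamples.
  subsamples <- split(data, sample(rep_len(seq_len(m), n)))
  if (workers > 1) {
    cl <- makeCluster(workers)
    on.exit(stopCluster(cl), add = TRUE)
    boots <- parLapply(cl, subsamples, fit_subsample,
                       formula = formula, n = n, B = B)
  } else {
    boots <- lapply(subsamples, fit_subsample,
                    formula = formula, n = n, B = B)
  }
  # Average over replicates within each subsample, then across subsamples.
  rowMeans(sapply(boots, rowMeans))
}

set.seed(141)
fit_seq <- blb_coef(mpg ~ wt, data = mtcars, m = 3, B = 50)
# Parallel run (starts two worker processes):
# fit_par <- blb_coef(mpg ~ wt, data = mtcars, m = 3, B = 50, workers = 2)
```

Because fit_subsample receives everything it needs as arguments, the same function could be adapted to take a file path and read the data inside the worker, which is one possible route to the file-based suggestion. For reproducible parallel runs, parallel::clusterSetRNGStream() can seed the workers.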
The easiest way to start the project is to fork my package and then use RStudio to clone it from your personal repo.
However, you could also start a new package from scratch.
Your grade will be determined by the amount of work you have done and how well it is implemented.
- (60%) the code:
  - both correctness and efficiency
  - code style: You want your code to be clean and well documented. Just imagine that another person will take over the maintenance of your package. (Hint: make use of styler)
- (40%) miscellaneous
  - tests
  - documentation
  - pass devtools::check() etc.
  - the vignette
Please visit this link to submit the GitHub repo of your package.
You will need to log in first using your GitHub account.
Due: 6/10/2020 11:59pm