chris-prener/censusxy

To Do List

bransonf opened this issue · 10 comments

  • Finish Tests
  • Write Vignette
  • Update README
  • Pass CRAN Checks
  • Implement CI
  • Build Pkgdown Site
  • create landing page for pkgdown site as index.Rmd - use other openGIS pkgs as template
  • the .rda files are not properly serialized for pre r v 3.5 support - https://cran.r-project.org/doc/manuals/r-devel/NEWS.html under r 3.6.0 news
  • remove use of tidy::separate_() - the _ functions are no longer recommended
  • description of .data for cxy_geocode() is confusing
  • create two exported sample data sets - stl_homicide.rda and stl_homicide_small.rda - with associated documentation
  • build out unit tests using stl_homicide_small.rda to decrease testing time
  • Improved join by UID to ID
  • send to winbuilder
  • initial release with Zenodo
  • fork and add CRAN instructions to README.Rmd, index.Rmd, and censusxy.Rmd
  • rebuild pkgdown site and README on the fork
  • submit to CRAN

Nice! Progress looks great so far!

@chris-prener Could you implement the continuous integration for me? I would rather not create travis/appveyor accounts and not sure of best practice implementation.

yep - I'll do that now

OK @bransonf - I cleaned up some code and some of the package infrastructure and added a bunch of items to the to-do list. Nothing major, I don't think! Nice work on getting these API calls sorted out.

A couple questions for cxy_geocode:

  • why is this named 'response'? is there a reason not to overwrite 'split'?
  • the left_join's performance could be improved if we used one variable instead of 4... can we use id?

A couple questions for cxy_geocoder:

  • what is the practical impact of using a warning in an internal function... I'm not sure about this
  • why call methods::as() instead of as.numeric()?

Also, FWIW, appyveyor is failing because the new dplyr release's binaries aren't out yet and for some reason the source builds are failing. On the Travis side, it is failing on 3.3 and 3.4 because of the serialization issue I noted above.

What do we do about the appveyor issue?

Reason to not overwrite split, as I understand (Could be wrong about this) is that initializing an empty vector pre-allocates memory for data to be put into. If the function re-wrote to the split object, it would put it in a new section of memory, and it would have to call the new object every iteration in the loop. (Causing an exponential slow down) Even if I'm wrong about that, the worst risk of using another object is using a bit more memory.

I'm not really happy about the left_join part at all. I never found a solution for unique id to row id. I would like to create a list of only unique addresses to send to the geocoder and then join that UID to a regular ID.

Dependency on methods and internal warning are remnants of early development, they've been removed. The error for all Non-Matches takes care of what the internal warning was meant for.

And a note about the testing, we won't technically get full coverage without using more than 1000 addresses, but I'm confident that if something would fail in the batch process, it would also fail on the smaller data.

OK - totally fine with split and glad you made those other changes. I made a couple of small tweaks to the unit test and fixed a documentation issued that R CMD check identified.

We're still failing one of the unit tests, which is connected to what happens when the data have a state column but it isn't included in the function call. This is something we need to address. Can you check that out?

That same solution is implemented in parts of both gateway and postmaster for the UID... do you want me to add that or do you want to explore how it works in those packages?

RE: appveyor and Travis failing - the dplyr binaries should be available early this week and that should resolve any outstanding issues. As long as devtools::check() is doing well locally, I'm not concerned.

OK - almost ready to fork and submit to CRAN - we just need to let the stack of CI builds wrap up!