lter/lterwg-som

NutNet join script duplicated chemistry values across years


Need to fix the NutNet join script so that it properly carries over chemistry values (e.g. lyr_soc) for each sample year. Right now, the values for all chemistry analytes (columns) are duplicated across years.

@srearl @wwieder The raw data files I've found in the NutNet zip files (e.g., "comb-by-plot-clim-soil-diversity-02-Aug-2019.csv") already have the soil analyte data duplicated across years within each plot. This suggests that this is what we received from the data providers, not an error introduced by a later script. The metadata for that csv file describe the "perC" column as "pre-treatment soil % Carbon by mass."

@wwieder Perhaps...

  1. ...this is an error in the combine script used by the data providers?
  2. ...the post-treatment percent C data simply does not exist?
  3. ...post-treatment soil data have been intentionally left out?

For now, the plan is to add a section to the join script that removes the repeated soil data values from every row where year != 0. This should be scripted at the tarball level. @piersond
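A minimal base-R sketch of that fix, assuming a data frame with hypothetical columns `plot`, `year` (treatment year, with 0 as the only year whose soil values should be kept) and soil analyte columns such as `perC` and `perN`. The real column names would come from the keykey; everything here is illustrative.

```r
# Toy NutNet-like data: soil values repeated across years within each plot
# (hypothetical columns and values, for illustration only)
nn <- data.frame(
  plot = c(1, 1, 1, 2, 2),
  year = c(0, 1, 2, 0, 1),
  perC = c(2.1, 2.1, 2.1, 1.7, 1.7),
  perN = c(0.20, 0.20, 0.20, 0.15, 0.15)
)

# Soil analyte columns to clean (hypothetical list)
soil_cols <- c("perC", "perN")

# Keep the year-0 values; set the repeated values in all other years to NA.
# The columns stay numeric because NA is coerced to NA_real_ on assignment.
nn[nn$year != 0, soil_cols] <- NA
```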

Update: the NutNet folder needs to be rehomogenized (rehomog). Keykey had an incorrect column name for treatment year; I've fixed this in keykey. It's ready for rehomog, and I'll try to do this tomorrow or Friday. For the post-homog step, I have written a rough draft of a script to clean the duplicate NutNet data from the tarball. It's saved in "data processing" > "keyV2 scripts" > "fixes".

Yep, the rehomog is done and the NutNet cleaner script is written. Stevan is planning to put the new tarball together early this week.

Tarball 2019-10-27 posted. @piersond, please double-check my results, but I think I was able to replicate your for-loop with `mutate_at(.vars = NN_columns_to_clean, .funs = ~replace(., grepl("nutnet", network, ignore.case = TRUE) & observation_date >= tx_start, NA))`. Note that you had capitalized the L of fe_HCl, which I did not correct in your script on the repo.
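One way to double-check that the vectorized call matches a row-by-row loop is to run both on a small toy table and compare the results. The sketch below uses base R only (no dplyr) and assumes hypothetical values for the `network`, `observation_date`, `tx_start`, `lyr_soc` and `fe_HCl` columns. The loop is an illustrative stand-in, not the actual script on the repo.

```r
# Toy data mixing NutNet and non-NutNet rows (all values hypothetical)
df <- data.frame(
  network = c("NutNet", "NutNet", "LTER", "nutnet"),
  observation_date = as.Date(c("2007-06-01", "2009-06-01", "2009-06-01", "2010-06-01")),
  tx_start = as.Date(rep("2008-01-01", 4)),
  lyr_soc = c(3.2, 3.2, 4.0, 2.5),
  fe_HCl = c(1.1, 1.1, 0.9, 0.7)
)
NN_columns_to_clean <- c("lyr_soc", "fe_HCl")

# Row mask: NutNet rows (any capitalization) observed on or after treatment start
is_post_trt_nn <- grepl("nutnet", df$network, ignore.case = TRUE) &
  df$observation_date >= df$tx_start

# Version 1: explicit for-loop over rows and columns
loop_df <- df
for (i in seq_len(nrow(loop_df))) {
  if (is_post_trt_nn[i]) {
    for (col in NN_columns_to_clean) {
      loop_df[i, col] <- NA
    }
  }
}

# Version 2: vectorized, a base-R analogue of
# mutate_at(.vars = NN_columns_to_clean, .funs = ~replace(., mask, NA))
vec_df <- df
vec_df[NN_columns_to_clean] <- lapply(
  vec_df[NN_columns_to_clean],
  function(x) replace(x, is_post_trt_nn, NA)
)

# Compare column contents (as.list drops the row.names attribute)
identical(as.list(loop_df), as.list(vec_df))
```

If the comparison returns TRUE, both approaches blank out the same cells; pre-treatment NutNet rows and non-NutNet rows keep their values.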