JGCRI/gcamdata

Cannot get driver_drake() with new regions even with correct IEA file

robbieorvis opened this issue · 27 comments

Hello, I followed the online instructions to add a new region (actually, I moved the UK out of the EU-15 and into its own region). I have a copy of the IEA file (which is included in the NCA submission online, FYI).

However, when I try to run driver_drake() to update data files, I keep running into an error on misalignment with the prebuilt version:

Error in verify_identical_prebuilt(L101.en_bal_EJ_R_Si_Fi_Yh_full, L101.en_bal_EJ_ctry_Si_Fi_Yh_full, :
(converted from warning) L101.en_bal_EJ_R_Si_Fi_Yh_full is not the same as its prebuilt version

The notes in the documentation say that when the model is correctly building from the IEA dataset, these warnings should not appear. Please advise.

pkyle commented

I'd recommend just re-generating the package data
source("data-raw/generate_package_data.R")
which will re-build the PREBUILT_DATA.rda, and then run it again. If that still doesn't work, I'd use driver() instead of driver_drake()

pkyle commented

Oh that makes sense--this is a problem in your settings in R, where warnings are converted to errors. This is a perfectly normal warning, expected in this case, letting you know that the pre-built data is different from the data you're generating from the source data. If you just run:
options(warn = 1)
in your R session, that should make this work fine. Note that options() only applies to the current session; to make it the default every time R starts, add the same line to your .Rprofile.
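To illustrate the difference between the two settings, here is a minimal base-R sketch. It does not touch gcamdata; `as.numeric("abc")` is only a stand-in for any call that emits a warning, and the variable names are made up for the example.

```r
# Save the current options so they can be restored at the end.
old <- options(warn = 2)   # warn = 2: every warning is promoted to an error

with_warn_2 <- tryCatch(
  {
    as.numeric("abc")      # emits a coercion warning
    "completed"
  },
  error = function(e) conditionMessage(e)
)

options(warn = 1)          # warn = 1: warnings are printed as they occur

with_warn_1 <- tryCatch(
  {
    as.numeric("abc")      # same warning, but execution continues
    "completed"
  },
  error = function(e) conditionMessage(e)
)

options(old)               # restore whatever was set before

print(with_warn_2)         # an error message starting "(converted from warning)"
print(with_warn_1)
```

With `warn = 2` the block stops at the first warning, which is the same mechanism behind the "(converted from warning)" text in the error reported above.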

pkyle commented

@kanishkan91 or @realxinzhao do you recall what the workaround is for re-building this forest file, with modified regions? I thought there might have been a constant that can be set that would rebuild it but I don't see that in there.

Thanks in advance @pkyle @kanishkan91 and @realxinzhao. Hoping to get this sorted out so we can get the regions adjusted.

@pkyle that fix never made it to core. The CMP that fixes this more holistically is still under review and yet to be merged. I can add a fix here if you think useful.

pkyle commented

Oh OK I see what happened. Robbie, the version of the core model that you're working from has temporarily disabled the capability to modify country-to-region mappings because of this forest file. The permanent fix is still under internal review so can't be shared. There is a temporary workaround that would allow one to re-generate that forest file after modifying country-to-region mappings. Kanishka, if there's a single commit (or two) that you can share with Robbie, you can just make a patch and post it here. Here are instructions for how to create and apply patches:

@pkyle OK. I'll go through and make a patch.

A quick fix would be removing the following code in module_aglu_LB120.LC_GIS_R_LTgis_Yh_GLU (lines 279-314):

    # scale forest to avoid negative unmanaged forest area which caused issue for yield in Pakistan and African regions
    # L123.LC_bm2_R_MgdFor_Yh_GLU_beforeadjust, pulled from L123.LC_bm2_R_MgdFor_Yh_GLU before managed forest scaling, was used here.
    L120.LC_bm2_R_LT_Yh_GLU %>%
      left_join(L120.LC_bm2_R_LT_Yh_GLU %>%
                  spread(Land_Type, value, fill = 0) %>%
                  left_join(L123.LC_bm2_R_MgdFor_Yh_GLU_beforeadjust %>% select(-Land_Type),
                            by = c("GCAM_region_ID", "GLU", "year")) %>%
                  mutate(nonForScaler =
                           if_else((Forest - MgdFor) < 0 & Forest > 0,
                                   1 + (Forest - MgdFor) / (Grassland + Shrubland + Pasture), 1),
                         ForScaler = if_else((Forest - MgdFor) < 0 & Forest > 0, MgdFor / Forest, 1)) %>%
                  select(GCAM_region_ID, GLU, year, nonForScaler, ForScaler),
                by = c("GCAM_region_ID", "GLU", "year")) %>%
      mutate(value = if_else(Land_Type %in% c("Grassland", "Shrubland", "Pasture"),
                             value * nonForScaler,
                             if_else(Land_Type == "Forest", value * ForScaler, value))) %>%
      select(-nonForScaler, -ForScaler) ->
      L120.LC_bm2_R_LT_Yh_GLU

However, forest yield in a few regions (e.g., Africa_Eastern, Pakistan) would be wrong without the adjustment (the adjustment is what affected the breakout). This might be okay if the study focus is not on land or forestry.
There could be a more complicated method to regenerate aglu/LDS/L123.LC_bm2_R_MgdFor_Yh_GLU_beforeadjust.csv to fix forest yield. However, this will be fixed by @kanishkan91 in a better way in the next release.
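To make the scaling arithmetic in the quoted code concrete, here is a toy illustration in base R. The numbers are invented, the column names simply mirror the land types in the chunk, and base R `ifelse` stands in for dplyr's `if_else`; it is a sketch of the same formulas, not the gcamdata implementation.

```r
# Two made-up region/basin rows (areas in bm2). In row 1, managed forest
# exceeds total forest, so unmanaged forest would be negative.
land <- data.frame(
  Forest    = c(10, 10),
  Grassland = c(5, 5),
  Shrubland = c(3, 3),
  Pasture   = c(2, 2),
  MgdFor    = c(12, 4)
)

# Same condition as in the chunk: managed forest > total forest, forest non-zero.
needs_scaling <- (land$Forest - land$MgdFor) < 0 & land$Forest > 0

# Forest is scaled up to the managed-forest area ...
ForScaler <- ifelse(needs_scaling, land$MgdFor / land$Forest, 1)

# ... and grassland, shrubland and pasture are scaled down to absorb the difference.
nonForScaler <- ifelse(
  needs_scaling,
  1 + (land$Forest - land$MgdFor) / (land$Grassland + land$Shrubland + land$Pasture),
  1
)

adjusted <- data.frame(
  Forest    = land$Forest * ForScaler,
  Grassland = land$Grassland * nonForScaler,
  Shrubland = land$Shrubland * nonForScaler,
  Pasture   = land$Pasture * nonForScaler
)

total_before <- rowSums(land[, c("Forest", "Grassland", "Shrubland", "Pasture")])
total_after  <- rowSums(adjusted)

print(adjusted)
print(cbind(total_before, total_after))
```

In this toy case the first row's forest is raised from 10 to 12 while the other three land types shrink by 20%, so total land area is unchanged; the second row is left untouched. Removing the block from the chunk skips this correction, which is why yields in the affected regions would be off.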

Hi @robbieorvis, a while back we developed a way to regenerate L123.LC_bm2_R_MgdFor_Yh_GLU_beforeadjust.csv before re-running gcamdata. I'm going to try to add that in here. That would keep the yields consistent.

The next release is a bit far off, unfortunately, so it would take a while to incorporate the holistic fix.

Hello @kanishkan91, just wanted to follow up on this and see if you had been able to reproduce the solution you proposed.

@robbieorvis I implemented the solution locally and it seems to have worked. I have not pushed it yet since I was busy with some deadlines. Will get this done by the weekend. My apologies!

@robbieorvis @pkyle Working on it now and will have it pushed up by tonight. My apologies!

(PS: The past few weeks have been busy with the GCAM annual meeting coming up, hence the delay.)

@robbieorvis @pkyle - I have now created a branch with the fix for this issue. I also created a PR so you can see the changes: #1240

As noted in the PR, you would need to follow the steps outlined below:

  1. When breaking out a region, set the constant aglu.USE_BEFORE_ADJUST_FOREST_FILE to FALSE.
  2. Run up to the chunk module_aglu_LB123.LC_R_MgdPastFor_Yh_GLU. This can be done using driver(stop_after = "module_aglu_LB123.LC_R_MgdPastFor_Yh_GLU").
  3. This generates the file L123.LC_bm2_R_MgdFor_Yh_GLU_beforeadjust in the gcamdata/outputs/ folder. This file will now have updated data for your new region.
  4. Copy the contents of the file generated in step 3 into the file gcamdata/inst/extdata/aglu/LDS/L123.LC_bm2_R_MgdFor_Yh_GLU_beforeadjust. Copy only the contents rather than replacing the whole file, i.e. make sure the metadata does not change.
  5. Set the constant aglu.USE_BEFORE_ADJUST_FOREST_FILE back to TRUE.
  6. After this, reload gcamdata with devtools::load_all(".").
  7. Now run driver or driver_drake as usual.
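Step 4 is the easiest one to get wrong by hand. A minimal sketch of how it could be scripted follows, assuming the metadata consists of the lines starting with "#" in both CSV files (an assumption about the file layout, not something stated in this thread). `replace_data_keep_metadata` is a hypothetical helper, not a gcamdata function.

```r
# Hypothetical helper (NOT part of gcamdata): keep the "#" metadata lines of
# `dest` and replace everything else with the non-comment lines of `src`.
# Assumes metadata lines are exactly those beginning with "#".
replace_data_keep_metadata <- function(src, dest) {
  src_lines  <- readLines(src)
  dest_lines <- readLines(dest)

  is_comment <- function(x) startsWith(x, "#")

  header <- dest_lines[is_comment(dest_lines)]   # metadata to preserve
  body   <- src_lines[!is_comment(src_lines)]    # column names + new data rows

  writeLines(c(header, body), dest)
  invisible(dest)
}
```

It would be called once between steps 3 and 5, with the file generated in gcamdata/outputs/ as `src` and the file under gcamdata/inst/extdata/aglu/LDS/ as `dest`. Keeping a backup copy of the LDS file first is a sensible precaution, since the helper overwrites `dest` in place.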

Let me know if you have further questions. As noted, the holistic fix will be a part of a future GCAM release. This is likely only a band-aid.

@pkyle @realxinzhao - Let me know if I missed anything above. I also added you as reviewers to the PR in case you wanted to see the changes. Whether this should be merged into main or just be maintained as a branch is another question. At the moment this is, as mentioned above, only a workaround.

Looks like this is working! Or at least, I was able to run the data system and port the files over and get GCAM to start running (quick aside: you have two sets of the same tags in the configuration.xml file for the nonCO2 emissions... probably should change the tags on the second set to denote they are the MAC files).

@pralitp @robbieorvis @pkyle Can I close this issue or will it be closed if / when the PR is merged in?

Hey all - are there any changes to the process for creating a new region with the release of GCAM v7?