VisionEval/VisionEval-Dev

D1B in Datastore output doesn't seem correct

Closed this issue · 11 comments

[Updated note on the question: I misreported the D1B values. They came from a datastore created before I corrected the Bzone unprotected area, whose original values were prepared by a colleague who might also have been confused by the difference in units. The question is then how the unit of D1B is converted from persons per acre to persons per square mile.]

I expect D1B at the Bzone level to be less than 100 persons per acre based on the model setup. However, the output I read from the Datastore shows a very large discrepancy (e.g., the maximum is almost 40,000 times higher). I found that there are two units for this variable in the code: persons per acre (Bzone) and persons per square mile (Marea). Even if the unit is persons per square mile, the value is still 60 times higher after I converted the output to persons per acre. I couldn't find the bug in the code, but my guess is that something is wrong in the part that exports the output. Can anyone help with this issue? Thank you!

Interesting --- I see that the units are persons/acre at the Bzone level in VELandUse, see this line of Calculate4DMeasures.R. All the other documentation for this module refers to persons/acre (see wiki here).

In the CalcMetroMeasuresFunction.R, the units at the Marea level are indeed persons/square mile, see here.

What scripts are you using to query the datastore? What is the range of values you are seeing in the datastore?

The summarizeDatasets function in the framework performs unit conversions based on the query specification. So the CalcMetroMeasuresFunction.R script is being told to convert whatever is in the Datastore (persons/acre) to the desired units (persons/square mile). The other query script I provided to Central Lane also does unit conversion.

I am updating the extract facility to allow optional unit conversions as well. In a forthcoming update, it will be much easier to examine the units and to change them to something more useful when the data is extracted. That will also address Issue #122 related to the speed units.

Perhaps Dongmei can send me the latest input files and I can look into the code to investigate why we're still getting such a large discrepancy...

I read D1B from the Datastore folder (i.e., intermediate output?). The range of values is shown below:

[Image: range of D1B values read from the Datastore]

You can also view the spatial pattern here.

The input files you have for CLMPO shouldn't produce very different results from mine, but I will send you the files again. I reported this issue twice to Jeremy and Tara via email (just to note that this is the same issue, in case you missed my previous emails).

This is probably an issue with the input files, because you are right: the values in D1B.Rda are far too large. So it is not an issue with the unit conversion in VEReports. Thank you for sending the input files, we will take a look.

Some of your input Bzones have very small areas -- in bzone_unprotected_area.csv, there are some bzones of less than an acre. I expect that this is causing the high densities.
[Image: Portland_Bzone_unprotected]

Sorry, I have corrected that file but didn't send you the most recent version. D1B is based on population and total area (in this case, only urban area). I have checked the Pop output in the Datastore, and the maximum value is 6999 (at the Bzone level). The resulting population density should not be that high even if the urban area for the Bzone with the maximum population is 0.5. The model warns me if population density exceeds 100 persons per acre. Before I corrected the Bzone unprotected area, some Bzones had values above 100, and the warning disappeared after the correction. This is why it puzzles me. The calculation process is all correct, but the exported data looks very strange. I am sending the corrected Bzone data so that you can compare the D1B in the Datastore. Thank you!

I think the input Bzone unprotected area unit is acres, and the resulting D1B is in persons per square mile by default. I guess your inputs are in square miles. For reference, the largest D1B I got from the run for Atlanta is 36770. It is a census tract located in Atlanta Midtown with a population of 6689 and an area of 0.18 square miles (116.42 acres).
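As a quick check of the Atlanta figures above, the density can be recomputed from the quoted population and area. This is a minimal sketch in plain R: the variable names are illustrative, and the only constant assumed is that one square mile equals 640 acres.

```r
# Figures quoted above for the Atlanta Midtown census tract
pop <- 6689            # persons
area_acres <- 116.42   # acres

acres_per_sqmi <- 640  # 1 square mile = 640 acres

# Same tract, expressed in both area units
area_sqmi <- area_acres / acres_per_sqmi
d1b_per_acre <- pop / area_acres
d1b_per_sqmi <- pop / area_sqmi

print(round(area_sqmi, 2))     # area in square miles
print(round(d1b_per_acre, 1))  # persons per acre
print(round(d1b_per_sqmi))     # persons per square mile
```

The persons per square mile result comes out close to the 36770 reported above, while the same tract is roughly 57 persons per acre, well under the 100 persons per acre limit.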

My input is in acres; please ignore the above table (which might use square miles). I apologize that my code above with "CLMPO-Staged" does not use the right path to read my most recent data either. I think you are right about the output unit, but I don't know where the unit conversion is done.

@shichenfan very helpful, thank you for that example from Atlanta. I think I have found the answer. The conversion to square miles happens all the way at the beginning, where initializeModel() reads in the inputs and does appropriate unit conversion, using the defs/units.csv as a guide. For the area values, when areas are provided in units other than what is specified in units.csv, conversion happens there. In this case @dongmeic, your units.csv specifies area as SQMI, so all areas in the datastore are converted to sqmi.
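To illustrate the effect of converting areas at input time, here is a small sketch in plain R. It does not call the visioneval package: acres_to_sqmi() is a hypothetical helper, and the Bzone values are made up for illustration only.

```r
# Hypothetical helper, not part of the visioneval package: convert an area
# given in acres to square miles (1 square mile = 640 acres).
acres_to_sqmi <- function(acres) acres / 640

# Made-up Bzone inputs, with areas supplied in acres
bzones <- data.frame(
  Bzone = c("A", "B", "C"),
  Pop = c(1200, 6999, 50),
  UrbanAreaAcres = c(300, 150, 0.5)
)

# What would be stored if the area unit is set to SQMI
bzones$UrbanAreaSqmi <- acres_to_sqmi(bzones$UrbanAreaAcres)

# A density computed from the stored areas is in persons per square mile
bzones$D1B_sqmi <- bzones$Pop / bzones$UrbanAreaSqmi

# Dividing by 640 recovers persons per acre
bzones$D1B_acre <- bzones$D1B_sqmi / 640

print(bzones)
```

Under these assumptions, a stored density of 64,000 persons per square mile corresponds to the 100 persons per acre warning threshold mentioned earlier, so values in the tens of thousands are not by themselves a sign of an error.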

Try this: Make a clean copy of your VERSPM directory, with no Datastore or ModelState.RData. Then in RStudio, set your working directory to that VERSPM directory and run the following steps from the beginning of run_model.R:

#===========
#run_model.R
#===========

#This script demonstrates the VisionEval framework for the RSPM model.
cat('run_model.R: script entered\n')
#Load libraries
#--------------
library(visioneval)
cat('run_model.R: library visioneval loaded\n')

planType <- 'callr'

#Initialize model
#----------------
initializeModel(
  ModelScriptFile = "run_model.R",
  ParamDir = "defs",
  RunParamFile = "run_parameters.json",
  GeoFile = "geo.csv",
  ModelParamFile = "model_parameters.json",
  LoadDatastore = FALSE,
  DatastoreName = NULL,
  SaveDatastore = TRUE
  )  
cat('run_model.R: initializeModel completed\n') 

Now, look at the units of the area. Do this as follows:

load("Datastore/2010/Bzone/UrbanArea.Rda")
attr(Dataset, 'UNITS')

You will see the units are in SQMI. So now whenever calculations on area are conducted, the area values will be in square miles.

If you want to keep everything in acres, change the defs/units.csv file so that the entry for area is ACRE. Then repeat the steps above, and you will see that the units are in acres still.

@shichenfan @jrawbits @dflynn-volpe Thanks all for the helpful notes! I will read more about model initialization and run some tests. I prefer SQMI in this case because a large proportion of the D1B values would be too small, or even zero, when using acres.

Closing this issue -- we tracked down where unit conversion was happening, and the unusually high density (D1B) appears to be driven by large group quarters dwelling units for one particular Bzone.