transportenergy/database

Mode & Vehicle Type values missing when running historical.process()

francescolovat opened this issue · 4 comments

Hi iTEM colleagues,

I'm raising this issue because recent calls to historical.process(0) (T000) and historical.process(4) (T004) returned dataframes containing NaNs: in the Mode and Vehicle Type columns for T000, and in the Vehicle Type column for T004.

Values seem to be present in the .csv files, so this might come from the processing scripts, unless I'm missing something.

pic-1
pic-2

It seems to be a code issue, as you suggested. To verify this, I am attaching the final spreadsheet that Humberto generated from his local code, which does not have the issue you raised.
fig

The spreadsheet I showed here has no "nan" values; they appear as ALL instead. I am not sure how to fix the online code, though.

@francescolovat good catch.

The best way to address this will be a consistency check that gets run on the cleaning script for each of the upstream data sources.
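As a rough illustration of what such a check could look like, here is a minimal sketch using pandas. The function name check_dimensions, the DIMENSIONS list and the toy data are all hypothetical; this is not the code in the repository, only one way to fail loudly when a cleaning script leaves NaNs in a dimension column.

```python
import pandas as pd

# Hypothetical dimension labels; the real cleaning scripts may use others.
DIMENSIONS = ["Mode", "Vehicle Type"]


def check_dimensions(df: pd.DataFrame, dims=DIMENSIONS) -> None:
    """Raise ValueError if a dimension column is absent or contains NaN."""
    missing = [d for d in dims if d not in df.columns]
    if missing:
        raise ValueError(f"missing dimension columns: {missing}")

    # Count NaN entries per dimension column; keep only the non-zero counts
    counts = df[list(dims)].isna().sum()
    bad = {name: int(n) for name, n in counts.items() if n > 0}
    if bad:
        raise ValueError(f"NaN values in dimension columns: {bad}")


# Toy stand-in for the output of one cleaning script (not real data)
cleaned = pd.DataFrame(
    {
        "Mode": ["Rail", None],
        "Vehicle Type": ["All", "All"],
        "Value": [1.0, 2.0],
    }
)

try:
    check_dimensions(cleaned)
except ValueError as exc:
    print(exc)
```

Running a check like this at the end of each cleaning script would surface the problem at processing time rather than in the final dataframe.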

Thanks @francescolovat for opening #72. This will be closed by #71, which adds the consistency checks I mentioned:

"The best way to address this will be a consistency check that gets run on the cleaning script for each of the upstream data sources."

…along with other improvements, namely avoiding easily confused column names (whether via the older ColumnName enumeration or the newer column_name() function) and instead using the consistent IDs of the SDMX dimensions introduced by #62 everywhere.
This has the advantage of tying the column labels to the SDMX concepts with the same IDs. Those concepts can carry long, verbose names and descriptions, longer than would be practical to put in a column label.
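To illustrate the idea only: the sketch below labels columns with short IDs and keeps the verbose names in a separate lookup. The IDs "MODE" and "VEHICLE", the CONCEPTS and RENAME mappings, and the toy data are assumptions made for this example; they are not taken from #62 or #71.

```python
import pandas as pd

# Hypothetical concept metadata: short ID -> long, human-readable name
CONCEPTS = {
    "MODE": "Mode of transport",
    "VEHICLE": "Type of vehicle",
}

# Hypothetical mapping from labels in an upstream file to concept IDs
RENAME = {"Mode": "MODE", "Vehicle Type": "VEHICLE"}

# Toy stand-in for data read from an upstream source (not real data)
raw = pd.DataFrame({"Mode": ["Rail"], "Vehicle Type": ["All"], "Value": [1.0]})

# Column labels become the short IDs; the verbose text stays in CONCEPTS
tidy = raw.rename(columns=RENAME)

for column in tidy.columns:
    if column in CONCEPTS:
        print(f"{column}: {CONCEPTS[column]}")
```

Because the label and the concept share one ID, the long description can be looked up when needed without ever appearing in the dataframe itself.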

Thank you, @khaeru!