ReFRACtor/ABSCO

To do

Closed this issue · 11 comments

Items in ABSCO_tables.py that i have started on but that still need work before the code delivery:

  1. species that have both line parameters and XS files (_HI and _XS in USS_AIRS_profile.csv, which is generated with standard_atm_profiles.py)
  2. User profile XS (records 3.7 and 3.7.1)
  3. H2O scaling with PWV
  4. Ensure consistency between P and profile CSV file
  5. Save pressure layer values and write them to the eventual netCDF file (in addition to the level values)
  6. with the TES XS ABSCO library (https://lex-gitlab.aer.com/RC/ABSCO_XS), we save the OD files and then do the post-processing (ABSCO calculation). this is likely not necessary and consumes a lot of hard drive space, so i don't think we should do it for the general ABSCO tables task. instead, we'll just keep the ODs in memory during processing (and we'll need to warn about RAM requirements)
  7. Verify that the molecule names in standard_atm_profiles.py and ABSCO_*.py are consistent
  8. O2 dimer, BrO, heavy water special cases
  9. Include all pressures in same TAPE5 rather than doing a single layer at a time (like we did with the previous XS ABSCO software)
  10. Library should break up spectral regions into bands of 2000 cm-1.
  11. Work WV VMR into CO2 and N2 processing because of H2O effects on continuum
  12. Separate H2O VMR run in calcABSCO() then assemble ABSCO array with both runs
  13. Chunking in the output netCDF
  14. Verify valid data ranges (for output netCDF)
  15. wavelength to wavenumber conversion
  16. remove band/range dimension (read_ABSCO_tables.py, makeNC() in ABSCO_tables.py, netCDF templates?)
  17. RAM computation and warning
  18. continue updating README.md
  19. move ABSCO_tables.py main() function into its own driver script (run_lbl_absco.py)
  20. add O2 dimension
  21. notification if -e2e, -lbl, and -lnfl are not provided
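For item 15 in the list above, a minimal sketch of the conversion, assuming the user supplies wavelengths in microns (the issue does not state the input units). The function names are hypothetical and not taken from ABSCO_tables.py. Note that the band limits swap order, because the longer wavelength maps to the smaller wavenumber.

```python
def wavelength_to_wavenumber(wavelength_um):
    """Convert a wavelength in microns to a wavenumber in cm-1."""
    if wavelength_um <= 0:
        raise ValueError('wavelength must be positive')
    # 1 cm = 1e4 microns, so wavenumber [cm-1] = 1e4 / wavelength [um]
    return 1.0e4 / wavelength_um


def band_to_wavenumber(lam1_um, lam2_um):
    """Convert band limits in microns to ascending (wn1, wn2) in cm-1."""
    wn_a = wavelength_to_wavenumber(lam1_um)
    wn_b = wavelength_to_wavenumber(lam2_um)
    return (min(wn_a, wn_b), max(wn_a, wn_b))


print(band_to_wavenumber(10.0, 20.0))
```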

there's not a lot i can do with 4. -- pfile and vmrfile have similar pressures but with different precisions, so it's hard to do any kind of equality check. right now i just check to see that the pressure arrays are the same size
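one possible way to tighten the check beyond array size is a comparison with a relative tolerance. this is only a sketch: the function name and the tolerance value are assumptions, and the right tolerance depends on how many digits each file actually carries.

```python
import numpy as np


def pressures_consistent(p_file, p_vmr, rel_tol=1e-3):
    """Return True if two pressure arrays have the same size and agree
    to within a relative tolerance (rel_tol is an assumed placeholder)."""
    p_file = np.asarray(p_file, dtype=float)
    p_vmr = np.asarray(p_vmr, dtype=float)
    # keep the existing size check first so allclose never broadcasts
    if p_file.shape != p_vmr.shape:
        return False
    return bool(np.allclose(p_file, p_vmr, rtol=rel_tol, atol=0.0))


# same levels written with different precision
print(pressures_consistent([1013.25, 500.0, 0.1], [1013.2, 500.0, 0.1000]))
```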

for item 1, the only relevant molecules ("double agents", as i call them -- both XS and line parameters exist for each) are NO2, SO2, CF4, and HNO3. i've added to the code what i think is required to handle these and ran a couple of tests (NO2 for 1550-1600 cm-1 [line params] and 34690-34800 cm-1 [XS]; HNO3 for 200-600 cm-1 [XS] and 1550-1600 cm-1 [lines]). everything is working as i'd expect in these cases

it should be noted that NO2 and SO2 have identical density profiles regardless of whether XS or line parameters are used. that is not the case for HNO3 and CF4. this is handled in the code.

for item 7, i used this quick script:

#!/usr/bin/env python
# check that every molecule name has a matching column in the profile CSV

import pandas as pd

# molecule names to look for in the CSV header
allowed = ['H2O', 'CO2', 'O3', 'N2O', 'CO', 'CH4', 'O2',
  'NO', 'SO2', 'NO2', 'NH3', 'HNO3', 'OCS', 'H2CO', 'N2',
  'HCN', 'C2H2', 'HCOOH', 'C2H4', 'CH3OH', 'CCL4', 'CF4',
  'F11', 'F12', 'F22', 'ISOP', 'PAN', 'HDO', 'BRO', 'O2-O2']

inCSV = '/home/pernak18/work/ABSCO/VMR/USS_AIRS_profile.csv'
csvDat = pd.read_csv(inCSV)
csvNames = csvDat.columns.values

for name in allowed:
  # first try the bare molecule name
  print(name, name in csvNames)
  if name not in csvNames:
    # otherwise look for separate line-parameter (_HI) and XS (_XS) columns
    hiName = '%s_HI' % name
    xsName = '%s_XS' % name
    print('\t', hiName in csvNames, xsName in csvNames)

and the only guys that "fail" are F22, HDO, BRO, and O2-O2. i expect the latter three to fail because they are special cases that i have to address eventually anyway, and F22 just has an alias (CHCLF2), which i have addressed in the makeABSCO constructor.
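as an illustration of the lookup logic (bare name, then alias, then the _HI/_XS pair), here is a rough sketch. the `ALIASES` table and `resolve_columns()` are hypothetical names made up for this example; this is not how the makeABSCO constructor is actually written.

```python
# hypothetical alias table; only the F22 -> CHCLF2 alias comes from this thread
ALIASES = {'F22': 'CHCLF2'}


def resolve_columns(name, csv_names):
    """Return the CSV column names that could hold the profile for `name`.

    Tries the bare name, then its alias, then the _HI/_XS pair used for
    molecules that have both line parameters and cross sections.
    """
    if name in csv_names:
        return [name]
    alias = ALIASES.get(name)
    if alias is not None and alias in csv_names:
        return [alias]
    # keep whichever of the suffixed columns are present
    return [c for c in ('%s_HI' % name, '%s_XS' % name) if c in csv_names]


header = ['H2O', 'CHCLF2', 'HNO3_HI', 'HNO3_XS']
for mol in ('H2O', 'F22', 'HNO3', 'HDO'):
    print(mol, resolve_columns(mol, header))
```

an empty list flags a molecule that still needs special handling.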

for 3, see 9dee497

for 10, we should just have to include some provision in ABSCO_preprocess.py that breaks up the bands if necessary, then just proceed as we do when the bands are smaller than 2000 cm-1.
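a minimal sketch of what that provision could look like. `split_band()` is a hypothetical helper, not something that exists in ABSCO_preprocess.py, and it assumes a band is just a (wn1, wn2) pair in cm-1.

```python
def split_band(wn1, wn2, max_width=2000.0):
    """Split [wn1, wn2] into consecutive sub-bands no wider than max_width."""
    if wn2 <= wn1:
        raise ValueError('wn2 must be greater than wn1')
    bands = []
    start = wn1
    while start < wn2:
        # the last sub-band is clipped to the requested upper limit
        stop = min(start + max_width, wn2)
        bands.append((start, stop))
        start = stop
    return bands


print(split_band(500, 5000))
print(split_band(500, 600))
```

a band already narrower than 2000 cm-1 comes back unchanged as a single entry, so the downstream processing would not need a separate code path.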

for 9, we decided to continue to do one layer (2 levels) at a time in each TAPE5, while providing the entire user profile for each. this has produced successful LBLRTM runs for all H2O pressures and temperatures (and note that with these code changes, we are processing a different number of temperatures per level)
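for reference, pairing adjacent levels into layers is a one-liner. this sketch only shows the level pairing, with made-up pressure values; it does not write a TAPE5 and `level_pairs()` is a hypothetical name.

```python
def level_pairs(levels):
    """Return (lower, upper) level pairs, one per layer."""
    levels = list(levels)
    if len(levels) < 2:
        raise ValueError('need at least two levels to form a layer')
    # N levels give N-1 layers
    return list(zip(levels[:-1], levels[1:]))


# made-up level pressures in mbar
for bottom, top in level_pairs([1000.0, 800.0, 500.0, 100.0]):
    print('layer between %.1f and %.1f mbar' % (bottom, top))
```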

see 2697366

for 6, i did a trial run with CO2 200-600 cm-1, degrading by a factor of 4, and found that up to 14 GB of RAM were being utilized.

this can actually be improved upon -- there is no reason to store the wavenumber array associated with the OD array for each LBL run. for a given band, wavenumbers will always be the same.
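a sketch of that idea, assuming every LBL run in a band returns ODs on the same grid. `BandStore` is a hypothetical container for illustration, not a class in the ABSCO code.

```python
import numpy as np


class BandStore:
    """Hold one wavenumber grid per band plus the OD spectra that share it."""

    def __init__(self, wavenumber):
        self.wavenumber = np.asarray(wavenumber, dtype=float)
        self.od = []

    def add(self, od):
        """Append one OD spectrum after checking it matches the band grid."""
        od = np.asarray(od, dtype=float)
        if od.shape != self.wavenumber.shape:
            raise ValueError('OD spectrum does not match the band grid')
        self.od.append(od)

    def as_array(self):
        """Return the stored spectra as a (n_runs, n_wavenumber) array."""
        return np.vstack(self.od)


store = BandStore(np.linspace(200.0, 600.0, 5))
for _ in range(3):
    store.add(np.zeros(5))
print(store.as_array().shape)
```

storing the grid once roughly halves the memory for a band compared with keeping a wavenumber copy alongside every OD array.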

for item 10, i tested the following configuration:

wn1 =  500 9000 100
wn2 =  600 800 200
res = 1e-4 1e-4 1e-4
degrade = 4 8 2

and everything looked good.

for 13, see:

https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters
https://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes
https://www.oreilly.com/library/view/python-and-hdf5/9781491944981/ch04.html
https://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression
http://unidata.github.io/netcdf4-python/#section9

https://stackoverflow.com/questions/38860344/how-to-set-chunk-size-of-netcdf4-in-python
https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks
https://stackoverflow.com/questions/12067876/handling-very-large-netcdf-files-in-python

i'm pretty sure i'm doing the optimal thing right now. the python netCDF4 library does compress by default and also applies a default chunking size. if we want to manually change the chunking sizes, more experimentation is needed. switching to xarray did not improve efficiency or make the output file any smaller. i tested this with ozone at 500-600 cm-1, 1e-4 cm-1 resolution, and a factor of 4 degradation, and the file is 624 MB with compression and chunking. that might not seem like a lot, but remember the bandwidth is only 100 cm-1 and some molecules have an additional (H2O_VMR) dimension, so the file size can balloon.
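if we do experiment with manual chunking, one starting point is a helper that picks a chunk shape from the variable shape. this is only a sketch under assumptions: it assumes the spectral dimension is last and that reads are mostly one spectrum at a time, and `spectral_chunk_shape()` and the 1 MiB target are made up for the example. the resulting tuple could be passed as the `chunksizes` keyword of netCDF4's `Dataset.createVariable`, with `zlib=True` set explicitly rather than relying on library defaults.

```python
def spectral_chunk_shape(shape, itemsize=8, target_bytes=1 << 20):
    """Chunk shape that keeps whole stretches of one spectrum together.

    Uses 1 along every leading dimension and as many points along the last
    (assumed spectral) dimension as fit in target_bytes.
    """
    if not shape:
        raise ValueError('shape must have at least one dimension')
    # never exceed the dimension length, never drop below one point
    n_last = max(1, min(shape[-1], target_bytes // itemsize))
    return (1,) * (len(shape) - 1) + (n_last,)


# example dimension sizes only: 15 x 60 spectra of 250000 points each
print(spectral_chunk_shape((15, 60, 250000)))
print(spectral_chunk_shape((15, 60, 250000), itemsize=4))
```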

for item 8, O2-O2 is already accounted for in the O2 continuum

for item 16, i now stack all of the spectra on top of each other and utilize an Extent_Indices array as James originally suggested. this works out well -- trying to do the band dimension that i was doing when there were bands of unequal size (either inconsistent ranges or resolutions) was a headache.
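a small sketch of the stacking idea for 1-D spectra. it assumes the extent indices are stored as inclusive [start, end] pairs, which is a guess for illustration (the actual convention in makeNC() may differ), and `stack_bands()` is a hypothetical name.

```python
import numpy as np


def stack_bands(band_spectra):
    """Concatenate per-band spectra and record each band's index extent.

    Returns the stacked 1-D array and an (n_bands, 2) integer array of
    inclusive [start, end] indices into it.
    """
    extents = []
    start = 0
    for spec in band_spectra:
        n = len(spec)
        extents.append((start, start + n - 1))
        start += n
    stacked = np.concatenate([np.asarray(s, dtype=float) for s in band_spectra])
    return stacked, np.array(extents, dtype=int)


# two bands of unequal size
stacked, extent_indices = stack_bands([[1.0, 2.0, 3.0], [4.0, 5.0]])
i0, i1 = extent_indices[1]
print(stacked[i0:i1 + 1])  # recover the second band
```

bands of different widths or resolutions need no padding this way, which is what made the separate band dimension awkward.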

for item 17, RAM usage is another prompt at the beginning of the code. there may be too many prompts, but it will be easy to remove them if requested. the RAM usage assumes a full run (all pressures, temperatures, bands, and WV VMR) for a given molecule
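a back-of-the-envelope version of that estimate, assuming 8-byte floats and counting only the absorption coefficient values themselves. the function name and arguments are hypothetical and the real computation in the code may include more terms.

```python
def estimate_ram_gb(n_wn_per_band, n_temps_per_level, n_vmr=1, bytes_per_value=8):
    """Rough RAM estimate in GiB for a full run of one molecule.

    n_wn_per_band     -- number of spectral points in each band
    n_temps_per_level -- number of temperatures processed at each pressure
                         (a list, since the count varies with pressure)
    n_vmr             -- number of water vapor VMR values (1 if unused)
    """
    n_values = sum(n_wn_per_band) * sum(n_temps_per_level) * n_vmr
    return n_values * bytes_per_value / 1024 ** 3


# one 100 cm-1 band at 1e-4 cm-1 resolution degraded by 4 -> 250000 points;
# the pressure and temperature counts below are made up
print('%.2f GiB' % estimate_ram_gb([250000], [15] * 60, n_vmr=2))
```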

for item 19, see 5027f51