extractDateFromString() will take the last date in a file name with multiple dates
bairaelyn opened this issue · 0 comments
This is only a problem when reading files from large archives without a specified file name.
Example: DSCOVR data archive files have the format _DATASTARTTIME_DATAENDTIME_DATACOMPILATIONTIME, and look like this:
oe_m1m_dscovr_s20170911000000_e20170911235959_p20170912023324_pub.nc
When extractDateFromString() searches a file name for a date, it cycles through all digit strings in the name but keeps only the last one (here, the compilation time, which is not relevant for the data in the file):
for i in range(len(testunder)):
    try:
        numberstr = re.findall(r'\d+',testunder[i])[0]
    except:
        numberstr = '0'
    if len(numberstr) > 4:
        tmpdaystring = numberstr
    elif len(numberstr) == 4 and int(numberstr) > 1900: # use year at the end of string
        tmpdaystring = numberstr
if len(tmpdaystring) > 8:
    try: # first try whether an easy pattern can be found e.g. test12014-11-22
        match = re.search(r'\d{4}-\d{2}-\d{2}', daystring)
        date = datetime.strptime(match.group(), '%Y-%m-%d').date()
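The overwrite can be reproduced in isolation with the DSCOVR file name from the example. This is a minimal sketch, not the library code: it assumes that `testunder` holds the underscore-separated parts of the file name, and it replaces the bare `try/except` with an explicit empty-list check that behaves the same way.

```python
import re

filename = "oe_m1m_dscovr_s20170911000000_e20170911235959_p20170912023324_pub.nc"

# Assumption: testunder is the file name split on underscores.
testunder = filename.split('_')

tmpdaystring = ''
for part in testunder:
    found = re.findall(r'\d+', part)
    numberstr = found[0] if found else '0'
    if len(numberstr) > 4:
        # Every long digit run overwrites the previous candidate.
        tmpdaystring = numberstr
    elif len(numberstr) == 4 and int(numberstr) > 1900:
        tmpdaystring = numberstr

print(tmpdaystring)  # 20170912023324 (the compilation time, not the data start)
```

Each of the three timestamps passes the `len(numberstr) > 4` check, so the value left after the loop is the last one in the name.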
This could be remedied by testing every candidate string with len(numberstr) > 4 and returning the first one that can be parsed, but that could break the automatic reading of other formats, so it will need extensive testing. The workaround for now is to read the data with endtime + 1 day so that the correct files are picked up, then trim the result afterwards.
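A rough sketch of the "return the first deciphered string" idea is shown below. The helper name `first_date_in_name` is hypothetical, not part of the library, and it only recognises two layouts (`YYYYMMDDhhmmss` and `YYYYMMDD`), so it illustrates the approach rather than replacing the existing logic for other formats.

```python
import re
from datetime import datetime

# Layouts this sketch understands, keyed by the length of the digit run.
FORMATS = {14: '%Y%m%d%H%M%S', 8: '%Y%m%d'}

def first_date_in_name(filename):
    """Hypothetical helper: return the first digit run that parses as a date."""
    for part in filename.split('_'):
        for numberstr in re.findall(r'\d+', part):
            fmt = FORMATS.get(len(numberstr))
            if fmt is None:
                continue  # not a length we know how to read
            try:
                return datetime.strptime(numberstr, fmt).date()
            except ValueError:
                continue  # right length, but not a valid date
    return None

name = "oe_m1m_dscovr_s20170911000000_e20170911235959_p20170912023324_pub.nc"
print(first_date_in_name(name))  # 2017-09-11
```

Returning on the first successful parse picks the data start time for the DSCOVR naming scheme. File names that put an unrelated but date-like number first would now be misread, which is why the extensive testing mentioned above would still be needed.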