
Re-download of RSR files

from glob import glob
from satpy import Scene
from pyresample.geometry import AreaDefinition
from pyresample.utils import proj4_str_to_dict
from pyproj import Proj
from satpy.utils import debug_on
import dask as da
from multiprocessing.pool import ThreadPool
from datetime import datetime

def EPSG_4326_definition(ll_ur=None, resolution=2.0):
    # ll_ur is (lon, lat, lon, lat)
    area_id = 'epsg4326'
    description = 'EPSG:4326'
    proj_id = 'epsg4326'
    #projection = '+proj=eqc +lat_ts=0 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m'
    projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
    proj4_dict = proj4_str_to_dict(projection)

    #resolution = 2.0
    y_size = 20480 / resolution  # Divide by two for 2km resolution
    x_size = 40960 / resolution  # ditto!

    ll_ur_ref = (-180.0, -90.0, 180.0, 90.0)
    area_extent = ll_ur

    y_size = int((ll_ur[3] - ll_ur[1]) / (ll_ur_ref[3] - ll_ur_ref[1]) * y_size)
    x_size = int((ll_ur[2] - ll_ur[0]) / (ll_ur_ref[2] - ll_ur_ref[0]) * x_size)
    print(y_size, x_size)

    return(AreaDefinition(area_id, description, proj_id, proj4_dict, x_size, y_size, area_extent))



FILES = glob("/data/kuehn/AHI_rt/DAT/two/*DAT")
scn = Scene(filenames=FILES, reader='ahi_hsd')
# Load the composite before resampling and saving it
scn.load(['true_color'])


areadef = EPSG_4326_definition(ll_ur=(80.0, -10, 180.0, 30))

local_scene = scn.resample(areadef, cache_dir='/data/kuehn/AHI_rt/cache')
local_scene.save_dataset('true_color', filename='./local_true_color_crefl.tif', GDAL_OPTIONS=['COMPRESS=JPEG', 'PHOTOMETRIC=YCBCR', 'TILED=YES'])

Problem description
The RSR data is downloaded every time I run the script, even though it has already been downloaded and stored in a local directory with write permission. It should not be downloaded every time.
Note that I've modified /satpy/etc/enhancements/generic.yaml such that crefl_scaling is the default true_color enhancement.

Actual Result, Traceback if applicable

[INFO: 2018-10-10 21:29:45 : pyspectral.rayleigh] Atmosphere chosen: us-standard
[DEBUG: 2018-10-10 21:29:45 : pyspectral.rayleigh] LUT filename: /home/kuehn/.local/share/pyspectral/marine_clean_aerosol/rayleigh_lut_us-standard.h5
[DEBUG: 2018-10-10 21:29:45 : pyspectral.rsr_reader] Filename: /home/kuehn/.local/share/pyspectral/rsr_ahi_Himawari-8.h5
[WARNING: 2018-10-10 21:29:45 : pyspectral.rsr_reader] rsr data may not be up to date: /home/kuehn/.local/share/pyspectral/rsr_ahi_Himawari-8.h5
[INFO: 2018-10-10 21:29:45 : pyspectral.rsr_reader] Will download from internet...
[INFO: 2018-10-10 21:29:45 : pyspectral.utils] Download RSR files and store in directory /home/kuehn/.local/share/pyspectral
[DEBUG: 2018-10-10 21:29:45 : pyspectral.utils] Get data. URL:
[DEBUG: 2018-10-10 21:29:45 : pyspectral.utils] Destination = /home/kuehn/.local/share/pyspectral
[DEBUG: 2018-10-10 21:29:45 : urllib3.connectionpool] Starting new HTTPS connection (1):
[DEBUG: 2018-10-10 21:29:46 : urllib3.connectionpool] "GET /record/1409621/files/pyspectral_rsr_data.tgz HTTP/1.1" 200 2949478
2949478it [00:04, 611429.75it/s]
[DEBUG: 2018-10-10 21:29:53 : pyspectral.rsr_reader] Filename: /home/kuehn/.local/share/pyspectral/rsr_ahi_Himawari-8.h5

Versions of Python, package at hand and relevant dependencies
packages in environment at /home/kuehn/miniconda3/envs/satpy3:

Thank you for reporting an issue!

Thanks @ralphk11 I am looking into it, might be overlapping with #38
Will come back!

@ralphk11 It would be very helpful if you could make a minimal code example, using pyspectral only, that produces the same error.

@ralphk11 It is not a sustainable solution, but you could set an environment variable pointing to a local, customized pyspectral config file in which you tell pyspectral not to try downloading. Like:

And in there you could put
download_from_internet: False

You can read about interacting with this config file in the pyspectral documentation.
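As a rough illustration of this workaround, here is a minimal sketch in Python. It assumes pyspectral takes the config file path from the `PSP_CONFIG_FILE` environment variable (that is my understanding of the pyspectral documentation; please verify it for your version) and that settings missing from the custom file fall back to the defaults. The directory and file name are placeholders.

```python
import os
import tempfile

# Placeholder location for the custom config; any writable path works.
config_dir = tempfile.mkdtemp(prefix="pyspectral_cfg_")
config_path = os.path.join(config_dir, "pyspectral.yaml")

# Only the setting discussed above is overridden here.
with open(config_path, "w") as fobj:
    fobj.write("download_from_internet: False\n")

# Assumption: PSP_CONFIG_FILE is the variable pyspectral reads.
# It has to be set before pyspectral (or satpy) is imported.
os.environ["PSP_CONFIG_FILE"] = config_path
```

If your pyspectral version does not merge the custom file with its defaults, copy the full default config file and change only this one setting.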

Of course, if you already have the latest version of the RSR files, it shouldn't try to download them more than once, even if the above variable is True.

But, please provide a minimal use case and I will do my best!

I am travelling until tonight and may not have time to look at it further before tomorrow.

@ralphk11 I tried the code below and could not see any unwanted repeated downloading. Could you run it a few times and see whether the behaviour is also okay on your side?

import numpy as np
from pyspectral.rayleigh import Rayleigh
from pyspectral.utils import debug_on

msi = Rayleigh('Himawari-8', 'ahi', aerosol_type='marine_clean_aerosol')

sunz = np.array([[32., 40.], [31., 41.]])
satz = np.array([[45., 20.], [46., 21.]])
ssadiff = np.array([[110, 170], [120, 180]])

refl_cor_red = msi.get_reflectance(sunz, satz, ssadiff, 'B02')

@ralphk11 What do you want me to do with this?

@adybbroe I tried your example with pyspectral only, and it only downloaded the files once. My test code with ABI data does not have the problem; however, the example with AHI as shown above still downloads rsr_ahi_Himawari-8.h5 every time.
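One way to confirm the re-download independently of the log output is to compare the file's modification time and size before and after a run. Below is a minimal sketch using only the standard library. `file_stamp` and `was_rewritten` are hypothetical helpers written for this illustration, not part of pyspectral or satpy.

```python
import os


def file_stamp(path):
    """Return (mtime_ns, size) for path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_mtime_ns, st.st_size)


def was_rewritten(before, after):
    """True if the file appeared or changed between two stamps."""
    return after is not None and before != after
```

Take one stamp of /home/kuehn/.local/share/pyspectral/rsr_ahi_Himawari-8.h5 before running the script and one after. If `was_rewritten` returns True on a second run, the file was replaced again.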

@ralphk11 Ok, many thanks for verifying. So, perhaps a satpy or a satpy/pyspectral problem. I will try to see if I can get the same behaviour as you with AHI and satpy later this week.

@ralphk11 I am finally looking into this in more detail. I am able to run your example above.

I simplified it to focus only on the multiple download issue:

import os
from glob import glob
from satpy import Scene
from satpy.utils import debug_on
import dask as da
from multiprocessing.pool import ThreadPool

#DATADIR = "/data/kuehn/AHI_rt/DAT/two"
#CACHE_DIR = '/data/kuehn/AHI_rt/cache'
DATADIR = "/home/a000680/data/himawari8/201502070300"
CACHE_DIR = '/tmp'



FILES = glob(os.path.join(DATADIR, "*DAT"))
scn = Scene(filenames=FILES, reader='ahi_hsd')

And indeed, when I remove my local Himawari RSR LUTs (and the version-indicator file PYSPECTRAL_RSR_VERSION), it downloads twice. However, the second time I run it, no download is attempted. I was using conda on Linux, with Python 3 and the latest satpy and pyspectral from conda-forge. Could you try upgrading your environment, run my stripped-down example code above, and tell me what the behaviour is on your side?
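To illustrate the behaviour one would expect from a version-indicator file, here is a small sketch. It is not pyspectral's actual implementation: `needs_download` and `expected_version` are hypothetical names, and the sketch assumes the marker file simply contains a version string.

```python
import os

VERSION_FILE = "PYSPECTRAL_RSR_VERSION"  # file name taken from the comment above


def needs_download(data_dir, expected_version):
    """Return True if the marker file is missing or holds another version.

    Assumption: the marker file contains just a version string.
    """
    marker = os.path.join(data_dir, VERSION_FILE)
    try:
        with open(marker) as fobj:
            local_version = fobj.read().strip()
    except FileNotFoundError:
        return True
    return local_version != expected_version
```

With a check like this, a download happens once after the marker is removed, and later runs with an up-to-date marker skip it.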

I have been able to fix the double download issue, which is in pyspectral. That fix is currently being put into a PR.

Maybe @djhoese you could have a look as well?