sentinel-hub/sentinelhub-py

[FEAT] Sentinel Hub OAuth session transfer

AleksMat opened this issue · 2 comments

What is the problem?

In sentinelhub-py, a session token for Sentinel Hub services is cached, reused, and extended within a single Python runtime. The default multi-threaded download therefore creates only a single token. Parallelization with standard multiprocessing creates as many tokens as there are workers. However, when using advanced frameworks such as Dask or Ray, and/or running downloads on a cluster of instances, the caching mechanism may stop working, and many more sessions will be created on each processing instance. In this case users should make sure to pass a token from one processing instance to all the others.
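
To illustrate why the cache helps only within one runtime, here is a toy model of a class-level token cache. It is not the sentinelhub-py implementation: `ToySessionCache`, its attributes, and the token format are all made up for this sketch.

```python
import itertools
import os


class ToySessionCache:
    """Toy stand-in for a class-level session cache (not sentinelhub-py code)."""

    _cached_tokens = {}            # lives in this process's memory only
    _counter = itertools.count(1)  # counts simulated token requests

    @classmethod
    def get_token(cls, client_id):
        # Simulate a token request only if this process has none for client_id.
        if client_id not in cls._cached_tokens:
            cls._cached_tokens[client_id] = f"token-{next(cls._counter)}-pid{os.getpid()}"
        return cls._cached_tokens[client_id]


# Two calls in the same runtime share one token.
first = ToySessionCache.get_token("my-client-id")
second = ToySessionCache.get_token("my-client-id")
```

A process that starts with fresh memory, for example a worker on another cluster node, begins with an empty `_cached_tokens` and would therefore request a token of its own.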

Current solution

In the current version of sentinelhub-py a token can only be transferred together with an instance of a SentinelHubSession object by:

  • either creating a new token:

    import pickle
    from sentinelhub import SHConfig, SentinelHubSession
    
    config = SHConfig()
    session = SentinelHubSession(config=config)
    
    serialized_session = pickle.dumps(session)

  • or taking an existing token from a download client:

    import pickle
    from sentinelhub import SHConfig, SentinelHubDownloadClient
    
    config = SHConfig()
    client = SentinelHubDownloadClient(config=config)
    
    serialized_session = pickle.dumps(client.get_session())

The serialized session is then sent to another processing instance and:

  • either given to a download client:

    import pickle
    from sentinelhub import SentinelHubDownloadClient
    
    session = pickle.loads(transferred_session)
    client = SentinelHubDownloadClient(session=session)

  • or, if the client object is not used directly, cached in the client's class attribute:

    import pickle
    from sentinelhub import SentinelHubDownloadClient, SHConfig
    
    session = pickle.loads(transferred_session)
    
    config = SHConfig()
    cache_key = config.sh_client_id, config.sh_client_secret, config.get_sh_oauth_url()
    SentinelHubDownloadClient._CACHED_SESSIONS[cache_key] = session

Future plans

In future versions of sentinelhub-py we should:

  • improve the interface of this process by adding additional helper methods,
  • avoid using pickle serialization and instead serialize into a jsonifiable Python object,
    • besides the raw token, it would be good to also propagate the client ID for caching purposes,
  • publish official guidelines for the token transfer process on the ReadTheDocs documentation page.
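
As a rough idea of what a jsonifiable transfer format could look like, here is a minimal sketch. It is not an existing sentinelhub-py interface: `SessionPayload`, its field names, and the helper functions are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SessionPayload:
    """Hypothetical transfer format; field names are illustrative only."""
    access_token: str
    expires_at: float  # Unix timestamp after which the token is invalid
    client_id: str     # carried along so the receiver can build a cache key


def to_json(payload):
    # Only plain strings and numbers are involved, so no pickle is needed.
    return json.dumps(asdict(payload))


def from_json(text):
    return SessionPayload(**json.loads(text))


original = SessionPayload(
    access_token="abc123", expires_at=1700000000.0, client_id="my-client-id"
)
restored = from_json(to_json(original))
```

Unlike a pickle, such a payload can be read by any process without importing the class that produced it, and loading it does not execute arbitrary code.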

For now, I would plan this for the 3.6.0 release.

Additional notes

This issue is closely related to #146.

Hi @AleksMat, I believe that we are hitting the same issue described in #146. Are you aware of a workaround for that issue until this feature can be added?

EDIT: I apologize, I read the current solution section and thought that was the solution you were proposing, not the current working solution. I will give that a shot.

We have just released sentinelhub-py version 3.6.0 with proper support for this. We also prepared an official tutorial notebook that explains the process and provides different implementations. Any feedback is welcome.