Modularizing authentication
rwegener2 opened this issue · 1 comments
The Problem
The authentication method that exists right now is inherently tied to the Query
class because the .earthdata_login()
method is a method on that class. That has led to passing ._session
values from Query
objects to Variables
objects as needed.
As the cloud read effort develops, additional modules, such as Read
, will also need to authenticate. Because of this I would like to propose that we make authentication in the repository more modular, so it can be more easily extended to different classes.
Possible Solutions
Both of the ideas below create an EarthdataAuth
class. This class would have the existing earthdata_login()
functionality within it.
1 - Inheritance
We could create an class called EarthdataAuth
that holds the ._auth
, ._session
, and ._s3login_credentials
attributes that currently get created in Query
. Query
, Variables
, and other classes could then be subclasses of the EarthdataAuth
superclass.
Pros/Cons: After some reading it seems that isn't exactly the type of relationship we want to convey. The set of complexity added by a new level of inheritance feels disproportionate to the fairly simple need to keep track of 2-3 attributes and one function. The pro of this solution, however, is that I believe we could it up such that nothing about the user experience changes.
2 - Creating EarthdataAuth
as a class that other objects have as an attribute
Query
and other related objects could have an .auth
attribute that is an instance of EarthdataAuth
. EarthdataAuth
can get created either by calling the earthdata_login()
function, or automatically when a user needs authentication. The user (or the internal code) would access attributes like ipx.Query.auth._session
.
Pros/Cons: I think this solution could simplify our internal code and at least for me it is more intuitive than inheritance (that may just be me, though). The con of this solution is that it would change how the user authenticates, either by removing that step all together or by requiring they switch to ipx.Query.auth.earthdata_login()
.
Questions
- Any opinions about which of the two options to pursue? I am leaning toward the second one, but could be otherwise convinced. @JessicaS11
- Is there a reason that we prefer to ask users to run the
earthdata_login()
function themself? Are we interested in setting up the workflow such that the user is prompted for login credentials only when the execute a function that requires authentication? I think that could happen fairly easily on our end, but it would change the user experience.
Code
- this commit on the
move_auth
branch shows a wip example of approximately what the inheritance method could look like (method no. 1 above). I got that far and had some issues. I think the ordering of classes was causing different ._session objects to overwrite each other. That's when I starting thinking about if method no. 2 above would be simpler.
Thanks for this great writeup @rwegener2! Of the two solutions you propose, I would definitely lean towards the second one. We actually not too recently removed our own Earthdata
module, instead building out a similar set of auth methods directly in earthaccess (and also gaining the improved cloud and multi-DAAC auth already built in to that library). In that line of thinking, I wonder if - rather than create an internal EarthdataAuth class - we could simply leverage the earthaccess auth object directly, more or less as is done in Query.earthdata_login()
. Then the challenge turns back into your comment below.
Is there a reason that we prefer to ask users to run the earthdata_login() function themself?
The short answer is this is a legacy from the early days of programmatic data access. In many cases, particularly for tutorials, the primary mode of authentication was interactive (and there was no icepyx or earthaccess wrapping the manual sending of auth info with a request
and response
sequence, so users needed to add their credentials to the dict themselves). Thus, even once we added automatic auth options we wanted it to be clear to the user this was happening and didn't want to break existing workflows, so we did not make earthdata_login()
private. However, if the user calls Query.download()
and has not logged in, this will be called automatically, so they are not explicitly required to run it themselves.
Are we interested in setting up the workflow such that the user is prompted for login credentials only when the execute a function that requires authentication? I think that could happen fairly easily on our end, but it would change the user experience.
I think this is the way to go, and I think we could implement it without changing the user experience. For classes that might need auth, we can include it as part of the init as you showed in the Variables module in your linked commit. Rather than directly passing in a ._session
(as is done now), the auth object can be passed and we can use auth.get_session()
. This will also help us be better positioned to adopt more earthaccess functionality in the future (e.g. using it to interact with CMR, which we currently still do "manually" by sending requests).