This application is the backend server for the PhotonRanch Datalab. It is a Django application with a REST API for communicating with the Datalab UI.
- Python >= 3.9
- Django >= 4
Start by creating a virtualenv for this project and activating it:
python -m venv /path/to/my/virtualenv
source /path/to/my/virtualenv/bin/activate
Then install the dependencies:
pip install -e .
The project is configured to use a local SQLite database. You can change that to a PostgreSQL one if you want, but SQLite is easy for development. Run the migrations to set up the database:
./manage.py migrate
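If you would rather develop against PostgreSQL, a minimal sketch of the settings change is below. The database name, user, password, and host are placeholders, and where the DATABASES setting lives depends on this project's settings layout:

# Hypothetical PostgreSQL configuration in Django's settings module.
# All values below are placeholders -- substitute your own local
# PostgreSQL instance's name, user, and password.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'datalab',       # placeholder database name
        'USER': 'datalab',       # placeholder user
        'PASSWORD': 'changeme',  # placeholder password
        'HOST': '127.0.0.1',
        'PORT': '5432',
    }
}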
Get your auth token from the UI by signing in with your LCO credentials and checking your cookies for an auth token. Once you have it, export it to your dev environment like so:
export ARCHIVE_API_TOKEN=<your-auth-token>
Start up a Redis server, which is used both for caching and as the message broker for the dramatiq task queue. Make sure you have Redis installed, then start a server on port 6379:
redis-server
Start the dramatiq worker threads. Here we use a minimal number of processes and threads to keep the footprint small, but feel free to run a full dramatiq setup as well:
./manage.py rundramatiq --processes 1 --threads 2
Now start your server:
./manage.py runserver
The application has a REST API with the following endpoints you can use. You must pass your user's API token in the request header to access any of the endpoints; the header looks like {'Authorization': 'Token 123456789abcdefg'} if you are using Python's requests library.
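For example, a minimal authenticated request might look like this sketch; the host is a placeholder for wherever your dev server is running, and the token is the value you exported as ARCHIVE_API_TOKEN above:

import requests

# Placeholder host and token -- substitute your own dev server URL
# and the auth token you exported earlier.
API_TOKEN = '123456789abcdefg'
headers = {'Authorization': f'Token {API_TOKEN}'}

# List your datasessions
response = requests.get('http://127.0.0.1:8000/api/datasessions/', headers=headers)
response.raise_for_status()
print(response.json())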
Datasessions can take an input_data parameter, which should contain a list of data objects. The current format is described below, but it will probably evolve as we learn more about how we use it.
session_input_data = [
{
'type': 'fitsfile',
'source': 'archive',
'basename': 'mrc1-sq005mm-20231114-00010332'
},
{
'type': 'fitsfile',
'source': 'archive',
'basename': 'mrc1-sq005mm-20231114-00010333'
},
]
Data operations can have a varying set of named keys within their input_data that are specific to each operation. For example, it would look like this for an operation that just expects a list of files and a threshold value:
operation_input_data = {
'input_files': [
{
'type': 'fitsfile',
'source': 'archive',
'basename': 'mrc1-sq005mm-20231114-00010332'
}
],
'threshold': 255.0
}
POST /api/datasessions/
post_data = {
'name': 'My New Session Name',
'input_data': session_input_data
}
GET /api/datasessions/
GET /api/datasessions/datasession_id/
DELETE /api/datasessions/datasession_id/
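Putting those together, here is a sketch of creating and then deleting a datasession with requests. The host and token are placeholders, session_input_data follows the structure shown earlier, and the 'id' field in the response is an assumption about the serializer's output:

import requests

headers = {'Authorization': 'Token 123456789abcdefg'}  # placeholder token
base_url = 'http://127.0.0.1:8000/api/datasessions/'   # placeholder host

session_input_data = [
    {'type': 'fitsfile', 'source': 'archive', 'basename': 'mrc1-sq005mm-20231114-00010332'},
]

# Create a new datasession
response = requests.post(
    base_url,
    json={'name': 'My New Session Name', 'input_data': session_input_data},
    headers=headers,
)
response.raise_for_status()
session_id = response.json()['id']  # assumes the serializer returns an 'id' field

# Delete it again
requests.delete(f'{base_url}{session_id}/', headers=headers)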
Available operations are introspected from the data_operations directory and must implement the BaseDataOperation class. I expect we will flesh out those classes more when we actually start using them.
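As an illustration only: the BaseDataOperation interface isn't spelled out in this README, so the import path and the method names below (name, operate) are guesses at what a new operation might look like, not documented API:

# Hypothetical sketch of a new operation living in the data_operations directory.
# The import path and the name()/operate() methods are assumptions about the
# BaseDataOperation interface, which is not documented here.
from data_operations.base import BaseDataOperation  # import path is a guess

class Median(BaseDataOperation):

    @staticmethod
    def name():
        # Must exactly match the 'name' posted to the operations endpoint
        return 'Median'

    def operate(self):
        # Read self.input_data, compute the per-pixel median of the input
        # files, and store the result somewhere the session can find it.
        pass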
GET /api/datasessions/datasession_id/operations/
POST /api/datasessions/datasession_id/operations/
post_data = {
'name': 'Median', # This must match the exact name of an operation
'input_data': operation_input_data
}
DELETE /api/datasessions/datasession_id/operations/operation_id/
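For instance, a sketch of adding a Median operation to an existing session with requests; the host, token, and session id are placeholders, and the keys Median actually expects in its input_data are an assumption:

import requests

headers = {'Authorization': 'Token 123456789abcdefg'}  # placeholder token
session_id = 1                                         # placeholder datasession id
url = f'http://127.0.0.1:8000/api/datasessions/{session_id}/operations/'

# Assumed input_data keys for Median -- the real operation may expect others
operation_input_data = {
    'input_files': [
        {'type': 'fitsfile', 'source': 'archive', 'basename': 'mrc1-sq005mm-20231114-00010332'},
    ],
}

response = requests.post(
    url,
    json={'name': 'Median', 'input_data': operation_input_data},
    headers=headers,
)
response.raise_for_status()
print(response.json())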
- Come up with an operation wizard_description format and add an endpoint to get them for all available operations so the frontend can auto-create UI wizards for new operations.
- Figure out user accounts between PTR and Datalab - Datalab needs user accounts for permissions to gate access to only your own sessions.
- Implement operations to actually do something when they are added to a session
- Figure out caching and storage of intermediate results
- Figure out asynchronous task queue or Temporal for executing operations
- Add in operation results/status to the serialized operations output (maybe to the model too as needed)