The Clay Foundation Model (in development)

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Clay Foundation Model

An open source AI model and interface for Earth.

Getting started

Quickstart

Launch into a JupyterLab environment on Binder, Planetary Computer, or SageMaker Studio Lab.

Installation

Basic

To help out with development, start by cloning this repository:

git clone <repo-url>
cd model

We then recommend using mamba to install the dependencies. This also creates a virtual environment with Python and JupyterLab installed.

mamba env create --file environment.yml

Note

The command above will only work for Linux devices with CUDA GPUs. For installation on macOS devices (either Intel or ARM chips), follow the 'Advanced' section in https://clay-foundation.github.io/model/installation.html#advanced

Activate the virtual environment first.

mamba activate claymodel

Finally, double-check that the libraries have been installed.

mamba list
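
As a complement to `mamba list`, the check can also be done from inside Python. This is a minimal sketch, not part of the repository: the module names in EXPECTED_MODULES are assumptions chosen for illustration, so adjust them to match what environment.yml actually installs.

```python
import importlib.util

# Hypothetical list of top-level module names; edit it to match environment.yml.
EXPECTED_MODULES = ["torch", "lightning", "geopandas"]


def find_missing(modules):
    """Return the module names that cannot be located in the active environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

Run it with the claymodel environment activated. An empty result only shows that the modules can be found, not that they import without errors or that a GPU is usable.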

Usage

Running jupyter lab

mamba activate claymodel
python -m ipykernel install --user --name claymodel  # register the environment as a Jupyter kernel
jupyter kernelspec list --json                       # check that the kernel is installed
jupyter lab &

Running the model

The neural network model can be run via LightningCLI v2. To check out the different options available, and to look at the hyperparameter configurations, run:

python trainer.py --help
python trainer.py test --print_config

To quickly test the model on one batch in the validation set:

python trainer.py validate --trainer.fast_dev_run=True

To train the model for a hundred epochs:

python trainer.py fit --trainer.max_epochs=100

To generate embeddings from the pretrained model's encoder on 1024 images (the embeddings are stored as a GeoParquet file with spatiotemporal metadata):

python trainer.py predict --ckpt_path=checkpoints/last.ckpt \
                          --data.batch_size=1024 \
                          --data.data_dir=s3://clay-tiles-02 \
                          --trainer.limit_predict_batches=1

More options can be found using python trainer.py fit --help, or at the LightningCLI docs.
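
The commands above pass options in a dotted `--group.key=value` form. If you would rather launch runs from a Python script, the same command line can be assembled from a dictionary. This is a minimal sketch under stated assumptions: the helper name build_command is hypothetical and not part of the repository, and only the subcommands and option names shown above are used.

```python
import shlex
import sys


def build_command(subcommand, overrides, script="trainer.py"):
    """Build an argument list like [python, trainer.py, fit, --trainer.max_epochs=100]."""
    args = [sys.executable, script, subcommand]
    for key, value in overrides.items():
        # Each override becomes one --dotted.key=value argument.
        args.append(f"--{key}={value}")
    return args


# Same options as the predict example above.
cmd = build_command(
    "predict",
    {
        "ckpt_path": "checkpoints/last.ckpt",
        "data.batch_size": 1024,
        "data.data_dir": "s3://clay-tiles-02",
        "trainer.limit_predict_batches": 1,
    },
)
print(shlex.join(cmd))

# To actually launch the run from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the overrides in a dictionary makes it easy to loop over several configurations, while trainer.py remains the single place where options are validated.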