SpaceML

We are building a platform for citizen science in space called "SpaceML."

SpaceML originated with James Parr of NASA's Frontier Development Lab, after we worked together in the summer of 2018. We tested the use of machine learning and cloud computing to find exoplanets, predict solar flares, and model atmospheres that could be produced by extraterrestrial life.

NASA has recently been requesting proposals to host a petabyte of space-related data, enabling citizens across the world to conduct their own experiments. ESA has joined as well and is looking for ways to use space data to improve life on Earth. For example, satellite data from the upper atmosphere could be used to detect shifts in Earth's electromagnetic signature, which may signal impending earthquakes as magma shifts beneath the surface.

Our initial version of SpaceML is a JupyterLab environment powered by a GPU-accelerated Kubernetes cluster. Data are hosted and shared on the public cloud (Google Cloud Storage and Google BigQuery). Data are manipulated, explored, and visualized within a Python environment of notebooks, terminal shells, and other JupyterLab plugins. We're inspired by Pangeo, a similar effort in earth science. Kubeflow is a general-purpose toolkit for machine learning on Kubernetes; we borrow heavily from its designs, but believe every user needs their own cluster. :-)

Our stack is entirely open source. Dask extends the in-memory array and dataframe models popularized by NumPy and pandas to data sets that exceed the memory of a single machine (e.g. arrays with 100M rows). Predictive models are built and trained with TensorFlow, scikit-learn, and PyTorch. Hyperparameter optimization and distributed training are accelerated by Dask. Models are deployed as RESTful endpoints with Seldon for integration with other systems.
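To give a feel for what Dask does under the hood, here is a minimal standard-library sketch of a chunked, parallel reduction. It is not Dask's API and not SpaceML code: the helper names (`chunk`, `summarize`, `combine`, `chunked_mean_var`) are hypothetical, and a thread pool stands in for Dask's workers. The data is split into blocks, each block is reduced to partial statistics independently, and the partials are merged into a global mean and variance, so no single step ever needs the whole array at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence, Tuple

# Partial result for one block: (count, sum, sum of squares).
Partial = Tuple[int, float, float]


def chunk(values: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split a sequence into consecutive blocks of at most `size` items."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [values[i:i + size] for i in range(0, len(values), size)]


def summarize(block: Sequence[float]) -> Partial:
    """Map step: reduce one block to its sufficient statistics."""
    return len(block), float(sum(block)), float(sum(v * v for v in block))


def combine(partials: Iterable[Partial]) -> Partial:
    """Reduce step: merge per-block statistics into global totals."""
    n, s, ss = 0, 0.0, 0.0
    for pn, ps, pss in partials:
        n += pn
        s += ps
        ss += pss
    return n, s, ss


def chunked_mean_var(values: Sequence[float], size: int,
                     workers: int = 4) -> Tuple[float, float]:
    """Mean and population variance, computed block by block in parallel."""
    # Threads are a stand-in for Dask's workers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        n, s, ss = combine(pool.map(summarize, chunk(values, size)))
    if n == 0:
        raise ValueError("no data")
    mean = s / n
    return mean, ss / n - mean * mean


if __name__ == "__main__":
    data = [float(i) for i in range(1, 11)]
    print(chunked_mean_var(data, size=3))  # (5.5, 8.25)
```

With Dask installed, the same computation is roughly `dask.array.from_array(x, chunks=3)` followed by `.mean().compute()` or `.var().compute()`; Dask builds the per-chunk task graph and schedules it across a cluster for you. The sum-of-squares variance formula used here is the textbook one and can lose precision on large data, so a production reduction would merge partials more carefully.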

We hope you find this stack useful in your own AI endeavors.

@scottpenberthy, January 2019

drscott173/spaceml
