/eucp

European Climate Prediction system - notebook server setup

Primary LanguagePythonApache License 2.0Apache-2.0

EUCP Notebook

A notebook and data sharing system for climate research.

Goals

This repository contains and documents the infrastructure for a notebook and data sharing system for use within the European Climate Prediction (EUCP) project, a Horizon 2020 project.

The notebook system allows scientists to use a uniform interface to perform their analysis on climate data sets. It runs on the same server where the data is stored, or very close to a data server, removing the need to download the required data sets locally. Data access should also be easy, and fast. The use of notebooks also allows easier sharing of analyses, for example with peers (direct collaborators, referees etc.) and a more general audience. In addition, the analysis steps are to be saved (including history versioning) for reproducibility.

Setup

A more detailed setup guide provided in the setup document.

For an introduction and overview of the architecture, see the docs directory.

An introduction to Jupyter can be found in the tutorial section, while examples notebooks are available in the examples directory.

The details of the setup can be found in the Ansible directory. The Ansible playbook is used to configure the server. The details of the Docker image can be found in the Docker directory.

Resources

The server and storage is hosted at the SURFsara (NLeSC partner) data center. The total amount of data storage available is 100 TB. This includes the need for scratch space for individual users during analysis runs.

The current setup is a Ubuntu virtual machine (VM), with four cores.

The storage uses CEPH datablocks. These will be made persistent, so that scratch space will persist between restarts of the VM. The home directories of the users are also persistent, on a separate (small) CEPH disk. This allows users to store notebooks and small, intermediate files, in their home directories.

Some details about the cloud system used can be found at SURFsara HPC cloud documentation.

Related projects

  • Pangeo data

    Requires a GitHub account to sign in. Provides a JupyterLab environment for geoscience. It allows to run multi-core processes using dask and xarray on a cloud system: this will fire off separate "pods" for the computation.

    This appears to be one of the few (only?) similar project that has its source code fully available, with an installation description.

    For more information, see http://pangeo.io.

  • WEkEO

    WEkEO appears to target a very broad audience, and many possibilities (including virtual machines and notebooks). It is currently in testing mode, with a possibility for free trial. It is unclear which data is stored directly next to the analyses server or virtual machines.

  • ECASLab

    Looks to be very similar in purpose to this project: a Jupyter notebook/lab environment next to datasets Uses Ophidia as a terminal interface, instead of the standard terminal.

  • Copernicus Climate Data Store

    Has its own version of a notebook, which appears less flexible and extendable compared to a standard JupyterHub + Lab environment.

Copyright

This project is copyright 2019 Netherlands eScience Center

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.