ohbm/hackathon2022

DataCat: "bring your own data" and generate user-friendly data catalogs


Title

DataCat: "bring your own data" and auto-generate user-friendly data catalogs

Short description and the goals for the OHBM BrainHack

Summary

Do you want to learn how to generate a pretty and F.A.I.R. browser-based data catalog from metadata? Do you want to know how you can make your data known to the world, without sharing the actual data content on centralised infrastructure? Do you want to do this for free using open-source tools? YES?! Then "bring" your own data and join our hackathon project!

Overview

DataLad Catalog is a free and open-source command-line tool, with a Python API, that assists with the automatic generation of user-friendly, browser-based data catalogs from structured metadata. It is an extension to DataLad, and together with DataLad Metalad it brings distributed metadata handling, catalog generation, and maintenance into the hands of users. For a live example of a catalog that was generated using DataLad Catalog, see our StudyForrest Demo. The tool is now ready to be tested (and hopefully broken and then fixed!) on a wider range of user data. This is therefore intended to be a "bring your own data" project. If you are interested in metadata handling of (distributed) datasets, and specifically in generating a live catalog from said metadata, join us for a chance to turn your (meta)data into a pretty browser application!

Project Goals

  • Getting participants up to speed on what DataLad Catalog is and what it can do. This will be done through an initial discussion and by reading the primer
  • Giving participants hands-on experience with the catalog generation process, with the use of walk-through tutorials
  • Creating your own data catalogs
  • Documenting feedback on your experience by creating issues (any and all types of issues are welcome!)
  • Onboarding anyone interested in contributing to this tool in the many ways that are possible

Link to the Project

https://github.com/datalad/datalad-catalog

Image for the OHBM brainhack website

https://raw.githubusercontent.com/jsheunis/ohbm-2022/main/pics/datacat0_hero.svg

Project lead

Stephan Heunis

  • GitHub: jsheunis
  • Discord: Stephan#8144
  • Twitter: fmrwhy

Main Hub

Glasgow

Other Hub covered by the leaders

  • Glasgow
  • Asia / Pacific
  • Europe / Middle East / Africa
  • Americas

Skills

We welcome all kinds of contributions, drawing on various skills at any level: from setting up and writing documentation, discussing relevant functionality, or user-experience testing, to Python-based implementation of the desired functionality and creating real-world use cases and workflows.

You can help us with any of the following skills:

  • You have a dataset (or distributed datasets) for which you'd like to create an online catalog
  • You enjoy breaking user interfaces or pointing out how the interface can be more intuitive
  • You have experience with the Unix command line
  • You are interested in creating accessible documentation
  • You know Python / JavaScript / HTML / VueJS
  • You are interested in learning about the DataLad ecosystem or the process of creating a DataLad extension
  • You are interested in learning about the DataLad metadata handling capabilities and/or the process of creating DataLad-based metadata extractors
  • You have knowledge of metadata standards in your domain
  • You have knowledge of BIDS and pybids (for the specific case of generating BIDS-related metadata, and rendering that in the catalog)

Recommended tutorials for new contributors

Good first issues

We will try to generate a constant flow of good-first-issues throughout the project. Some examples are:

  • Set up all-contributors to acknowledge project contributions
  • Create a .zenodo.json or CITATION.cff file to make the project citable
  • Set up a welcome bot such as this one to automatically comment on issues or PRs
  • Extend read-the-docs based documentation

Twitter summary

Do you want to publish your data openly, without sharing actual content on centralised infrastructure? Want to auto-generate a browser-based data catalog from metadata? YES?! Then "bring" your own data and join our project: DataCat! https://github.com/datalad/datalad-catalog

Short name for the Discord chat channel (~15 chars)

datacat

Please read and follow the OHBM Code of Conduct

  • I agree to follow the OHBM Code of Conduct during the hackathon

Hi @jsheunis, great project :) Have you considered running the project in the cloud on a VM or in a Jupyter Book? Happy to help you set something up :)

hey @likeajumprope, thanks! I've already created an environment on Binder, which project members will use when they do the tutorial associated with the project. Other than that I haven't given a cloud environment much thought.

I might join this project with my/your own data at https://github.com/ReproNim/ReproTube/

Thank you for submitting the project! We have 35 projects right now, woohoo! But that means the project pitches will have to be short. We will give you 2 minutes tomorrow to pitch your project; you can have one slide or no slides!
If you decide to use a slide, please include the link to the slide here.

And don't worry, you will still have more time to talk about your project during the BrainHack :-)