U19-pipeline_python

The Python data pipeline defined with DataJoint for U19 projects

The data pipeline is mainly ingested and maintained with the MATLAB repository: https://github.com/BrainCOGS/U19-pipeline-matlab

This repository contains the mirrored table definitions for the tables in the MATLAB pipeline.

Installation

Prerequisites (for recommended conda installation)

  1. Install conda on your system: https://conda.io/projects/conda/en/latest/user-guide/install/index.html
  2. If running on Windows, install git
  3. (Optional for ERDs) Install graphviz

Installation with conda

  1. Open a new terminal
  2. Clone this repository: git clone git@github.com:BrainCOGS/U19-pipeline_python.git
    • If you cannot clone repositories with ssh, set up SSH keys first
  3. Create a conda environment: conda create -n u19_datajoint_env python=3.7
  4. Activate the environment: conda activate u19_datajoint_env (activate the environment each time you use the project)
  5. Change directory to this repository: cd U19-pipeline_python
  6. Install all required libraries: pip install -e .
  7. DataJoint configuration: jupyter notebook notebooks/00-datajoint-configuration.ipynb (see the sketch below)
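
If you prefer to do the configuration step from a plain Python session instead of the notebook, here is a minimal sketch; the host shown is an assumption, so substitute your lab's actual database address:

import datajoint as dj

dj.config['database.host'] = 'datajoint00.pni.princeton.edu'  # assumption: replace with your database host
dj.config['database.user'] = 'your_username'                  # your database credentials
dj.config['database.password'] = 'your_password'
dj.conn()               # test the connection
dj.config.save_local()  # write dj_local_conf.json so future sessions reuse these settings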

Tutorials

We have created some tutorial notebooks to help you start working with DataJoint.

  1. Querying data (Strongly recommended)
  • jupyter notebook notebooks/tutorials/1-Explore U19 data pipeline with DataJoint.ipynb
  2. Building an analysis pipeline (Recommended only if you are going to create new databases or tables for analysis)
  • jupyter notebook notebooks/tutorials/2-Analyze data with U19 pipeline and save results.ipynb
  • jupyter notebook notebooks/tutorials/3-Build a simple data pipeline.ipynb

The ephys element and imaging element require root paths for the ephys and imaging data. Here are the notebooks showing how to set up these configurations properly.
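
As a rough sketch of what that configuration looks like (the key names and paths below are assumptions; the configuration notebooks define the exact ones this pipeline expects):

import datajoint as dj

# elements read their root paths from the 'custom' section of the DataJoint config
dj.config['custom'] = {
    'ephys_root_data_dir': '/mnt/bucket/ephys',      # assumption: your mounted ephys volume
    'imaging_root_data_dir': '/mnt/bucket/imaging',  # assumption: your mounted imaging volume
}
dj.config.save_local()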

Accessing data files on your system

There are several data files (behavior, imaging & electrophysiology) that are referenced in the database. To access these files you should mount the PNI file server volumes on your system. There are three main file servers across PNI where data is stored (braininit, Bezos & u19_dj).

On Windows systems

On OS X systems

On Linux systems

Notable data

Here are some shortcuts to commonly used data across PNI:

Sue Ann's Towers Task

Lucas Pinto's Widefield

Lucas Pinto's Opto inactivation experiments

Get path info for the session behavioral file

  1. Mount the needed file server
  2. Connect to the database
  3. Create a dictionary with the subject_fullname and session_date of the session
    key['subject_fullname'] = 'koay_K65'
    key['session_date'] = '2018-02-05'
  4. Fetch the filepath info (see the combined sketch below): data_dir = (acquisition.SessionStarted & key).fetch('remote_path_behavior_file')
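
Putting these steps together, a minimal end-to-end sketch (the u19_pipeline import path is an assumption based on this repository's layout):

import datajoint as dj
from u19_pipeline import acquisition  # assumption: schema module name in this repository

# restriction key identifying a single session
key = {'subject_fullname': 'koay_K65', 'session_date': '2018-02-05'}

# fetch the remote path(s) of the behavioral file for that session
data_dir = (acquisition.SessionStarted & key).fetch('remote_path_behavior_file')
print(data_dir)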

Major schemas

Currently, the main schemas in the data pipeline are as follows:

  • lab

Lab Diagram

  • reference

Reference Diagram

  • subject

Subject Diagram

  • action

Action Diagram

  • acquisition

Acquisition Diagram

  • task

Task Diagram

  • behavior

Behavior data for Towers task.

Behavior Diagram

  • ephys_element

Ephys-related tables were created with DataJoint Element Array Ephys, processing ephys data acquired with SpikeGLX and pre-processed by Kilosort2. For this pipeline we are using the (acute) ephys module from element-array-ephys.

Ephys Diagram

  • imaging Imaging pipeline processed with a customized algorithm for motion correction and CNMF for cell segmentation in MATLAB. Imaging Diagram

  • scan_element and imaging_element

Scan and imaging tables created with DataJoint Element Calcium Imaging, processing imaging data acquired with ScanImage and pre-processed by Suite2p.

Scan element and imaging element Diagram
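
Since graphviz is listed as an optional prerequisite for ERDs, here is a minimal sketch of drawing these diagrams yourself (the u19_pipeline import path is an assumption):

import datajoint as dj
from u19_pipeline import subject, acquisition  # assumption: schema modules in this repository

# render the entity-relationship diagram of one schema (requires graphviz)
dj.Diagram(subject)

# diagrams can be added together to show how schemas connect
dj.Diagram(subject) + dj.Diagram(acquisition)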

DataJoint features

Import DataJoint as follows:

import datajoint as dj

Update a table entry

dj.Table._update(schema.Table & key, 'column_name', 'new_data')
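
For example, to correct a single attribute of one existing row (the table, column, and values below are hypothetical):

# hypothetical example: fix the recorded location of one session
# the restriction must identify exactly one row
key = {'subject_fullname': 'koay_K65', 'session_date': '2018-02-05'}
dj.Table._update(acquisition.Session & key, 'session_location', 'Bezos3')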

Get a list of all column names in a table (without having to issue a query or fetch)

table.heading.attributes.keys()

This also works on a query object:

schema = dj.create_virtual_module("some_schema", "some_schema")
query_object = schema.Sample() & 'sample_name = "test"'
query_object.heading.attributes.keys()