U19-pipeline_python

The Python data pipeline defined with DataJoint for U19 projects

The data pipeline is mainly ingested and maintained with the MATLAB repository: https://github.com/BrainCOGS/U19-pipeline-matlab

This repository contains the mirrored table definitions for the tables in the MATLAB pipeline.

Installation

Prerequisites (for recommended conda installation)

  1. Install conda on your system: https://conda.io/projects/conda/en/latest/user-guide/install/index.html
  2. If running on Windows, install git
  3. (Optional for ERDs) Install graphviz

Installation with conda

  1. Open a new terminal
  2. Clone this repository: git clone git@github.com:BrainCOGS/U19-pipeline_python.git
    • If you cannot clone repositories with ssh, set up SSH keys first
  3. Create a conda environment: conda create -n u19_datajoint_env python=3.7
  4. Activate the environment: conda activate u19_datajoint_env (activate the environment each time you use the project)
  5. Change directory to this repository: cd U19-pipeline_python
  6. Install all required libraries: pip install -e .
  7. DataJoint configuration: jupyter notebook notebooks/00-datajoint-configuration.ipynb (see the sketch below)
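
If you prefer to do the configuration step from a plain Python session instead of the notebook, here is a minimal sketch; the host shown is an assumption, so substitute your lab's actual database address:

import datajoint as dj

dj.config['database.host'] = 'datajoint00.pni.princeton.edu'  # assumption: replace with your database host
dj.config['database.user'] = 'your_username'                  # your database credentials
dj.config['database.password'] = 'your_password'
dj.conn()               # test the connection
dj.config.save_local()  # write dj_local_conf.json so future sessions reuse these settings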

Tutorials

We have created some tutorial notebooks to help you start working with DataJoint.

  1. Querying data (Strongly recommended)
  • jupyter notebook notebooks/tutorials/1-Explore U19 data pipeline with DataJoint.ipynb
  2. Building an analysis pipeline (Recommended only if you are going to create new databases or tables for analysis)
  • jupyter notebook notebooks/tutorials/2-Analyze data with U19 pipeline and save results.ipynb
  • jupyter notebook notebooks/tutorials/3-Build a simple data pipeline.ipynb

The ephys element and imaging element require root paths for the ephys and imaging data. Here are the notebooks showing how to set up these configurations properly.
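
As a rough sketch of what that configuration looks like (the key names and paths below are assumptions; the configuration notebooks define the exact ones this pipeline expects):

import datajoint as dj

# elements read their root paths from the 'custom' section of the DataJoint config
dj.config['custom'] = {
    'ephys_root_data_dir': '/mnt/bucket/ephys',      # assumption: your mounted ephys volume
    'imaging_root_data_dir': '/mnt/bucket/imaging',  # assumption: your mounted imaging volume
}
dj.config.save_local()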

Accessing data files on your system

There are several data files (behavior, imaging & electrophysiology) that are referenced in the database. To access these files you should mount the PNI file server volumes on your system. There are three main file servers across PNI where data is stored (braininit, Bezos & u19_dj).

On Windows systems

On OS X systems

On Linux systems

Notable data

Here are some shortcuts to commonly used data across PNI:

Sue Ann's Towers Task

Lucas Pinto's Widefield

Lucas Pinto's Opto inactivation experiments

Get path info for the session behavioral file

  1. Mount the needed file server
  2. Connect to the database
  3. Create a dictionary with the subject_fullname and session_date of the session
    key['subject_fullname'] = 'koay_K65'
    key['session_date'] = '2018-02-05'
  4. Fetch the filepath info (see the combined sketch below): data_dir = (acquisition.SessionStarted & key).fetch('remote_path_behavior_file')
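
Putting these steps together, a minimal end-to-end sketch (the u19_pipeline import path is an assumption based on this repository's layout):

import datajoint as dj
from u19_pipeline import acquisition  # assumption: schema module name in this repository

# restriction key identifying a single session
key = {'subject_fullname': 'koay_K65', 'session_date': '2018-02-05'}

# fetch the remote path(s) of the behavioral file for that session
data_dir = (acquisition.SessionStarted & key).fetch('remote_path_behavior_file')
print(data_dir)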

Major schemas

Currently, the main schemas in the data pipeline are as follows:

  • lab

Lab Diagram

  • reference

Reference Diagram

  • subject

Subject Diagram

  • action

Action Diagram

  • acquisition

Acquisition Diagram

  • task

Task Diagram

  • behavior

Behavior data for Towers task.

Behavior Diagram

  • ephys_element

Ephys-related tables were created with DataJoint Element Array Ephys, processing ephys data acquired with SpikeGLX and pre-processed by Kilosort2. For this pipeline we are using the (acute) ephys module from element-array-ephys.

Ephys Diagram

  • imaging Imaging pipeline processed with a customized algorithm for motion correction and CNMF for cell segmentation in MATLAB. Imaging Diagram

  • scan_element and imaging_element

Scan and imaging tables created with DataJoint Element Calcium Imaging, processing imaging data acquired with ScanImage and pre-processed by Suite2p.

Scan element and imaging element Diagram
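
Since graphviz is listed as an optional prerequisite for ERDs, here is a minimal sketch of drawing these diagrams yourself (the u19_pipeline import path is an assumption):

import datajoint as dj
from u19_pipeline import subject, acquisition  # assumption: schema modules in this repository

# render the entity-relationship diagram of one schema (requires graphviz)
dj.Diagram(subject)

# diagrams can be added together to show how schemas connect
dj.Diagram(subject) + dj.Diagram(acquisition)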

DataJoint features

Import DataJoint as follows:

import datajoint as dj

Update a table entry

dj.Table._update(schema.Table & key, 'column_name', 'new_data')
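
For example, to correct a single attribute of one existing row (the table, column, and values below are hypothetical):

# hypothetical example: fix the recorded location of one session
# the restriction must identify exactly one row
key = {'subject_fullname': 'koay_K65', 'session_date': '2018-02-05'}
dj.Table._update(acquisition.Session & key, 'session_location', 'Bezos3')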

Get a list of all column names in a table (without having to issue a query or fetch)

table.heading.attributes.keys()

This also works on a query object:

schema = dj.create_virtual_module("some_schema", "some_schema")
query_object = schema.Sample() & 'sample_name = "test"'
query_object.heading.attributes.keys()