
The python data pipeline defined with DataJoint for U19 projects

Data are mainly ingested, and the pipeline is mainly maintained, through the MATLAB repository: https://github.com/shenshan/U19-pipeline-matlab

This repository mirrors the table definitions of that pipeline in Python.

Major schemas

Currently, the main schemas in the data pipeline are as follows:

  • lab

Lab Diagram

  • reference

Reference Diagram

  • subject

Subject Diagram

  • action

Action Diagram

  • acquisition

Acquisition Diagram

  • task

Task Diagram

  • behavior

Behavior Diagram

Installation of the package for usage and development

To use the package and contribute to its development, we recommend either a Docker setup or a virtual environment, as follows:

  1. In either case, first clone the repository: git clone https://github.com/BrainCOGS/U19-pipeline_python

  2. To use a Docker setup, after installing Docker, inside the cloned directory:

  • set up the .env file, as follows:
DJ_HOST = 'datajoint00.pni.princeton.edu'
DJ_USER = {your_user_name}
DJ_PASSWORD = {your_password}
  • run docker-compose up -d
  • Then run docker exec -it u19_pipeline_python_datajoint_1 /bin/bash. This opens a shell in a minimal environment for working with Python.
  3. To use a virtual environment setup:
  • Install virtualenv with pip3 install virtualenv
  • Create a virtual environment with virtualenv princeton_env
  • Activate the virtual environment with source princeton_env/bin/activate
  • Inside the virtual environment, install the package in editable mode so that local edits take effect: pip3 install -e .
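
Whichever setup is used, DataJoint needs the connection settings from the .env file. The following is a minimal sketch, assuming the DJ_HOST, DJ_USER and DJ_PASSWORD variables are present in the process environment (inside the Docker container, or exported by hand in the virtual environment). The helper name datajoint_settings_from_env is hypothetical, and this repository may configure the connection differently.

```python
import os


def datajoint_settings_from_env(env=os.environ):
    """Map the DJ_* variables onto DataJoint's config keys."""
    return {
        "database.host": env["DJ_HOST"],
        "database.user": env["DJ_USER"],
        "database.password": env["DJ_PASSWORD"],
    }


# Placeholder values for illustration only (not real credentials).
example_env = {
    "DJ_HOST": "datajoint00.pni.princeton.edu",
    "DJ_USER": "your_user_name",
    "DJ_PASSWORD": "your_password",
}
settings = datajoint_settings_from_env(example_env)
print(settings["database.host"])

# With datajoint installed and the real variables set, the settings
# could then be applied before connecting:
#   import datajoint as dj
#   dj.config.update(datajoint_settings_from_env())
#   dj.conn()
```

A missing variable raises a KeyError, which fails early instead of attempting a connection with incomplete credentials.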

Undocumented datajoint features

For all code below, I am assuming datajoint has been imported like:

import datajoint as dj

Update a table entry

dj.Table._update(schema.Table & key, 'column_name', 'new_data')

Get list of all column names in a table (without having to issue a query or fetch)


This also works on a query object:

schema = dj.create_virtual_module("some_schema", "some_schema")
query_object = schema.Sample() & 'sample_name = "test"'

The latter case is useful if you are passing the query object between functions or modules and you lose track of the table name.
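
The call that lists the column names is not shown above. As a minimal sketch, assuming the heading.names attribute that DataJoint tables and query expressions expose, a small helper could look like the following. The helper name column_names and the stand-in object are hypothetical; the stand-in only lets the snippet run without a database connection.

```python
from types import SimpleNamespace


def column_names(table_or_query):
    """Return the column names of a DataJoint table or query expression.

    Only the heading metadata is read, so no rows are fetched.
    """
    return list(table_or_query.heading.names)


# Hypothetical stand-in that mimics the `.heading.names` attribute of a
# real table or query object, so the sketch runs without a database.
fake_query = SimpleNamespace(
    heading=SimpleNamespace(names=["sample_name", "experiment_id"])
)
print(column_names(fake_query))

# With a live connection, the same helper would be called as:
#   column_names(schema.Sample())
#   column_names(schema.Sample() & 'sample_name = "test"')
```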

Use boolean datatype

Example table:

class Experiment(dj.Manual):
    definition = """ # Experiments performed using the light sheet microscope
    experiment_id           :   smallint auto_increment    # allowed here are sql datatypes.
    ---
    cell_detection          :   boolean
    """

It has some counterintuitive properties:

Inserted_value    Stored_value
True              1
False             0
1                 1
0                 0
5                 5*
-5                -5*
5000              DataError*
-5000             DataError*
'10'              10*
'-10'             -10*
'0'               0*

*One would expect this to be stored as 1 under the rules of bool in Python. See: https://github.com/datajoint/datajoint-docs/issues/222
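
The stored values above are consistent with MySQL treating BOOLEAN as a synonym for TINYINT(1), a signed one-byte integer with range -128 to 127. The sketch below is a pure-Python approximation of that behavior, not DataJoint or MySQL code. The helper names simulate_tinyint_store and to_strict_bool are hypothetical, and the simulation raises a ValueError where the database would raise a DataError.

```python
TINYINT_MIN, TINYINT_MAX = -128, 127  # signed TINYINT range in MySQL


def simulate_tinyint_store(value):
    """Approximate what a TINYINT column stores for an inserted value."""
    number = int(value)  # True -> 1, '10' -> 10, '-10' -> -10
    if not TINYINT_MIN <= number <= TINYINT_MAX:
        raise ValueError(f"{value!r} is out of range for TINYINT")
    return number


def to_strict_bool(value):
    """Collapse a value to 0 or 1 using Python truthiness before inserting."""
    return int(bool(value))


# Compare the simulated stored value with the Python-bool interpretation.
for value in (True, False, 5, -5, "10", "-10", "0"):
    print(repr(value), simulate_tinyint_store(value), to_strict_bool(value))
```

Passing values through something like to_strict_bool before inserting would keep the column restricted to 0 and 1. Note that under Python's rules the string '0' is truthy, so it maps to 1.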