Caching system for synthetic fMRI data using MongoDB

wirehead

Caching system for horizontal scaling of synthetic data generators using MongoDB


Installation

These instructions differ from the regular wirehead installation instructions because of the specific dependencies of SynthSeg.

Instructions:

git clone git@github.com:neuroneural/wirehead.git
cd wirehead
python3 -m venv venv
source venv/bin/activate
pip install -e .
pip install -r requirements.txt

Run the test

cd examples/unit
chmod +x test.sh
./test.sh

Usage

See examples/unit for a minimal example

Manager:

from wirehead import WireheadManager

if __name__ == "__main__":
    wirehead_runtime = WireheadManager(config_path="config.yaml")
    wirehead_runtime.run_manager()

Generator:

import numpy as np
from wirehead import WireheadGenerator 

def create_generator():
    while True: 
        img = np.random.rand(256,256,256)
        lab = np.random.rand(256,256,256)
        yield (img, lab)

if __name__ == "__main__":
    brain_generator     = create_generator()
    wirehead_runtime    = WireheadGenerator(
        generator = brain_generator,
        config_path = "config.yaml" 
    )
    wirehead_runtime.run_generator()

Dataset:

import torch
from wirehead import MongoheadDataset

dataset = MongoheadDataset(config_path = "config.yaml")

idx = [0] 
data = dataset[idx]
sample, label = data[0]['input'], data[0]['label']

Config guide

All wirehead configs live in YAML files, and the path to the config file must be given when creating the manager, generator, and dataset objects. For the system to work, all components must use the same config.

Basic configs:

MONGOHOST -- IP address or hostname for machine running MongoDB instance
DBNAME -- MongoDB database name
PORT -- Port for MongoDB instance. Defaults to 27017
SWAP_CAP -- Size cap (number of samples) for each of the read and write collections. A larger value means a larger cache and less frequent swaps. The total memory used by wirehead can be estimated as:
        SWAP_CAP * (size of one yielded tuple) * 2
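
As a rough worked example of the formula above, the sketch below estimates the footprint for the 256x256x256 arrays used elsewhere in this README. It assumes 8 bytes per element (float64, the dtype returned by np.random.rand) and ignores any MongoDB storage overhead. The helper name estimate_cache_bytes and the SWAP_CAP value of 10 are illustrative only and are not part of wirehead.

```python
import math

def estimate_cache_bytes(swap_cap, shapes, bytes_per_element=8):
    """Hypothetical helper: SWAP_CAP * size of one yielded tuple * 2."""
    # Size of one yielded tuple: sum over all arrays in the tuple
    tuple_bytes = sum(math.prod(shape) * bytes_per_element for shape in shapes)
    # Factor of 2 accounts for the read and the write collection
    return swap_cap * tuple_bytes * 2

# One (img, lab) tuple as in the generator example
shapes = [(256, 256, 256), (256, 256, 256)]
total = estimate_cache_bytes(swap_cap=10, shapes=shapes)
print(f"{total / 2**30:.1f} GiB")  # prints "5.0 GiB"
```

Under these assumptions each array is 128 MiB, so one tuple is 256 MiB and a SWAP_CAP of 10 needs about 5 GiB in total.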

Advanced configs:

SAMPLE -- Array of strings naming each element of the yielded data tuple
WRITE_COLLECTION   -- Name of write collection (generators push to this)
READ_COLLECTION    -- Name of read collection (dataset reads from this)
COUNTER_COLLECTION -- Name of counter collection for manager metrics
TEMP_COLLECTION    -- Name of temporary collection used for moving data during swap
CHUNKSIZE          -- Number of megabytes used for chunking data
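
Because every component must share the same config, it can help to sanity-check the values before starting the manager or the generators. The following is a minimal sketch of such a check, not something wirehead provides: check_config is a hypothetical helper, the choice of which keys to treat as mandatory is an assumption, and the host and database names are placeholders.

```python
REQUIRED_KEYS = ("MONGOHOST", "DBNAME", "SWAP_CAP")  # assumed mandatory here
DEFAULT_PORT = 27017  # default documented in the config guide

def check_config(config):
    """Hypothetical pre-flight check; not part of wirehead."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    checked = dict(config)  # copy so the caller's dict is not modified
    checked.setdefault("PORT", DEFAULT_PORT)
    if int(checked["SWAP_CAP"]) <= 0:
        raise ValueError("SWAP_CAP must be a positive integer")
    return checked

# Placeholder values for illustration only
config = check_config(
    {"MONGOHOST": "localhost", "DBNAME": "wirehead_demo", "SWAP_CAP": 10}
)
print(config["PORT"])  # prints 27017
```

In practice the dictionary would come from the YAML file, for example via yaml.safe_load from PyYAML, rather than being written inline.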

Generator guide

Wirehead's WireheadGenerator object takes a Python generator, that is, the object returned by calling a generator function. The generator yields tuples of numpy arrays. The number of arrays in each tuple must match the number of strings specified in SAMPLE in config.yaml.

Example:

config.yaml:

SAMPLE: ["input", "label"]

generating script:

def create_generator():
    while True: 
        img = np.random.rand(256,256,256)
        lab = np.random.rand(256,256,256)
        yield (img, lab)

brain_generator = create_generator()
wirehead_runtime = WireheadGenerator(
    generator = brain_generator,
    config_path = "config.yaml" 
)
wirehead_runtime.run_generator()  # runs an infinite loop
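
A mismatch between the yielded tuple and SAMPLE is easy to introduce, so one option is to wrap the generator in a small check before handing it to WireheadGenerator. The sketch below is an assumption about how such a guard could look, not wirehead behavior: check_samples is a hypothetical wrapper, and tiny_generator is a stand-in that yields short lists instead of numpy arrays to keep the example light.

```python
def check_samples(generator, sample_names):
    """Hypothetical wrapper: verify each yielded tuple matches SAMPLE."""
    expected = len(sample_names)
    for item in generator:
        if not isinstance(item, tuple) or len(item) != expected:
            raise ValueError(
                f"expected a tuple of {expected} items for SAMPLE={sample_names}"
            )
        yield item

def tiny_generator():
    # Stand-in for create_generator(); real generators yield numpy arrays
    while True:
        yield ([0.0, 1.0], [1, 0])

SAMPLE = ["input", "label"]  # mirrors SAMPLE in config.yaml
checked = check_samples(tiny_generator(), SAMPLE)
img, lab = next(checked)
```

The wrapped generator can then be passed as the generator argument in place of the original one, so a wrong tuple length fails early with a clear message.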

Citation/Contact

This code is released under the MIT license.

If you have any questions specific to the Wirehead pipeline, please raise an issue or contact us at mdoan4@gsu.edu