Pipelines: A Python repository from j-andrews7

Pipelines

Because backups and revision history are good things.

This repository contains all of the code and documentation needed for various pipelines I've run/developed/warped to my own needs. They are likely not perfect - some were hacked together in a matter of hours, others were repeatedly tweaked over the course of weeks or months. Regardless, the documentation for each should be comprehensive.

This serves as a record for myself and the lab. Anyone is free to use these pipelines for their own needs, but you do so at your own risk. As anyone who works with big data knows, consistency is key, and this fact guided the design of many of these pipelines, including the choice of tools. Using the newest and best tools and methods is, of course, ideal, but reprocessing is a barrier and time sink that you should consider.

I am currently trying to simplify and streamline many of these. Some were worked on by four separate people at various times over the years, and extra work is sometimes needed to cobble the pieces together. I'm cutting out or combining steps whenever I get the chance so that less data scrubbing and general formatting is necessary.

These are usually fairly up to date, but there may be times when I alter something and don't change it here immediately. In addition, I often make slight tweaks to these as I run through them that may or may not get logged here.

Some of the scripts may be old/one-offs. Most scripts included in the actual pipelines should be fairly robust.
