The goal of FCMcmp is to make it easy to analyze flow cytometry data from python. The first challenge in analyzing flow cytometry data is working out which wells should be compared with each other. For example, which wells are controls, which wells are replicates of each other, which wells contain the conditions you're interested in, etc. This isn't usually too complicated for individual experiments, but if you want to write analysis scripts that you can use on all of your data, managing this metadata becomes a significant problem.
FCMcmp addresses this problem by defining a simple YAML file format that can associate wells and plates in pretty much any way you want. When you ask FCMcmp to parse these files, it returns a list- and dictionary-based data structure that contains these associations, plus it automatically parses the raw FCS data into pandas data frames.
fcmcmp
is available on PyPI:
pip3 install fcmcmp
Only python>=3.4 is supported.
I'll demonstrate using data you might export from running a 96-well plate on a BD LSRII, but the library should be pretty capable of handling any directory hierarchy:
my_plate/ 96 Well - U bottom/ Specimen_001_A1_A01_001.fcs Specimen_002_A2_A02_002.fcs Specimen_003_A3_A03_003.fcs ...
First, we need to make a YAML metadata file describing the relationships between the wells on this plate:
# my_plate.yml label: vaxadrin wells: without: [A1,A2,A3] with: [B1,B2,B3] --- label: vaxamaxx wells: without: [A1,A2,A3] with: [C1,C2,C3]
In this example, the name of the plate directory is inferred from the name of
the YAML file. You can also explicitly specify the path to the plate directory
by adding the following header before the label
/wells
sections:
plate: path/to/my_plate ---
You can even reference wells from multiple plates in one file:
plates: foo: path/to/foo_plate bar: path/to/bar_plate --- label: vaxascab wells: without: [foo/A1, foo/A2, foo/A3] with: [bar/A1, bar/A2, bar/A3]
Note that the label
and wells
fields are required, but you can add,
remove, or rename any other field:
label: vaxa-smacks channel: FITC-A gating: 60% wells: 0mM: [A1,A2,A3] 1mM: [B1,B2,B3] 5mM: [C1,C2,C3]
Once you have a YAML metadata file, you can use fcmcmp
to read it:
>>> import fcmcmp, pprint >>> experiments = fcmcmp.load_experiments('my_plate.yml') >>> pprint.pprint(experiments) [{'label': 'vaxadrin', 'wells': {'with': [Well(B1), Well(B2), Well(B3)], 'without': [Well(A1), Well(A2), Well(A3)]}}, {'label': 'vaxamaxx', 'wells': {'with': [Well(C1), Well(C2), Well(C3)], 'without': [Well(A1), Well(A2), Well(A3)]}}]
The data structure returned is little more than a list of dictionaries, which
should be easy to work with in pretty much any context. The wells are
represented by Well
objects, which have only three attributes:
Well.label
: The name used to reference the well in the YAML file.Well.data
: Apandas.DataFrame
containing all the data associated with the well, parsed using the excellentfcsparse
library.Well.meta
: A dictionary containing any metadata associated with the well, also parsed usingfcsparse
.
Note that if you reference the same well more than once (e.g. for controls that apply to all of your experiments), each reference is parsed separately and gets its own copy of all the data.
Once the experiments are loaded into python as described above, fcmcmp
provides a couple ways to interact with them. The first is to apply one or
more of a handful of pre-defined "processing steps":
>>> ch = 'FITC-A', 'PE-Texas Red-A' >>> p1 = fcmcmp.GateEarlyEvents(throwaways_secs=2) >>> p1(experiments) >>> p2 = fcmcmp.GateSmallCells(threshold=40, save_size_col=True) >>> p2(experiments) >>> p3 = fcmcmp.GateNonPositiveEvents(ch) >>> p3(experiments) >>> p4 = fcmcmp.LogTransformation(ch) >>> p4(experiments) >>> p5 = fcmcmp.KeepRelevantChannels(ch) >>> p5(experiments)
In this example:
GateEarlyEvents
discards the first few seconds of data, which is useful when you're using a high-throughput sampler and you suspect that cells from the previous well are being recorded at the beginning of each well.GateSmallCells
combines theFSC-A
andSSC-A
channels to estimate the size of each event, then discards any events below the given percentile (40% in this example).GateNonPositiveEvents
discards negative data on the specified channels. I have to admit that I don't understand how "fluorescence peak area" data can be negative, but in any case this can be important if you want to work with the logarithm of your data, because of course you can't take the logarithm of negative data.LogTransform
takes the logarithm of the data in the specified channels. This is a very standard processing step for fluorescent channels.KeepRelevantChannels
discards all the data for any channels that aren't explicitly listed. This is mostly useful for when you're printing out data to the terminal and don't want to be distracted by channels you collected but aren't interested in at the moment.
Instead of calling each processing step individually, you can also use the
run_all_processing_steps()
function to call them all at once. If you do
this, you don't even need to make a variable for each step:
>>> fcmcmp.GateEarlyEvents(throwaways_secs=2) >>> fcmcmp.GateSmallCells(threshold=40, save_size_col=True) >>> fcmcmp.GateNonPositiveEvents(ch) >>> fcmcmp.LogTransformation(ch) >>> fcmcmp.KeepRelevantChannels(ch) >>> fcmcmp.run_all_processing_steps()
You can also write your own processing steps by inheriting from either
ProcessingStep
or GatingStep
and reimplementing the proper methods.
ProcessingStep
is for general transformations and has two virtual methods:
process_experiment()
and process_well()
. The former is called once for
each experiment and should transform that experiment in place. The latter is
called once for each well and can either modify the well in place (and return
None) or return the processed data, which will overwrite the original data.
GatingStep
is specifically for transformations regarding which data points
to keep and which to throw out. It is itself a ProcessingStep
, but it has
a different virtual method: gate()
. This method is called on each well and
should return a boolean numpy array. Those indices that are False
will be
thrown out, those that are True
will be kept.
The second way to interact with the experiments is to use the yield_wells()
and yield_unique_wells()
functions. These are both generators which
iterate through all of your experiments and yield each well one at a time. The
purpose of these functions is to make the nested experiments
data structure
seem more like a flat list:
>>> for experiment, condition, well in fcmcmp.yield_wells(experiments): >>> print(experiment, condition, well)
Both functions take an optional keyword argument. If given, only wells with a
matching experiment label, condition, or well label will be returned. The only
difference between yield_wells()
and yield_unique_wells()
is that the
former won't yield the same well twice. This is important because the same
well can be included in any number of experiments.
It's worth noting that fcmcmp
does not provide any plotting functionality.
That's because fcmcmp
only aims to annotate your data and to provide it to
you in a way that's easy to work with. The actual plotting is left to you and
matplotlib
.
Use the GitHub issue tracker if you find any bugs or would like to see any new features. I'm also very open to pull requests.