choderalab/assaytools

Suggestions for API improvements to facilitate competition experiment analysis

From our discussion today, here is what we decided we would need for input:

  • Compound stock dictionary. A master data structure listing every compound stock, with one entry per stock:
compound_stocks = [
   { 'id' : 'BOS001',
     'compound_name' : 'bosutinib',
     'compound_mw' : 530.446, # g/mol
     'compound_mass' : 12.45, # mg compound dispensed
     'compound_purity' : 0.96, # purity (fraction)
     'solvent_mass' : 12.52, # g solvent dispensed
   },
   # ... (further compound stocks)
]

The compound stock dictionary could be shared across all experiments, and would eventually be retrieved from a lightweight database running on Amazon EC2 or a similar host.
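As a rough sketch of why these particular fields are enough, here is how a stock concentration might be derived from one entry. The function names are hypothetical, and two assumptions are baked in: that `compound_purity` is a mass fraction, and that the solvent is DMSO with a density of about 1.10 g/mL whose volume is not changed appreciably by the dissolved compound.

```python
# Assumption: solvent is DMSO, density ~1.10 g/mL near room temperature.
DMSO_DENSITY_G_PER_ML = 1.10

stock = {
    'id': 'BOS001',
    'compound_name': 'bosutinib',
    'compound_mw': 530.446,    # g/mol
    'compound_mass': 12.45,    # mg compound dispensed
    'compound_purity': 0.96,   # assumed to be a mass fraction
    'solvent_mass': 12.52,     # g solvent dispensed
}

def moles_of_compound(stock):
    """Moles of pure compound in the stock (hypothetical helper)."""
    mass_g = stock['compound_mass'] * 1e-3  # mg -> g
    return mass_g * stock['compound_purity'] / stock['compound_mw']

def stock_molality(stock):
    """Concentration in mol compound per kg solvent; needs no density."""
    solvent_kg = stock['solvent_mass'] * 1e-3  # g -> kg
    return moles_of_compound(stock) / solvent_kg

def stock_molarity(stock, solvent_density=DMSO_DENSITY_G_PER_ML):
    """Approximate concentration in mol/L, using the assumed solvent density."""
    volume_l = stock['solvent_mass'] / solvent_density * 1e-3  # g -> mL -> L
    return moles_of_compound(stock) / volume_l

print(f"{stock['id']}: {stock_molality(stock) * 1e3:.3f} mmol/kg, "
      f"{stock_molarity(stock) * 1e3:.3f} mM (approx.)")
```

Working from masses keeps the gravimetric measurements as the primary data; the density only enters when a molar concentration is needed.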

  • D300 XML file specifying compound and DMSO concentrations in each well
  • For each assay plate:
    • A list of compound stock ids used for the rows (e.g. ['BOS001', 'GEF002', 'ERL001', 'BSI004'])
    • The protein name pipetted into rows and its stated concentration
    • The Infinite XML file after the plate has been read
  • An error assumption dictionary which would contain
    • D300 dispense CVs
    • protein concentration uncertainties
    • any other assumptions that go into error modeling
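To make the list above concrete, here is one possible shape for the combined input. This is only a sketch: every key name, file name, and numeric value is a hypothetical placeholder rather than an agreed format or a measured number. The small helper shows how two of the error assumptions might be combined, using the usual first-order rule that the CVs of independent factors in a product or ratio add in quadrature.

```python
import math

# Hypothetical layout; all keys, file names, and numbers are placeholders.
experiment_input = {
    'd300_xml': 'example_d300.xml',  # D300 dispense description
    'assay_plates': [
        {
            'row_compound_stock_ids': ['BOS001', 'GEF002', 'ERL001', 'BSI004'],
            'protein_name': 'example protein',     # placeholder name
            'protein_concentration': 1.0e-6,       # placeholder stated value, mol/L
            'infinite_xml': 'example_infinite.xml',  # plate reader output
        },
    ],
    'error_assumptions': {
        'd300_dispense_cv': 0.08,          # placeholder, not a measured CV
        'protein_concentration_cv': 0.10,  # placeholder, not a measured CV
    },
}

def combined_cv(*cvs):
    """First-order CV of a product or ratio of independent quantities."""
    return math.sqrt(sum(cv ** 2 for cv in cvs))

errors = experiment_input['error_assumptions']
# CV of a ligand-to-protein concentration ratio under these assumptions.
ratio_cv = combined_cv(errors['d300_dispense_cv'],
                       errors['protein_concentration_cv'])
print(f"ligand:protein ratio CV ~ {ratio_cv:.3f}")
```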

For the interested, this document has some good statistics about the reliability of protein concentrations as a function of measured absorbance. The CV appears to be as high as 13.1% at low concentration (0.227 mg/mL), 3.7% at medium-low concentration (0.898 mg/mL), and better than 1% at high concentration (>2 mg/mL).
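If we wanted a default protein concentration uncertainty from these numbers, a crude lookup could work. The sketch below linearly interpolates between the three quoted points and clamps outside them; the interpolation scheme and the function name are my assumptions, not something taken from that document, and the 1% at 2 mg/mL is treated as an upper bound.

```python
from bisect import bisect_right

# (concentration in mg/mL, CV) pairs quoted above; the last CV is an upper bound.
CV_POINTS = [(0.227, 0.131), (0.898, 0.037), (2.0, 0.01)]

def protein_cv(conc_mg_per_ml):
    """Rough CV estimate by linear interpolation (hypothetical helper)."""
    xs = [x for x, _ in CV_POINTS]
    ys = [y for _, y in CV_POINTS]
    if conc_mg_per_ml <= xs[0]:
        return ys[0]   # clamp below the lowest quoted concentration
    if conc_mg_per_ml >= xs[-1]:
        return ys[-1]  # clamp above the highest quoted concentration
    i = bisect_right(xs, conc_mg_per_ml)  # index of the upper bracketing point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (conc_mg_per_ml - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

def protein_sd(conc_mg_per_ml):
    """Standard deviation implied by the interpolated CV, in mg/mL."""
    return protein_cv(conc_mg_per_ml) * conc_mg_per_ml
```

A value from this lookup would only be a fallback; a measured uncertainty for the actual protein prep should take precedence in the error assumption dictionary.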

We'll need to collect at least one example of all of this data for development and testing.

It would be useful to have someone (@sonyahanson @Lucelenie @MehtapIsik) check in such an example, especially a coherent set of:

  • which compound stock id was used in each row
  • HP D300 XML files
  • Infinite XML files from the plate read
  • protein info and stated concentration

I think I can generate the rest!
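For whoever checks in the example, a quick coherence check along these lines might help catch mismatches early. It is a minimal sketch with hypothetical helper names: it only tests that the XML text is well formed and that every row's stock id appears in the compound stock list, and it makes no assumptions about the D300 or Infinite XML schemas.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(text):
    """Return True if the text parses as XML (schema is not checked)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

def unknown_stock_ids(row_stock_ids, compound_stocks):
    """Return the row stock ids that have no entry in compound_stocks."""
    known = {stock['id'] for stock in compound_stocks}
    return [stock_id for stock_id in row_stock_ids if stock_id not in known]

# Example use with placeholder data.
compound_stocks = [{'id': 'BOS001'}, {'id': 'GEF002'}]
rows = ['BOS001', 'GEF002', 'ERL001', 'BSI004']
print("missing stock entries:", unknown_stock_ids(rows, compound_stocks))
print("well-formed:", is_well_formed_xml('<plate><well/></plate>'))
```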