choderalab/assaytools

Raw Data Organization

sonyahanson opened this issue · 21 comments

Probably good to change how/where we are putting the raw data. I guess I have started doing this with the assaytools/data folder, where the data is separated out by data type (singlet, spectra, etc.). Though right now there are also snippets of analysis in there, which I should move.

Also, do we want to keep the data we are analyzing for the paper separate from the data we have for other projects (Mehtap, Andrea, etc.)? Possibly in a different repository? Or is a separate folder within assaytools/data sufficient?

Or... should we not have data in this repository at all? Just have a bare minimum for tests? We were planning on putting all the raw data for the manuscript in the manuscript repository anyway...

Here's my thinking:

  • assaytools should only contain (1) example data to illustrate how to use each capability, and (2) test data for us to include in nosetests.
  • All other data for a paper should be in a repository specific to that paper.

I think this sounds reasonable. I will shift to working in the manuscript repo for the datasets relevant to that. We can revisit if this becomes stranger than expected.

I'm guessing we want to keep the files in data/full_example/, but maybe move everything to the examples directory. Maybe you can do this in your PR @jchodera ?

So after discussing with @MehtapIsik, it seems like it is a good idea to make a new branch, e.g. assaytools/July2016 that could keep all the data and notebooks relatively as is, while we work on cleaning up the master branch. In the master branch we will remove/clean up any raw data and notebooks that don't serve as a minimal example.

I copied all the HSA experimental data from the examples/.../hsa folder into another repository (hsa-affinity) to organize it there.

> I'm guessing we want to keep the files in data/full_example/, but maybe move everything to the examples directory. Maybe you can do this in your PR @jchodera ?

Let's leave this for now since I use that data in my new implementation. We can clean it up next week.

> So after discussing with @MehtapIsik, it seems like it is a good idea to make a new branch, e.g. assaytools/July2016 that could keep all the data and notebooks relatively as is, while we work on cleaning up the master branch. In the master branch we will remove/clean up any raw data and notebooks that don't serve as a minimal example.

Sounds great!

Great, Mehtap! Thanks!

Just had a quick meeting with @MehtapIsik where we mapped out a better organization for the examples directory (among other things):

  • autoprotocol
    • data
    • ~four python scripts
  • probe assay
    • data
    • ~three ipynb's
      • modeling
      • MLE
      • simple bayes model
      • README also describes how to use xml2png and quickmodel for this data
  • competition assay
    • data
    • ~three ipynb's
      • modeling
      • MLE
      • simple bayes model
      • README also describes how to use xml2png and quickmodel for this data
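In case it helps with the reorganization, here is a minimal sketch that creates the skeleton of the layout above. The directory name `probe-assay` follows the data_clean branch link in this thread; `competition-assay` and `README.md` are assumed names, and the helper `make_examples_skeleton` is hypothetical, not part of assaytools. Notebooks and scripts are not created here, since their filenames have not been decided.

```python
from pathlib import Path
import tempfile

# Assay directories from the list above. "competition-assay" is an assumed
# spelling, chosen by analogy with "probe-assay".
ASSAY_DIRS = ["autoprotocol", "probe-assay", "competition-assay"]

# Only these two are described as having a README in the list above.
README_ASSAYS = ["probe-assay", "competition-assay"]


def make_examples_skeleton(root):
    """Create examples/<assay>/data for each assay and return the created paths.

    Also touches an empty README.md (assumed filename) for the assays
    that are meant to have one.
    """
    examples = Path(root) / "examples"
    created = []
    for assay in ASSAY_DIRS:
        data_dir = examples / assay / "data"
        data_dir.mkdir(parents=True, exist_ok=True)
        created.append(data_dir)
    for assay in README_ASSAYS:
        (examples / assay / "README.md").touch(exist_ok=True)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_examples_skeleton(tmp):
            print(path.relative_to(tmp))
```

Running it against a temporary directory first, as in the `__main__` block, avoids touching the real examples directory until the layout is agreed on.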

Also note, according to the discussion above the plan is to completely delete the data directory.

Also planning to make a branch called 'Nov2016' that will just be the repo as it currently is.

Sounds good!

What if we move it to https://github.com/choderalab/fluorescence-assay-manuscript in case we use a derivative of it to model some fluorescence assays?

If we use a derivative to model fluorescence assays, we can just include that, no?

Do you have it somewhere else? I feel like maybe it is?

I don't believe there is a copy somewhere else.

Since it's not an example of analyzing an experimental assay, I think we should add it to https://github.com/choderalab/fluorescence-assay-manuscript, perhaps under a notebooks/ or modeling/ or figures/ directory, and delete it from here.

I think it makes less sense in fluorescence-assay-manuscript than here, so I will just keep it here.

I have a derivative that I have used in the past, and will add it here when we get something going for the simpler competition assay prediction. If these two notebooks are redundant, we can delete one.

Our fluorescence assay manuscript outline contains a figure on modeling the competition assay, so I thought it was much more relevant to have this there, where we actually need to make figures depicting a modeled competition assay, than in this repo, which contains examples of real data and scripts/notebooks to analyze them. But I'm happy with whatever you think is best!

We have both the modeling and analysis here right now, and I think this makes sense for testing our methods. What do you think of this: https://github.com/choderalab/assaytools/tree/data_clean/examples/probe-assay ? We can add more description in the readme about what actually happens in these notebooks:

Looks good!

Would definitely appreciate a bit more description about the assays that are described in more detail in the notebooks. Think of yourself as a potential user looking to try to figure out if these examples are the most similar to what you are trying to do: what would you want to see?