Raw Data Organization

Question

sonyahanson opened this issue 8 years ago · 21 comments

Probably good to change how/where we are putting the raw data. I guess I have started doing this with the assaytools/data folder, where the data is separated out by data type (singlet, spectra, etc.). Though right now there are also snippets of analysis in there, which I should move.

Also, do we want to keep the data we are analyzing for the paper and the data we have for other projects (Mehtap, Andrea, etc.) in different places? Possibly a different repository? Or is just a separate folder within assaytools/data sufficient?

Answer 1 · 2016-06-08T18:51:19.000Z

Or... should we not have data in this repository at all? Just have a bare minimum for tests? We were planning on putting all the raw data for the manuscript in the manuscript repository anyway...

Answer 2 · 2016-06-09T01:22:48.000Z

Here's my thinking:

assaytools should only contain (1) example data to illustrate how to use each capability, and (2) test data for us to include in nosetests.
All other data for a paper should be in a repository specific to that paper.

Answer 3 · 2016-06-09T22:50:09.000Z

I think this sounds reasonable. I will shift to working in the manuscript repo for the datasets relevant to that. We can revisit if this becomes stranger than expected.

Answer 4 · 2016-06-09T23:18:12.000Z

I'm guessing we want to keep the files in data/full_example/, but maybe move everything to the examples directory. Maybe you can do this in your PR @jchodera ?

Answer 5 · 2016-07-28T16:01:29.000Z

So after discussing with @MehtapIsik, it seems like it is a good idea to make a new branch, e.g. assaytools/July2016 that could keep all the data and notebooks relatively as is, while we work on cleaning up the master branch. In the master branch we will remove/clean up any raw data and notebooks that don't serve as a minimal example.
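For reference, the snapshot step could look roughly like the sketch below. It only wraps two standard git commands (`git branch <name> <start-point>` and `git push <remote> <branch>`). The helper names `snapshot_commands` and `make_snapshot` are made up for illustration, and the remote name `origin` is an assumption about the local clone.

```python
import subprocess


def snapshot_commands(branch, base="master", remote="origin"):
    """Return the git commands that freeze `base` as a new branch and publish it.

    `remote="origin"` is an assumption; adjust it to match your clone.
    """
    return [
        ["git", "branch", branch, base],  # create the snapshot branch at base's current commit
        ["git", "push", remote, branch],  # publish it so collaborators can check it out
    ]


def make_snapshot(branch, base="master", remote="origin", dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    for cmd in snapshot_commands(branch, base, remote):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands without touching the repository.
    make_snapshot("July2016")
```

Once the snapshot branch exists, data and notebooks can be removed from master without losing the current state of the repository.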

Answer 6 · 2016-07-28T20:52:08.000Z

I copied all the HSA experimental data from the examples/.../hsa folder into another repository (hsa-affinity) so it can be organized there.

Answer 7 · 2016-07-29T13:44:04.000Z

> I'm guessing we want to keep the files in data/full_example/, but maybe move everything to the examples directory. Maybe you can do this in your PR @jchodera ?

Let's leave this for now since I use that data in my new implementation. We can clean it up next week.

> So after discussing with @MehtapIsik, it seems like it is a good idea to make a new branch, e.g. assaytools/July2016 that could keep all the data and notebooks relatively as is, while we work on cleaning up the master branch. In the master branch we will remove/clean up any raw data and notebooks that don't serve as a minimal example.

Sounds great!

Answer 8 · 2016-07-29T16:45:04.000Z

Great, Mehtap! Thanks!

Answer 9 · 2016-11-10T23:26:41.000Z

Just had a quick meeting with @MehtapIsik where we mapped out a better organization for the examples directory (among other things):

autoprotocol
- data
- ~four python scripts
probe assay
- data
- ~three ipynb's
  - modeling
  - MLE
  - simple bayes model
- README also describes how to use xml2png and quickmodel for this data
competition assay
- data
- ~three ipynb's
  - modeling
  - MLE
  - simple bayes model
- README also describes how to use xml2png and quickmodel for this data
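As a rough illustration of the layout above, here is a minimal sketch that scaffolds the three example directories. The name `probe-assay` matches the examples/probe-assay path linked later in this thread. The hyphenated `competition-assay`, the `README.md` filename, and the helper `scaffold_examples` are assumptions made for the sketch, not names taken from the repository.

```python
from pathlib import Path
import tempfile

# Example directory -> whether the outline above lists a README for it.
# "competition-assay" and the README filename are assumptions for illustration.
LAYOUT = {
    "autoprotocol": False,
    "probe-assay": True,
    "competition-assay": True,
}


def scaffold_examples(root):
    """Create <root>/<example>/data for each example, plus a README placeholder where listed."""
    root = Path(root)
    created = []
    for example, has_readme in LAYOUT.items():
        data_dir = root / example / "data"
        data_dir.mkdir(parents=True, exist_ok=True)
        created.append(data_dir)
        readme = root / example / "README.md"
        if has_readme and not readme.exists():
            # Placeholder text only; the real README would describe the assay and notebooks.
            readme.write_text(f"# {example}\n\nDescribe the assay, notebooks, and tools here.\n")
    return created


if __name__ == "__main__":
    # Run against a temporary directory so nothing in a real checkout is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_examples(Path(tmp) / "examples"):
            print(path.relative_to(tmp))
```

The scripts and notebooks themselves would then be moved into these directories by hand, since the outline only gives approximate counts for them.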

Also note, according to the discussion above the plan is to completely delete the data directory.

Also planning to make a branch called 'Nov2016' that will just be the repo as it currently is.

Answer 10 · 2016-11-10T23:43:57.000Z

Sounds good!

Answer 11 · 2016-11-11T17:26:58.000Z

Branch now made: https://github.com/choderalab/assaytools/tree/Nov2016

Answer 12 · 2016-11-11T23:34:11.000Z

@jchodera do you have any opinion about whether this notebook stays: https://github.com/choderalab/assaytools/blob/master/examples/ipynbs/models/competition-assay-modeling/competition-assay-modeling.ipynb

Answer 13 · 2016-11-11T23:38:07.000Z

What if we move it to https://github.com/choderalab/fluorescence-assay-manuscript in case we use a derivative of it to model some fluorescence assays?

Answer 14 · 2016-11-11T23:39:26.000Z

If we use a derivative to model fluorescence assays, we can just include that, no?

Answer 15 · 2016-11-11T23:40:51.000Z

Do you have it somewhere else? I feel like maybe it is?

Answer 16 · 2016-11-11T23:42:06.000Z

I don't believe there is a copy somewhere else.

Since it's not an example of analyzing an experimental assay, I think we should add it to https://github.com/choderalab/fluorescence-assay-manuscript, perhaps under a notebooks/ or modeling/ or figures/ directory, and delete it from here.

Answer 17 · 2016-11-11T23:45:46.000Z

I think it makes less sense in fluorescence-assay-manuscript than here, so I will just keep it here.

Answer 18 · 2016-11-11T23:47:33.000Z

I have a derivative that I have used in the past, and will add it here when we get something going for the simpler competition assay prediction. If these two notebooks are redundant, we can delete one.

Answer 19 · 2016-11-11T23:47:39.000Z

Our fluorescence assay manuscript outline contains a figure on modeling the competition assay, so I thought it was much more relevant to have this there (where we actually need to make figures depicting a modeled competition assay) than in this repo, which contains examples of real data and scripts/notebooks to analyze them. But I'm happy with whatever you think is best!

Answer 20 · 2016-11-12T00:09:24.000Z

We have both the modeling and analysis here right now, and I think this makes sense for testing our methods. What do you think of this: https://github.com/choderalab/assaytools/tree/data_clean/examples/probe-assay ? We can add more description in the readme about what actually happens in these notebooks.

Answer 21 · 2016-11-12T00:19:30.000Z

Looks good!

Would definitely appreciate a bit more description about the assays that are described in more detail in the notebooks. Think of yourself as a potential user looking to try to figure out if these examples are the most similar to what you are trying to do: what would you want to see?