mdibl/biocore_documentation

Thoughts on changes - Lucie


I would go with /biocore/scratch/reference/ensembl/release-93/... instead of /biocore/ref_expanded/ensembl/release-93/... for consistency. However, since the proposed changes will affect all parts of our automation, including the cloud image, data downloads, etc., I would suggest we do this incrementally:

1. Rename /data/internal to /data/raw_data
2. Rename /data/projects to /data/analysis
3. Rename /data/external to /data/reference
4. Reorganize the expanded reference data under /data/scratch into /data/scratch/reference
5. Replicate this in the cloud
6. Run regression testing:
   - data downloads automation
   - pipeline analysis
7. Reorganize /data/scratch/reference to be by source_name/release-version
8. Reorganize the information under /data/transformed/tool-version to be by source_name/release-version
9. Update the reference pre-indexing automation
10. Run regression testing for the reference pre-indexing automation
11. Proceed with changing /data to /biocore
12. Update the /data mount in the cloud and generate a new image

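
To make the first renames easier to rehearse before touching the servers, here is a rough sketch of how they could be scripted with a dry run. The function names (`plan_renames`, `apply_renames`, `reference_dir`) are hypothetical and not part of our existing automation; the directory names come from the steps above, and the root is passed in so the same script can be pointed at either /data or a test directory.

```python
from pathlib import Path

# Old directory name -> new directory name, relative to the data root.
# These pairs come from the first three rename steps of the plan.
RENAMES = [
    ("internal", "raw_data"),
    ("projects", "analysis"),
    ("external", "reference"),
]


def plan_renames(root, renames=RENAMES):
    """Return the (source, target) pairs that can be renamed under root."""
    root = Path(root)
    plan = []
    for old, new in renames:
        src, dst = root / old, root / new
        if not src.is_dir():
            continue  # nothing to rename: already done or never existed
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; refusing to overwrite")
        plan.append((src, dst))
    return plan


def apply_renames(root, dry_run=True):
    """Print the plan; only touch the filesystem when dry_run is False."""
    plan = plan_renames(root)
    for src, dst in plan:
        action = "would rename" if dry_run else "renaming"
        print(f"{action} {src} -> {dst}")
        if not dry_run:
            src.rename(dst)
    return plan


def reference_dir(root, source_name, release):
    """Expanded reference path laid out by source_name/release-version."""
    return Path(root) / "scratch" / "reference" / source_name / f"release-{release}"
```

Running `apply_renames("/data")` with the default `dry_run=True` only prints what would change, which fits the incremental approach: review the output, run for real, then do the regression testing.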
I personally do not see the need to rename /data to /biocore for the following reasons:

1. It is redundant, since we are already using biocore servers
2. More hassle to run the same application on both the on-premises and cloud servers, since it needs more configuration
3. Not portable - see 2
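
To illustrate the extra configuration mentioned in point 2: if the root differs between environments, every application needs some lookup along these lines instead of a fixed path. This is only a sketch; `BIOCORE_DATA_ROOT` is a hypothetical environment variable name, not something we currently set.

```python
import os
from pathlib import Path


def data_root(env=os.environ):
    """Resolve the data root, falling back to /data when nothing is configured.

    BIOCORE_DATA_ROOT is a hypothetical variable used for illustration only.
    """
    return Path(env.get("BIOCORE_DATA_ROOT", "/data"))
```

With a single /data root everywhere, this lookup and the per-environment setting behind it are not needed.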
I also do not think raw_data is intuitive or representative of the experimental data. I would suggest "experiments" instead, which requires less cognitive load since it relies on recognition instead of recall.

Those are my two cents