/takefive

Unfolding the genome: the case study of P. falciparum

Primary LanguagePython

# Unfolding the genome: the case study of P. falciparum

by Nelle Varoquaux

This git repository contains all the scripts to reproduce the results and
figures of the paper entitled "Unfolding the genome: the case study of P.
falciparum".

The repository is organized following the guidelines of "A Quick Guide to
Organizing Computational Biology Projects".

  - the *data* lies into the folder called ``data``. To download it
    automatically, just go in that folder and type make. Else, make sure to
    follow the README in this particular folder to organize the data properly.
  - the `script` folder contains all the scripts necessary to reproduce the
    figures of the article. Precisely,
      - "Figure 3: contact maps of P. falciparum's chr7" can be reproduced by
        running `figure_counts.py` in `scripts/visualizing_counts`.
      - "Figure 4:" 3D models of P. falciparum's genome architecture"'s code
        is contained in `scripts/structures` and can be
        reproduced by:
	- Inferring the MDS structures for ay-trophozoites and lemieux-B15C2::

	  python infer_structures_mds.py \
	      --lengths data/ay2013/trophozoites_10000_raw.bed \
	      data/ay2013/trophozoites_10000_raw.matrix
	  python infer_structures_mds.py \
	      --lengths data/lemieux2013/25kb/B15C2_combined_raw.bed \
	      data/lemieux2013/25kb/B15C2_combined_raw.matrix


	- Inferring the PO structure for the same data sets::

	  python infer_structures_po.py \
	      --lengths data/ay2013/trophozoites_10000_raw.bed \
	      data/ay2013/trophozoites_10000_raw.matrix
	  python infer_structures_po.py \
	      --lengths data/lemieux2013/25kb/B15C2_combined_raw.bed \
	      data/lemieux2013/25kb/B15C2_combined_raw.matrix

	- Visualize the structures::

	   python plot_results.py \
	    results/lemieux2013/25kb/B15C2_combined_raw_PO_01_structure.txt \
	    --lengths data/lemieux2013/25kb/A44_combined_raw.bed \
  	    --resolution 25000
	   python plot_results.py \
	    results/ay2013/trophozoites_10000_raw_PO_01_structure.txt \
	      --lengths data/lemieux2013/25kb/B15C2_combined_raw.bed \
	      --resolution 10000
      - "Figure 5: Volume Exclusion Modeling" can be reproduced from the
	`scripts/volume_exclusion` folder using the following steps.
	- Generating a large number of VE-structures using the following
	  command::

	    python plasmo_landmark_without_VRSM.py \
	      -s troph \
	      SEED

	 where SEED is a the seed number. In the paper, I used seeds ranging
	 from 1 to 5000. Count 1 hour per structures. A slurm script is
	 avalailable in cluster_scripts. More information in
	 `scripts/volume_exclusion/README`.
	- Then, run::

	    python create_and_plot_ve_counts.py
      - "Figure 6: Stability of structures across the life cycle"'s code is in
	`scripts/population_structures`. But first, 1000 structures for
	ay-ring, ay-troph, ay-schizont and lemieux-B15C2 need to be computed.
	I can provide the set of structures upon demand. Then:
	- Create the feature matrices::

	  python compute_downsampled_distances.py \
	    structures/ay2013/rings_10000_raw_PO \
	    --lengths data/ay2013/rings_10000_raw.bed

	  python compute_downsampled_distances.py \
	    structures/ay2013/schizonts_10000_raw_PO \
	    --lengths data/ay2013/rings_10000_raw.bed

	  python compute_downsampled_distances.py \
	    structures/ay2013/trophozoites_10000_raw_PO \
	    --lengths data/ay2013/rings_10000_raw.bed

	  python compute_downsampled_distances.py \
	    structures/lemieux2013/25kb/B15C2_combined_raw_PO \
	    --factor 4 \
	    --lengths data/lemieux2013/25kb/DCJ_On_combined_raw.bed

      - And compute the stochastic PCA::
	
	python clustering_across_stages_downsampled_distances.py -a PO



TODO
- extract useful code from minorswing and add it either to pastis or iced
- make sure anyone can download structures