eMASS-MD
Here you can find all code used in chapter 4 of my PhD thesis together with some data. The remaining data will soon be uploaded to Zenodo.
Overview
eMASS-MDpy
Contains all Python code used to analyse the data and produce the plots in the thesis.
enzyme_models
Contains all the code to generate the enzyme-level kinetic models and some of the generated data.
MD_data
Contains some of the molecular dynamics data.
The code has only been tested on Linux Mint 17.3. To build the enzyme-level kinetic models we used Mathematica 11.0 and the following packages
To analyse the results, the code in eMASS-MDpy was used, and the package requirements can be found there.
Reproducing results
To reproduce the parameter scan, and in particular figure 4.5 in the thesis
To reproduce the plot based on the existing data:
Run plot_parameter_scan.py in eMASS-MDpy/src/kinetics_integration. Before running it, change the main_dir variable (at the bottom of the file) to match the folder where you have all the enzyme-level models data.
The output plot will be stored in eMASS-MD/enzyme_models/plots.
To reproduce the plot from scratch:
- use the respective enzyme Mathematica notebook: enzyme_models/enzyme_name/enzyme_name.nb
- in the notebook change pathMASSef to where your MASSef folder is (needed to get the enzyme data properly)
- change mainFolder to "enzyme_name_param_scan"
- run sections:
- "Initialize notebook";
- "Import Data";
- "Set up rate equations";
- "Parameter scan" under "Simulate Data";
- "Configure the Particle Swarm Optimization and Levenberg-Marquardt Algorithm";
- you have now generated the data to be fitted. You can either run the fits through the notebook by running the section "Run the Fitting Algorithms", or copy the data to a cluster and run the fits there (probably a better idea, as the fits can take some time). To run the fits on the cluster:
- on the cluster create a folder where you will store all data, scripts, etc., e.g. kinetics_fit
- inside kinetics_fit create a folder named scripts and copy lma.py, pso.py, swarm.py, and run_fit.py into it (these files are in MASSef/python_code/src);
- inside kinetics_fit create a folder named enzyme_name_param_scan, and inside of it create two folders: "data" and "results";
- copy all files generated by running the Mathematica notebook in enzyme_name_param_scan/input to kinetics_fit/enzyme_name_param_scan/data;
- copy the file "set_dir.sh" in eMASS-MD to the cluster;
- in "set_dir.sh" change the following variables: enzyme_name and main_folder_local/cluster to match the folders where the data is on your local machine and on the cluster, respectively. Also change the variables enzyme_subfolder and enzyme_folder_cluster to end in param_scan;
- run "set_dir.sh" and double check that the paths in the .dat and pso/lmaParameters.txt files (in kinetics_fit/enzyme_name_param_scan/data) have been updated;
- copy the file "prepare_run_fit_script.py" in eMASS-MDpy/src/kinetics_integration to the cluster. This file generates bash scripts that will run the fitting jobs on the cluster;
- change "prepare_run_fit_script.py" to work on your system. In particular you'll need to change the following variables: "main_dir", "python_packages_path", and in "_write_header()" the job header, e.g. the queue you're using to run jobs;
- at the bottom of "prepare_run_fit_script.py" choose the function you want to run: run_enzyme_name_param_scan_ensemble();
- To analyze the results, go back to the Mathematica notebook. If you ran the fits on the cluster, just copy all files in enzyme_name_param_scan/results to your local folder enzyme_name_param_scan/output/raw.
- Run the section "Evaluate fit results for param_scan" (you will also need to run the first 3 sections in the notebook). This will calculate the sum of squared errors for each fit. Here you can ignore the subsections "Parameter Distribution", "Data Error Distribution", and "Recalculate Enzyme Parameters for all Candidate Rate Constant Sets", as these are not up to date.
- Finally, to reproduce figure 4.5 in the thesis, run plot_parameter_scan.py in eMASS-MDpy/src/kinetics_integration. Here you need to change the "main_dir" variable to match the folder where you have all the enzyme-level models data (at the bottom of the file). The output plot will be stored in eMASS-MD/enzyme_models/plots.
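The cluster folder layout described in the steps above can also be created with a few lines of Python instead of by hand. This is only a minimal sketch: the helper name make_fit_layout and its arguments are hypothetical and not part of eMASS-MD or MASSef; only the folder names (scripts, data, results) and the four script names come from the steps above.

```python
from pathlib import Path
import shutil

# Fitting scripts named in the steps above (found in MASSef/python_code/src).
FIT_SCRIPTS = ("lma.py", "pso.py", "swarm.py", "run_fit.py")


def make_fit_layout(root, run_name, scripts_src=None):
    """Create root/scripts and root/run_name/{data,results}.

    Hypothetical helper, not part of the repository.
    If scripts_src is given, the fitting scripts are copied into root/scripts.
    Returns the path to root/run_name.
    """
    root = Path(root)
    scripts_dir = root / "scripts"
    scripts_dir.mkdir(parents=True, exist_ok=True)

    for sub in ("data", "results"):
        (root / run_name / sub).mkdir(parents=True, exist_ok=True)

    if scripts_src is not None:
        for name in FIT_SCRIPTS:
            shutil.copy2(Path(scripts_src) / name, scripts_dir / name)

    return root / run_name
```

For example, make_fit_layout("kinetics_fit", "enzyme_name_param_scan", "MASSef/python_code/src") would set up the folders for the parameter scan; the files generated by the notebook still need to be copied into the data folder separately.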
To reproduce figures 4.6-4.8
To reproduce the plots based on the existing data:
For each enzyme, go to eMASS-MDpy/src/run_kinetic_analyses and run the enzyme_name.py file. Before running it, change the variable base_folder (at the bottom of the file). It might take a while to run.
The output plots will be stored in eMASS-MD/enzyme_models/enzyme_name/enzyme_name_param_inf/output in the following folders: plots, entropy, and clustermaps.
To reproduce the plot from scratch:
- use the respective enzyme Mathematica notebook: enzyme_models/enzyme_name/enzyme_name.nb
- in the notebook change pathMASSef to where your MASSef folder is (needed to get the enzyme data properly)
- change mainFolder to "enzyme_name_param_inf"
- run sections:
- "Initialize notebook";
- "Import Data";
- "Set up rate equations";
- "Parameter influence" under "Simulate Data";
- "Configure the Particle Swarm Optimization and Levenberg-Marquardt Algorithm";
- you have now generated the data to be fitted. You can either run the fits through the notebook by running the section "Run the Fitting Algorithms", or copy the data to a cluster and run the fits there (probably a better idea, as the fits can take some time). To run the fits on the cluster:
- on the cluster create a folder where you will store all data, scripts, etc., e.g. kinetics_fit
- inside kinetics_fit create a folder named scripts and copy lma.py, pso.py, swarm.py, and run_fit.py into it (these files are in MASSef/python_code/src);
- inside kinetics_fit create a folder named "enzyme_name_param_inf", and inside of it create two folders: "data" and "results";
- copy all files generated by running the Mathematica notebook in enzyme_name_param_inf/input to kinetics_fit/enzyme_name_param_inf/data;
- copy the file "set_dir.sh" in eMASS-MD to the cluster;
- in "set_dir.sh" change the following variables: enzyme_name and main_folder_local/cluster to match the folders where the data is on your local machine and on the cluster, respectively. Also change the variables enzyme_subfolder and enzyme_folder_cluster to end in param_inf;
- run "set_dir.sh" and double check that the paths in the .dat files and pso/lmaParameters.txt have been updated;
- copy the file "prepare_run_fit_script.py" in eMASS-MDpy/src/kinetics_integration to the cluster. This file generates bash scripts that will run the fitting jobs on the cluster;
- change "prepare_run_fit_script.py" to work on your system. In particular you'll need to change the following variables: "main_dir", "python_packages_path", and in "_write_header()" the job header, e.g. the queue you're using to run jobs;
- at the bottom of "prepare_run_fit_script.py" choose the function you want to run: run_enzyme_name_param_influence();
- To analyze the results, go back to the Mathematica notebook. If you ran the fits on the cluster, just copy all files in enzyme_name_param_inf/results to your local folder enzyme_name_param_inf/output/raw.
- Run the section "Evaluate fit results for param_inf" (you will also need to run the first 3 sections in the notebook). This will calculate the sum of squared errors for each fit. Here you can ignore the subsections "Parameter Distribution", "Data Error Distribution", and "Recalculate Enzyme Parameters for all Candidate Rate Constant Sets".
- For each enzyme, go to eMASS-MDpy/src/run_kinetic_analyses and run the enzyme_name.py file. Before running it, change the variable base_folder (at the bottom of the file). It might take a while to run. The output plots will be stored in eMASS-MD/enzyme_models/enzyme_name/enzyme_name_param_inf/output in the following folders: plots, entropy, and clustermaps.
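The "double check that the paths have been updated" step after running "set_dir.sh" can be partly automated. The sketch below is an assumption about how one might do it, not something shipped with eMASS-MD: the function name files_with_stale_path is hypothetical, and it simply reports every file in a data folder whose text still contains the old (local) path.

```python
from pathlib import Path


def files_with_stale_path(data_dir, old_path):
    """Return the names of files in data_dir that still mention old_path.

    Hypothetical helper, not part of the repository. Files are read as
    text and undecodable bytes are ignored, so it is only meant for the
    plain-text .dat and parameter files in the data folder.
    """
    stale = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        if old_path in text:
            stale.append(path.name)
    return stale
```

Calling it with the cluster data folder (for example kinetics_fit/enzyme_name_param_inf/data) and your local main folder as old_path should return an empty list if "set_dir.sh" rewrote every path.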
To reproduce figures 4.9-4.11:
Assuming you have completed the previous section, you now need to either generate the enzyme-level kinetic models and simulate them using the notebooks model_building.nb and model_simulation.nb, or download the data from Zenodo. After that, for each enzyme, go to eMASS-MDpy/src/enzyme_kinetics and run the enzyme_name.py file. Here you need to change the variable base_folder, set the boolean variable time_courses to True, and set the others to False (otherwise it will run the analyses from the previous section). It takes a while to run. The output plots will be stored in eMASS-MD/enzyme_models/enzyme_name/enzyme_name_param_inf/output/model_simulations/plots.
If you want to build the models and simulate them from scratch, for each enzyme:
- go to the folder eMASS-MD/enzyme_models/enzyme_name;
- open model_building.nb, change the variable enzModelsDir to match the directory where you have eMASS-MD;
- run the notebook and wait for all models to be generated (this takes a while);
- open model_simulation.nb, change the variable enzModelsDir to match the directory where you have eMASS-MD;
- run the notebook and wait for all models to be simulated and the data stored (this takes a while).
Contact: marta.ra.matos@gmail.com