README for simulac RMM, 10 Mar 2010 Simulac is a highly detailed stochastic simulation package for simulating core processes in a cell, including interaction of transcription factors with RNA polymerase, isomerization, mRNA elongation, ribosome binding (to full and partial transcripts), polypeptide elongation, chemical reactions, cell growth and many other stochastic effects inside cells. The simulator was written by Adam Arkin (see original README file below) and modified by Richard Murray. Installation ------------ Simulac is written in standard ANSI C and should compile on most variants of unix using the GNU autoconf package. It has been tested on Mac OS X, linux and Windows (using cygwin). To install: configure make make install Run 'configure --help' for information on command line options that can be used to tune the compilation and installation options. Usage Information ----------------- The simulation works by reading an "outline file" that contains the information that defines the system. The format of this file is described in detail in the original README (at the bottom of the file). The command to run a simulation is Simulac [options] outline_file maxtime printtime [seed] The arguments are: * outline_file - name of a text file containing system description * maxtime - amount of time to run the simulation, in seconds * printtime - period to use for printing the state, in seconds * seed - random number seed (if you want to generate a repeatable execution) The output of the simulation is a list of datapoints for each time point (printtime) in the simulation. The first line in the output is a header row that lists what is in each column. Error messages are printed on stderr and simulation output on stdout, so you can redirect stdout to a data file. A number of options are available to change the run-time behavior of the system: -V, --version Print version and exit --help print out help message and exit -i, --init=STRING initial condition, name=val (int) -r, --rate=STRING rate parameter, kNN=val (float) -v, --volume=FLOAT initial cell volume, (0-1] -g, --growth=FLOAT cell growth rate, (0-1] -h, --header print header row in output file (default=on) -H, --long-header print long header row output file (default=off) -m, --moi=INT multiplicity of infection -s, --single single cell mode; turn off cell division (default=off) -p, --pops output transcription initiations (default=off) The options should be fairly self explanatory. Cell volume and growth rate are fractional amounts based on the nominal parameters defined in the system description (so setting these to 1 does nothing). The MOI parameter is currently a bit of a hack: it modified all files with MOI != 1 to have the given MOI. More detailed information is given in the Simulac documentation. Lambda example -------------- A detailed example of Simulac is contained in the directory examples/lambda. The following MATLAB scripts can be used to generate simulation data and plots: * gensims_vol - generate simulations with different cell volumes * arkin_fig3 - plot data similar to Arkin et al, Figure 3 See the README file for more information on the contents of the example, as well as the Simulac documentation. ------------------------------------------------------------------------------ Original README (from Adam Arkin) ------------------------------------------------------------------------------ This is the beta version of the kernel for BIO/SPICE. The kernel is called simulac. This particular kernel is an implementation of the Gillespie Monte Carlo simulation algorithm for the Chemical Master Equation. As such it is exact. However it is slow. Anyway-- this code is offered as is with no warranty implied or otherwise. This code is designed for use in a UNIX environment. However, it is written in standard C so there should be nothing special to do in order to port it to other platforms. === Install === 1) Uncompress the archive file. 2) Type make. The make utility should create an executable called Simulac. === Running Simulac === Well, this is a complicated business at the moment. The upcoming graphical front-end should make this a lot easier. The file structure for input files to this version of the simulator is as follows: An 'outline' file containing the file names of a 'Cel' file (hold parameters for cell volume and growth), 'Kin' files that contain basic chemical kinetics, and 'Seq' files that contain sequence based information. 'Seq' files will point to other files that I will describe in a moment. Here is a representative outline file. Lines prefixed with '%' are always comments and are always ignored by the parser. The order of the files in the outline file shouldn't matter...but usually I do 'Cel' then 'Kin' then 'Seq'. Initial concentration s in the last Kin file are dominant. <tt> % % This file outlines the mechanisms % needed to model much of the lambda phage % % Parameter set 12.0//mRNA Cel.EColi Kin.Lambda Kin.CIICIII Kin.RNAP Kin.Ribosome Seq.LambdaPL Seq.LambdaPRPAQ </tt> A 'Cel' file looks like: <tt> % E. Coli Cell Parameters % % Volumes in liters % % GrowthRate in 10^-18 liters/sec % TYPE= Cell ------- VI= 1e-15 V0= 1.41e-15 GrowthRate= 0.4762 </tt> 'VI' is the initial volume of the cell, V0 is the reference volume used in calculation of the free energies, etc., and Growthrate is the growthrate (in units of 10^-18L) . In our model, when the volume doubles the cell divides. Very primitive. Also-- growth is stochastic and linear. <tt> % CII/CIII/Hfl protein Kinetics % % PSet 14.1//Hfl TYPE= Kinetic ------- CII + Hfl --> Hfl_CII Hfl_CII --> CII + Hfl Hfl_CII --> Hfl CIII + Hfl --> Hfl_CIII Hfl_CIII --> CIII + Hfl Hfl_CIII --> Hfl CII + HflB --> HflB_CII HflB_CII --> CII + HflB HflB_CII --> HflB CIII + HflB --> HflB_CIII HflB_CIII --> CIII + HflB HflB_CIII --> HflB () --> Hfl () --> HflB ------- k1 = 0.01 sec k2 = 0.01 sec k3 = 0.015 sec k5 = 0.05 sec k6 = 0.001 sec k7 = 0.0001 sec k9 = 0.0001 sec k10 = 0.065 sec k11 = 0.6 sec k9 = 0.01 sec k10 = 0.01 sec k11 = 0.001 sec k13 = 0.0116 sec k15 = 0.0465 sec ---- CII = 0 Hfl = 25 Hfl_CII = 0 CIII = 0 Hfl_CIII = 0 HflB = 100 HflB_CII = 0 </tt> Here the rate constants 'k' are measure in units of the appropriate molarity times inverse seconds. The intial concentrations on the bottom is in molecules. 'Seq' files look like: <tt> % % Simple Lambda Approximation % % TL1 Large since it covers orf ea10 and ra1 % TL2 Large since it covers a bunch o' genes as well. % % PSet 3.4//NT TYPE= DNA ------- PL --> NUTL --> GeneN --> TL1 --> GeneCIII ------- MOI = 13 PL 60 nt RIGHT Promotor PromDataPL NUTL 73 nt RIGHT AntiTerminator ATermDataNUTL GeneN 481 nt RIGHT Gene ProtDataN TL1 1574 nt RIGHT Terminator TermDataTL1 GeneCIII 164 nt RIGHT Gene ProtDataCIII </tt> Each DNA region is linked to the next with a '-->'. There is one line of parameters for each region. For example, PL is 60 nucleotides long. It is transcribed to the RIGHT (this is arbitrary in this case-- there are no counter-propagating transcripts to worry about). It is a promoter (yes-- I spelled it wrong in the parser and never went back to change it) and the data on the promoter is stored in PromDataPL. The three types of DNA region data files are as follows: 'PromData', 'ProtData', 'TermData' and 'ATermData'. PromData files look like: <tt> % % Promotor Data File % % TranscriptionDirection = RIGHT SheaAckers = OperatorPL IsoData = IsoDataPL </tt> They contain a superceding transcription direction, a pointer to a shea-ackers style operator thermodynamics file and CC-->OC rate file. The 'Operator' file looks like: <tt> 2 10 ___ ___ -0.0 CroCro ___ -10.8 ___ CroCro -12.1 CICI ___ -11.7 ___ CICI -10.1 ___ RNAP -12.5 CroCro CroCro -22.9 CroCro CICI -20.9 CICI CroCro -22.8 CICI CICI -23.7 </tt> Where the first two number are the number of binding sites and the number of configurations. This is followed by a list of configurations where molecular names are placed in the relevant columns (one column for each functional binding site) If in a given configuration nothing binds to a particular site, then the symbol '___' is used there. The last column is always the free energy of that site. There are definitely more effficient ways of storing this info but this one is easy to look at. 'IsoData files are boring and are merely a list of isomerization rates for each of the states in the Operator file. Why these two files are separate is beyond me. <tt> 0.0 0.0 0.0 0.0 0.0 0.011 0.0 0.0 0.0 0.0 </tt> 'ProtData' files look like: <tt> % % Coding Data for GeneN % Product = N mRNADegradationRate = 0.2 sec </tt> The 'Product' is the name of the protein product. In this version of the simulator, the number of the proteins per transcript are decided by a race between ribosome binding and RNaseE binding. Ribosomes and RNAase are kept pretty constant during the simulation, the number of proteins per transcript is the ratio of the ribosome binding rate to the 'mRNADegradation rate'. For the simulations we ran, the ribosome binding rate was always roughly 2.0. This format has been changed in the latest verion but it is not consistent with the file structures above. An optional RibosomeBindingRate can be set in the ProtData file by adding this to the bottom of the file. If it is not specified, the default value is used. 'TermData' files look like: <tt> % % TL1 Terminator % TermModifier = N BaseFallOffRate = 25.0 sec BaseRNAPMotion = 5.0 sec AntiTerminatedFallOffRate = 0.0 sec AntiTerminatedRNAPMotion = 30.0 sec </tt> This is for terminators that may be susceptible to antiterminated RNAP. TermModifier is the appropriate antitermination molecule that must be bound the RNAP crossing these region in order for the antitermination kinetics to dominate. The protein is 'N' in this case. For PR' it would be 'Q'. The 'BaseFallOffRate' and 'BaseRNAPMotion' are the rates of termination and transcription respectively that result while a non-antiterminated RNAP crosses the termination region. The AntiTerminated variables are for when the RNAP has bound the TermModifier. In this version of the simulator-- these kinetics are only used on the last base of the terminator. All other bases are treated like normal DNA. This yields the correct experimental results but is probably not how it actually work. (Like any of this is how it actually works.... :-/ ) If the TermModifier parameter is not specified, then the terminator is assumed not to be anti-terminated. In this case, the AntiTerminatedFallOffRate and AntiTerminatedRNAPMotion parameters should be omitted. Finally, 'ATermData' files look like: <tt> % % AntiTermination Site NUTL % % % Modified: 10/29/96 to make 70nM % the effective concentration. % % TermModifier = N UnBoundRNAPMotion = 5 sec BindingRate = 0.145 sec BoundRNAPMotion = 30 sec UnBindingRate = 0.1 sec </tt> This is the same sort of deal as the TermData files--- modification molecule, how RNAP moves when the modifier is not present, the binding rate of the modifier to RNAP (BindingRate ends up multiplied by the modifier concentration), how modifier bound RNAP moves on the DNA and the rate of modifier loss. OK..... got all that? I know: Rube Golbergian. I promise the next version will be very simple to use. A simple example is included in Auto.tar. This is a simple regulatory loop similar to that used to produce the data in our firt PNAS paper. In that directory is a shell script called 'simpsim' that I was using to run many simulations in batch. Try it out. Adam Arkin