parallel-examples

Examples of parallel code using ipython on ascar

setting up ascar (one-time stuff)

  • Edit your ~/.bashrc or ~/.bash_profile to include the correct version of python

    We have two versions of python installed on ascar: Enthought Canopy and Continuum's Anaconda. Currently Enthought Canopy has a working version of the GIS python modules; Anaconda does not.

    ONLY USE ONE OF THE BELOW

    To activate Enthought Canopy, add the following:

      source /share/apps/enthought/User/bin/activate

    To activate Continuum Anaconda, add the following:

      export PATH="/share/apps/anaconda-1.6/bin:$PATH"

    Currently I have only tested Canopy with the parallel code

  • set up your basic parallel ipython environment

    • $ ipython profile create sge --parallel

    • Edit ~/.config/ipython/profile_sge/ipcontroller_config.py, adding the line:

      c.HubFactory.ip = '*'
      
      to instruct the controller to listen on all interfaces.
      
    • Edit ~/.config/ipython/profile_sge/ipcluster_config.py, adding the lines:

      c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
      c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
      c.SGELauncher.queue = 'all.q'
      

      at this point you should be able to start a cluster using

      $ ipcluster start -n 10 --profile=sge --cluster-id=test

      check that it started using

      $ qstat

      and stop it using

      $ ipcluster stop --profile=sge --cluster-id=test
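
      Once the cluster is up you can talk to it from python. Below is a
      minimal sketch, assuming the IPython.parallel API this profile setup
      targets and an IPython recent enough for Client to accept cluster_id;
      the function squared is purely illustrative.

      from IPython.parallel import Client

      rc = Client(profile='sge', cluster_id='test')  # connect to the running cluster
      lview = rc.load_balanced_view()                # load-balanced view across the engines

      def squared(x):
          return x ** 2

      print(lview.map_sync(squared, range(10)))      # blocks until all results return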

installing ascar-parallel

This package, ascar-parallel, handles some of the boilerplate of starting and stopping the cluster for you.

It is already installed in the Canopy python, but for reference, to install:

$ git clone git@github.com:twdb/ascar-parallel.git

$ cd ascar-parallel

$ python setup.py install
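
You can check that the install worked with a quick import (this just exercises the StartCluster entry point used below):

$ python -c "from ascar_parallel import StartCluster"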

using ascar-parallel

in your code use the following:

from ascar_parallel import StartCluster

with StartCluster(8) as lview:
	lview.map(myfunc, <args>)
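
For example, a complete runnable version of that pattern (the function square and the inputs are illustrative; this assumes lview.map behaves like an IPython.parallel load-balanced map, returning an iterable async result):

from ascar_parallel import StartCluster

def square(x):
    return x ** 2

with StartCluster(8) as lview:
    results = lview.map(square, range(16))  # farm the calls out to the 8 engines
    print(list(results))                    # iterating waits for all the results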

more details in the examples folder

common issues

  • imports: The simplest way to make imported packages available to all the compute engines is to import them inside the function you are parallelizing, e.g.

     from ascar_parallel import StartCluster
    
     def myfunc(a,b):
     	import numpy as np
     	return np.sqrt(a+b)
    
     a_list = range(5)
     b_list = range(5)
     with StartCluster(8) as lview:
     	lview.map(myfunc, a_list, b_list)

    if you are defining a bunch of helper functions for myfunc, then these will not be available on the compute engines. You can do one of two things.

    • define the functions inside myfunc:
     def myfunc(a,b):
     	def helperfunc(c,d):
     		<statements>
    
     	<statements>
    • put the helper functions inside another file/module and then import them. This is the preferred way (see the sketch at the end of this section):
     def myfunc(a,b):
     	from mymodule import helperfunc
     	<statements>
  • files: ascar has four main shared folders:

    • /home - user home directories; this is backed up. DO NOT SAVE MODEL RESULTS HERE
    • /share/swr - this is backed up. About 500GB of space currently available
    • /share/work - this is not backed up. About 7TB of space currently available
    • /share/apps - this is a good place to put software and executable code (e.g. compiled fortran). You can then add this to your path so that executables are available from all the compute engines.
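
As a sketch of the preferred helper-module approach above, suppose the helpers live in a hypothetical module mymodule.py placed somewhere importable by the engines (installed, or on a shared folder that is on the python path):

# mymodule.py -- hypothetical helper module, contents purely illustrative
import numpy as np

def helperfunc(c, d):
    return np.sqrt(c + d)

The driver then imports the helpers inside the parallelized function, so each compute engine performs the import itself:

from ascar_parallel import StartCluster

def myfunc(a, b):
    from mymodule import helperfunc
    return helperfunc(a, b)

a_list = range(5)
b_list = range(5)
with StartCluster(8) as lview:
    print(list(lview.map(myfunc, a_list, b_list)))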