nathanhaigh/2019_EMBL-ABR_Snakemake_webinar

Python error in dynamic resource allocation


The lambda function shown for dynamically allocating resources (time/mem) based on input file sizes may not work as documented.

The problem is the `for f in input['index']` loop inside the lambda functions given in the `resources` section. The loop assumes that `input['index']` is a list of one or more file paths. If the index file is instead given in the rule's `input` as a single quoted string, the loop iterates over the individual characters of the path rather than over files, and the size calculation fails.
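The effect can be reproduced in plain Python, outside Snakemake. This is a minimal sketch under stated assumptions: plain dicts stand in for the rule's `input` object, a temporary 2 MiB file stands in for the index, and `total_mb` is a hypothetical helper that repeats the summation from the lambda without the retry scaling.

```python
import os
import tempfile

# Hypothetical 2 MiB file standing in for the rule's index file
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"\0" * (2 * 1024**2))
    index_path = fh.name

def total_mb(index):
    # Same generator expression as the rule's lambda, without the retry scaling
    return sum(os.path.getsize(f) for f in index if os.path.isfile(f)) / 1024**2

# Plain dicts standing in for Snakemake's `input` object (assumption)
as_string = {"index": index_path}  # index given as a single quoted string
as_list = {"index": [index_path]}  # index given as a list

chars = [f for f in as_string["index"]]   # iterating a str yields single characters
string_mb = total_mb(as_string["index"])  # usually 0.0: no single character names an existing file
list_mb = total_mb(as_list["index"])      # 2.0: the whole path is checked

print(chars[:3], string_mb, list_mb)
os.remove(index_path)
```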

For example, this will fail:

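```python
# math and os are used by the resource lambda
import math
import os

rule myrule:
    input: index = "some_index_file",
    output: ...
    resources:
        # input['index'] is a plain string here, so the loop visits its characters, not the file
        mem_mb = lambda wildcards, input, attempt: math.ceil(sum(os.path.getsize(f) for f in input['index'] if os.path.isfile(f)) / 1024**2 * (1 + (attempt - 1) / 10)),
    shell: ...
```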

Either of the following two solutions will work:

1. Make the `index` input a list:

```python
rule myrule:
    input: index = ["some_index_file"],
    output: ...
    resources:
        # input['index'] is now a list, so the loop visits each file path
        mem_mb = lambda wildcards, input, attempt: math.ceil(sum(os.path.getsize(f) for f in input['index'] if os.path.isfile(f)) / 1024**2 * (1 + (attempt - 1) / 10)),
    shell: ...
```
2. Wrap the value in a list inside the lambda function:

```python
rule myrule:
    input: index = "some_index_file",
    output: ...
    resources:
        # [input['index']] turns the single path into a one-element list
        mem_mb = lambda wildcards, input, attempt: math.ceil(sum(os.path.getsize(f) for f in [input['index']] if os.path.isfile(f)) / 1024**2 * (1 + (attempt - 1) / 10)),
    shell: ...
```
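A further option, sketched here as an assumption rather than something from the webinar material, is to normalise the value in a small helper so the calculation works whether `index` is a single string or a list. The `[input['index']]` wrapper, by contrast, assumes a single string. `index_mem_mb` is a hypothetical name; it keeps the same formula as the lambdas above (total size in MiB, plus 10% per retry after the first).

```python
import math
import os
import tempfile

def index_mem_mb(paths, attempt=1):
    """Hypothetical helper: memory in MB from the size of one or more index files.

    `paths` may be a single path or a list of paths. The total size is converted
    to MiB and scaled up by 10% for every retry after the first attempt.
    """
    if isinstance(paths, (str, os.PathLike)):
        paths = [paths]  # wrap a lone path so the loop visits files, not characters
    total_bytes = sum(os.path.getsize(f) for f in paths if os.path.isfile(f))
    return math.ceil(total_bytes / 1024**2 * (1 + (attempt - 1) / 10))

# Usage with a hypothetical 10 MiB index file
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"\0" * (10 * 1024**2))
    index_path = fh.name

mb_str = index_mem_mb(index_path)               # 10: a single string works
mb_list = index_mem_mb([index_path])            # 10: so does a list
mb_retry = index_mem_mb(index_path, attempt=3)  # 12: +20% on the third attempt
print(mb_str, mb_list, mb_retry)
os.remove(index_path)
```

Defined near the top of the Snakefile, the helper would be called from the rule as `mem_mb = lambda wildcards, input, attempt: index_mem_mb(input['index'], attempt)`.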