iljah/hdintegrator

N-sphere integrand fails

Closed this issue · 13 comments

$ mpiexec -n 2 ./hdintegrator.py --integrand integrands/N-sphere --dimensions 15 --prerefine 200
Traceback (most recent call last):
  File "./hdintegrator.py", line 356, in <module>
    split(choice(grid.get_cells()), 1, [randint(0, len(dimensions) - 1)], grid)
  File "/Users/mark/Desktop/review/ve/lib/python3.6/random.py", line 258, in choice
    return seq[i]
  File "/Users/mark/Desktop/review/ve/lib/python3.6/site-packages/networkx/classes/reportviews.py", line 178, in __getitem__
    return self._nodes[n]
KeyError: 0
^C(ve) netbook:hdintegrator mark$ 

It hangs after the error and I had to ctrl-c it.

iljah commented

Hmm your command worked for me, the result was:

3.3696876509523746e-06 1.9440074486843323e-07 0.0

Please try the command with --verbose and attach the last 100 lines or so.

(ve) netbook:hdintegrator mark$ mpiexec -n 2 ./hdintegrator.py --integrand integrands/N-sphere --dimensions 15 --prerefine 200 --verbose
Starting with 2 processes
Traceback (most recent call last):
  File "./hdintegrator.py", line 356, in <module>
    split(choice(grid.get_cells()), 1, [randint(0, len(dimensions) - 1)], grid)
  File "/Users/mark/Desktop/review/ve/lib/python3.6/random.py", line 258, in choice
    return seq[i]
  File "/Users/mark/Desktop/review/ve/lib/python3.6/site-packages/networkx/classes/reportviews.py", line 178, in __getitem__
    return self._nodes[n]
KeyError: 0
Integrand initialized by rank 1
Rank 1 waiting for work

iljah commented

Ok something strange is happening on process 0 and I can't tell if it's a problem with hdintegrator or networkx. This is what I get:

Starting with 2 processes
Integrand initialized by rank 1
Rank 1 waiting for work
Grid initialized by rank 0 with 201 cells
Number of work item slots: 1
201 work left, 0 processing
Sending cell 3292 for processing to rank 1

Try changing

				split(random_cell, 1, random_ints, grid)

around line 360 in hdintegrator.py to

				try:
					random_cell = choice(grid.get_cells())
					random_int = [randint(0, len(dimensions) - 1)]
					split(random_cell, 1, random_int, grid)
				except Exception as e:
					print("Couldn't split cell", random_cell, 'along', random_int, 'because', e)
					exit(1)

and let me know what that prints.

Hmmm ... I don't see that line. Are we on the same branch? I'm on master.

iljah commented

This is the exact place online: https://github.com/iljah/hdintegrator/blob/master/hdintegrator.py#L356 and oops, I put the new version of the split line as the old one. The link above is the correct place to change.

See below. Note that the values used in the error message are uninitialised.

(ve) netbook:hdintegrator mark$ mpiexec -n 2 ./hdintegrator.py --integrand integrands/N-sphere --dimensions 15 --prerefine 200 --verbose
Starting with 2 processes
Traceback (most recent call last):
  File "./hdintegrator.py", line 358, in <module>
    random_cell = choice(grid.get_cells())
  File "/Users/mark/Desktop/review/ve/lib/python3.6/random.py", line 258, in choice
    return seq[i]
  File "/Users/mark/Desktop/review/ve/lib/python3.6/site-packages/networkx/classes/reportviews.py", line 178, in __getitem__
    return self._nodes[n]
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./hdintegrator.py", line 363, in <module>
    "Couldn't split cell", random_cell, 'along', random_int, 'because', e)
NameError: name 'random_cell' is not defined
Integrand initialized by rank 1
Rank 1 waiting for work

iljah commented

I meant that one line should be replaced with the entire try+except code above, so that the new file looks like this between lines 352..364:

				c.set_extent(i, args.min_extent, args.max_extent)
			grid = ndgrid(c)

			for i in range(args.prerefine):
				try:
					random_cell = choice(grid.get_cells())
					random_int = [randint(0, len(dimensions) - 1)]
					split(random_cell, 1, random_int, grid)
				except Exception as e:
					print("Couldn't split cell", random_cell, 'along', random_int, 'because', e)
					exit(1)
			if args.verbose:
				print('Grid initialized by rank', rank, 'with', len(grid.get_cells()), 'cells')

This way we should see whether split is called with sane arguments, and I should be able to reproduce the problem.

Yes, I did that. If you look carefully, you will see that random_cell and random_int will be uninitialised if the choice line fails (which it does).
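
To illustrate the point, here is a minimal sketch of the same try/except pattern with both names bound before the `try`, so the error message can always be built. `pick_split_target`, `cells`, and `n_dims` are hypothetical stand-ins and not code from hdintegrator; the actual `split` call is left out.

```python
from random import choice, randint


def pick_split_target(cells, n_dims):
    """Pick a random cell and dimension; report a failure without a NameError.

    Hypothetical helper for illustration only, not part of hdintegrator.
    """
    # Bind both names first so the except block can always refer to them.
    random_cell = None
    random_int = None
    try:
        random_cell = choice(cells)
        random_int = [randint(0, n_dims - 1)]
    except Exception as e:
        # If choice() failed, both values are still None here.
        return "Couldn't pick cell {} along {} because {!r}".format(
            random_cell, random_int, e)
    return random_cell, random_int
```

Called with an empty list, this returns the message with both values shown as `None` instead of raising a second exception.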

iljah commented

Oh right, so does grid.get_cells() return an empty list then in this case?

A NodeView object.

iljah commented

Looks like the networkx API changed between 1.x and 2.x, and I'm still using the 1.x series. I've updated ndgrid and the corresponding submodule in hdintegrator; if you do a git pull followed by git submodule update, it should work. If it doesn't, and ndgrid in the submodule dir looks like https://github.com/iljah/ndgrid/blob/master/source/ndgrid.py#L71, then something else might be the problem.
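
For reference, the failure mode can be reproduced without networkx. `random.choice` indexes its argument by position (`seq[i]`, as in the traceback), while a view that looks items up by key raises `KeyError` when that position is not a key. `KeyedView` below is a hypothetical stand-in for such a view, not the networkx class, and converting to a list first is just one common workaround, not necessarily what the ndgrid update does.

```python
from random import choice


class KeyedView:
    """Hypothetical stand-in: has a length, but [] looks up by key, not position."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)

    def __len__(self):
        return len(self._nodes)

    def __iter__(self):
        return iter(self._nodes)

    def __getitem__(self, n):
        return self._nodes[n]


# Cell ids chosen so that none of them equals a position 0..2.
cells = KeyedView({3292: {}, 3293: {}, 3294: {}})

try:
    choice(cells)  # choice() asks for cells[i] with i in 0..2
    failed = False
except KeyError:
    failed = True

# Materialising the ids into a list gives choice() a real sequence.
picked = choice(list(cells))
```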

Ha! It works :D

iljah commented

Great!