ORNL-Fusion/ips-wrappers

AToM IPS location(s)

Closed this issue · 9 comments

dlg0 commented

@ORNL-Fusion/ips-support-team @ORNL-Fusion/ips-wrapper-developers

Within the atom-install-edison directory on Edison, I have renamed several legacy IPS installation directories. This may break things if you were pointing to a non-standard installation. We're endeavoring to have the following ...

The binaries for the IPS (Framework only - maintained by @elwasif - this is the install only, not the source)
/project/projectdirs/atom/atom-install-edison/ips-gnu-sf

The AToM IPS-Wrappers (contains fastran, EPED, Gyro, etc wrappers - maintained by @dlg0)
/project/projectdirs/atom/atom-install-edison/ips-wrappers

The traditional CSWIM project repo (Wrappers only - maintained by @elwasif)
/project/projectdirs/atom/atom-install-edison/ips-cswim-wrappers

So any directory with a leading underscore is scheduled for deletion. I think none of them are in use, but let me know if something breaks.

[Screenshot, 2015-09-14 12:49:33 PM]

@dlg0 there seems to be a problem running the EPED workflow, which makes the run fail.

...
RM: get_allocation() returned %s (False, ['5634', '5635', '5636', '5632', '5633'], 24, 24, False)
build_launch_cmd( 120 /global/project/projectdirs/atom/users/meneghin/ips-atom/ips-eped//bin/run_parallel.exe list_jobs_elite () 24 24  False False )
in wait task
collect .gamma file k=0, nmode=5: not completed: IOError(2, 'No such file or directory')
collect .time file k=0, nmode=5: not completed: IOError(2, 'No such file or directory')
collect .log file k=0, nmode=5: not completed: IOError(2, 'No such file or directory')

In addition to this new fatal problem, I still have the issue that IPS hangs until the walltime expires, regardless of whether the run completed successfully.

You can look under this area: /scratch1/scratchdirs/meneghin/OMFIT/runs/projectID__9844359dab__2015-09-14_13_20_34/EPED-sim1__IPScore-sim1/p0/

Permissions?

Please try again.

Nope
cd projectID__9844359dab__2015-09-14_13_20_34/
-bash: cd: projectID__9844359dab__2015-09-14_13_20_34/: Permission denied

@parkjm This appears to be a problem with the elite component. The list_jobs_elite file has nothing in it, which may be due to input data issues. So nothing runs there and no output is generated.
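For anyone chasing the same symptom, a minimal sketch of a pre-launch check on the job list file may help. It only tests that the file exists and has at least one non-blank line; the function name `job_list_is_usable` is hypothetical and is not part of the ips-eped component.

```python
import os


def job_list_is_usable(path):
    """Return True if the job list file exists and has a non-blank line.

    Hypothetical helper for illustration; not part of ips-eped.
    """
    if not os.path.isfile(path):
        return False
    with open(path) as handle:
        # An empty file, or one with only whitespace, gives nothing to run.
        return any(line.strip() for line in handle)
```

Calling this on `list_jobs_elite` before building the launch command would turn the later "No such file or directory" messages for the .gamma, .time and .log files into an earlier, more direct error.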

I found an error in the ips-eped component that I had introduced. It is now fixed, and I have pushed the update to the ips-eped repository. I am able to run individual runs of IPS-EPED.

However the DAKOTA scans of IPS-EPED fail at the very beginning. The issue seems to have to do with a socket connection. Simulation files are under /scratch1/scratchdirs/meneghin/OMFIT/runs/projectID__9844359dab__2015-09-14_13_20_34/EPED-sim1__IPScore-sim1/p0 and I made sure to grant access permissions. This is what I get:

cmd = /project/projectdirs/atom/atom-install-edison/ips-gnu-sf/bin/ips.py --all --simulation=/scratch1/scratchdirs/meneghin/OMFIT/runs/projectID__9844359dab__2015-09-14_13_20_34/EPED-sim1__IPScore-sim1/p0/run01/dakota_bridge_37319.conf --platform=/scratch1/scratchdirs/meneghin/OMFIT/runs/projectID__9844359dab__2015-09-14_13_20_34/EPED-sim1__IPScore-sim1/p0/MACHINE.conf --verbose --log=IPS.log
Sep 18 2015 07:40:29  Launched IPS
Sep 18 2015 07:40:29  0 ips_dakota_dynamic connecting to IPS dakota bridge <class 'socket.error'> [Errno 2] No such file or directory
Starting IPS
Created <class 'runspaceInitComponent.runspaceInitComponent'>
['5802', '5803', '5804', '5805', '5806', '5807', '5808', '5809', '5810', '5811', '5812', '5813', '5814', '5815', '5816', '5817', '5818', '5819', '5820', '5821', '5822', '5823', '5824', '5825', '5826', '5827', '5832', '5833', '5834', '5835', '5836', '5837', '5838', '5839', '5840', '5841', '5842', '5843', '5844', '5845', '5846', '5847', '5848', '5849', '5850', '5851', '5852', '5853', '5854', '5855', '5856', '5857', '5858', '5859', '5860', '5861', '5862', '5863', '5864', '5865', '5866', '5867', '5868', '5869', '5870', '5871', '5872', '5873', '5874', '5875', '5876', '5877', '5878', '5879', '5880', '5881', '5882', '5883', '5884', '5885', '5886', '5887', '5888', '5889', '5890', '5891', '5892', '5893', '5894', '5895', '5896', '5897', '5898', '5899', '5900', '5901', '5902', '5903', '5904', '5905', '5906', '5907', '5908', '5909', '5910', '5911', '5912', '5913', '5914', '5915', '5916', '5917', '5918', '5919', '5920', '5921', '5922', '5923', '5924', '5925', '5926', '5927', '5928', '5929', '5930', '5931', '5932', '5933', '5934', '5935', '5936', '5937', '5938', '5939', '5940', '5941', '5942', '5943', '5944', '5945', '5946', '5947', '5948', '5949', '5950', '5951', '5956', '5957', '5958', '5959'] 24 False [('5802', '24'), ('5803', '24'), ('5804', '24'), ('5805', '24'), ('5806', '24'), ('5807', '24'), ('5808', '24'), ('5809', '24'), ('5810', '24'), ('5811', '24'), ('5812', '24'), ('5813', '24'), ('5814', '24'), ('5815', '24'), ('5816', '24'), ('5817', '24'), ('5818', '24'), ('5819', '24'), ('5820', '24'), ('5821', '24'), ('5822', '24'), ('5823', '24'), ('5824', '24'), ('5825', '24'), ('5826', '24'), ('5827', '24'), ('5832', '24'), ('5833', '24'), ('5834', '24'), ('5835', '24'), ('5836', '24'), ('5837', '24'), ('5838', '24'), ('5839', '24'), ('5840', '24'), ('5841', '24'), ('5842', '24'), ('5843', '24'), ('5844', '24'), ('5845', '24'), ('5846', '24'), ('5847', '24'), ('5848', '24'), ('5849', '24'), ('5850', '24'), ('5851', '24'), ('5852', '24'), ('5853', '24'), ('5854', '24'), 
('5855', '24'), ('5856', '24'), ('5857', '24'), ('5858', '24'), ('5859', '24'), ('5860', '24'), ('5861', '24'), ('5862', '24'), ('5863', '24'), ('5864', '24'), ('5865', '24'), ('5866', '24'), ('5867', '24'), ('5868', '24'), ('5869', '24'), ('5870', '24'), ('5871', '24'), ('5872', '24'), ('5873', '24'), ('5874', '24'), ('5875', '24'), ('5876', '24'), ('5877', '24'), ('5878', '24'), ('5879', '24'), ('5880', '24'), ('5881', '24'), ('5882', '24'), ('5883', '24'), ('5884', '24'), ('5885', '24'), ('5886', '24'), ('5887', '24'), ('5888', '24'), ('5889', '24'), ('5890', '24'), ('5891', '24'), ('5892', '24'), ('5893', '24'), ('5894', '24'), ('5895', '24'), ('5896', '24'), ('5897', '24'), ('5898', '24'), ('5899', '24'), ('5900', '24'), ('5901', '24'), ('5902', '24'), ('5903', '24'), ('5904', '24'), ('5905', '24'), ('5906', '24'), ('5907', '24'), ('5908', '24'), ('5909', '24'), ('5910', '24'), ('5911', '24'), ('5912', '24'), ('5913', '24'), ('5914', '24'), ('5915', '24'), ('5916', '24'), ('5917', '24'), ('5918', '24'), ('5919', '24'), ('5920', '24'), ('5921', '24'), ('5922', '24'), ('5923', '24'), ('5924', '24'), ('5925', '24'), ('5926', '24'), ('5927', '24'), ('5928', '24'), ('5929', '24'), ('5930', '24'), ('5931', '24'), ('5932', '24'), ('5933', '24'), ('5934', '24'), ('5935', '24'), ('5936', '24'), ('5937', '24'), ('5938', '24'), ('5939', '24'), ('5940', '24'), ('5941', '24'), ('5942', '24'), ('5943', '24'), ('5944', '24'), ('5945', '24'), ('5946', '24'), ('5947', '24'), ('5948', '24'), ('5949', '24'), ('5950', '24'), ('5951', '24'), ('5956', '24'), ('5957', '24'), ('5958', '24'), ('5959', '24')]
=======================================================
['5802', '5803', '5804', '5805', '5806', '5807', '5808', '5809', '5810', '5811', '5812', '5813', '5814', '5815', '5816', '5817', '5818', '5819', '5820', '5821', '5822', '5823', '5824', '5825', '5826', '5827', '5832', '5833', '5834', '5835', '5836', '5837', '5838', '5839', '5840', '5841', '5842', '5843', '5844', '5845', '5846', '5847', '5848', '5849', '5850', '5851', '5852', '5853', '5854', '5855', '5856', '5857', '5858', '5859', '5860', '5861', '5862', '5863', '5864', '5865', '5866', '5867', '5868', '5869', '5870', '5871', '5872', '5873', '5874', '5875', '5876', '5877', '5878', '5879', '5880', '5881', '5882', '5883', '5884', '5885', '5886', '5887', '5888', '5889', '5890', '5891', '5892', '5893', '5894', '5895', '5896', '5897', '5898', '5899', '5900', '5901', '5902', '5903', '5904', '5905', '5906', '5907', '5908', '5909', '5910', '5911', '5912', '5913', '5914', '5915', '5916', '5917', '5918', '5919', '5920', '5921', '5922', '5923', '5924', '5925', '5926', '5927', '5928', '5929', '5930', '5931', '5932', '5933', '5934', '5935', '5936', '5937', '5938', '5939', '5940', '5941', '5942', '5943', '5944', '5945', '5946', '5947', '5948', '5949', '5950', '5951', '5956', '5957', '5958', '5959'] 24 False [('5802', '24'), ('5803', '24'), ('5804', '24'), ('5805', '24'), ('5806', '24'), ('5807', '24'), ('5808', '24'), ('5809', '24'), ('5810', '24'), ('5811', '24'), ('5812', '24'), ('5813', '24'), ('5814', '24'), ('5815', '24'), ('5816', '24'), ('5817', '24'), ('5818', '24'), ('5819', '24'), ('5820', '24'), ('5821', '24'), ('5822', '24'), ('5823', '24'), ('5824', '24'), ('5825', '24'), ('5826', '24'), ('5827', '24'), ('5832', '24'), ('5833', '24'), ('5834', '24'), ('5835', '24'), ('5836', '24'), ('5837', '24'), ('5838', '24'), ('5839', '24'), ('5840', '24'), ('5841', '24'), ('5842', '24'), ('5843', '24'), ('5844', '24'), ('5845', '24'), ('5846', '24'), ('5847', '24'), ('5848', '24'), ('5849', '24'), ('5850', '24'), ('5851', '24'), ('5852', '24'), ('5853', '24'), ('5854', '24'), 
('5855', '24'), ('5856', '24'), ('5857', '24'), ('5858', '24'), ('5859', '24'), ('5860', '24'), ('5861', '24'), ('5862', '24'), ('5863', '24'), ('5864', '24'), ('5865', '24'), ('5866', '24'), ('5867', '24'), ('5868', '24'), ('5869', '24'), ('5870', '24'), ('5871', '24'), ('5872', '24'), ('5873', '24'), ('5874', '24'), ('5875', '24'), ('5876', '24'), ('5877', '24'), ('5878', '24'), ('5879', '24'), ('5880', '24'), ('5881', '24'), ('5882', '24'), ('5883', '24'), ('5884', '24'), ('5885', '24'), ('5886', '24'), ('5887', '24'), ('5888', '24'), ('5889', '24'), ('5890', '24'), ('5891', '24'), ('5892', '24'), ('5893', '24'), ('5894', '24'), ('5895', '24'), ('5896', '24'), ('5897', '24'), ('5898', '24'), ('5899', '24'), ('5900', '24'), ('5901', '24'), ('5902', '24'), ('5903', '24'), ('5904', '24'), ('5905', '24'), ('5906', '24'), ('5907', '24'), ('5908', '24'), ('5909', '24'), ('5910', '24'), ('5911', '24'), ('5912', '24'), ('5913', '24'), ('5914', '24'), ('5915', '24'), ('5916', '24'), ('5917', '24'), ('5918', '24'), ('5919', '24'), ('5920', '24'), ('5921', '24'), ('5922', '24'), ('5923', '24'), ('5924', '24'), ('5925', '24'), ('5926', '24'), ('5927', '24'), ('5928', '24'), ('5929', '24'), ('5930', '24'), ('5931', '24'), ('5932', '24'), ('5933', '24'), ('5934', '24'), ('5935', '24'), ('5936', '24'), ('5937', '24'), ('5938', '24'), ('5939', '24'), ('5940', '24'), ('5941', '24'), ('5942', '24'), ('5943', '24'), ('5944', '24'), ('5945', '24'), ('5946', '24'), ('5947', '24'), ('5948', '24'), ('5949', '24'), ('5950', '24'), ('5951', '24'), ('5956', '24'), ('5957', '24'), ('5958', '24'), ('5959', '24')]
[('5802', '24'), ('5803', '24'), ('5804', '24'), ('5805', '24'), ('5806', '24'), ('5807', '24'), ('5808', '24'), ('5809', '24'), ('5810', '24'), ('5811', '24'), ('5812', '24'), ('5813', '24'), ('5814', '24'), ('5815', '24'), ('5816', '24'), ('5817', '24'), ('5818', '24'), ('5819', '24'), ('5820', '24'), ('5821', '24'), ('5822', '24'), ('5823', '24'), ('5824', '24'), ('5825', '24'), ('5826', '24'), ('5827', '24'), ('5832', '24'), ('5833', '24'), ('5834', '24'), ('5835', '24'), ('5836', '24'), ('5837', '24'), ('5838', '24'), ('5839', '24'), ('5840', '24'), ('5841', '24'), ('5842', '24'), ('5843', '24'), ('5844', '24'), ('5845', '24'), ('5846', '24'), ('5847', '24'), ('5848', '24'), ('5849', '24'), ('5850', '24'), ('5851', '24'), ('5852', '24'), ('5853', '24'), ('5854', '24'), ('5855', '24'), ('5856', '24'), ('5857', '24'), ('5858', '24'), ('5859', '24'), ('5860', '24'), ('5861', '24'), ('5862', '24'), ('5863', '24'), ('5864', '24'), ('5865', '24'), ('5866', '24'), ('5867', '24'), ('5868', '24'), ('5869', '24'), ('5870', '24'), ('5871', '24'), ('5872', '24'), ('5873', '24'), ('5874', '24'), ('5875', '24'), ('5876', '24'), ('5877', '24'), ('5878', '24'), ('5879', '24'), ('5880', '24'), ('5881', '24'), ('5882', '24'), ('5883', '24'), ('5884', '24'), ('5885', '24'), ('5886', '24'), ('5887', '24'), ('5888', '24'), ('5889', '24'), ('5890', '24'), ('5891', '24'), ('5892', '24'), ('5893', '24'), ('5894', '24'), ('5895', '24'), ('5896', '24'), ('5897', '24'), ('5898', '24'), ('5899', '24'), ('5900', '24'), ('5901', '24'), ('5902', '24'), ('5903', '24'), ('5904', '24'), ('5905', '24'), ('5906', '24'), ('5907', '24'), ('5908', '24'), ('5909', '24'), ('5910', '24'), ('5911', '24'), ('5912', '24'), ('5913', '24'), ('5914', '24'), ('5915', '24'), ('5916', '24'), ('5917', '24'), ('5918', '24'), ('5919', '24'), ('5920', '24'), ('5921', '24'), ('5922', '24'), ('5923', '24'), ('5924', '24'), ('5925', '24'), ('5926', '24'), ('5927', '24'), ('5928', '24'), ('5929', '24'), ('5930', 
'24'), ('5931', '24'), ('5932', '24'), ('5933', '24'), ('5934', '24'), ('5935', '24'), ('5936', '24'), ('5937', '24'), ('5938', '24'), ('5939', '24'), ('5940', '24'), ('5941', '24'), ('5942', '24'), ('5943', '24'), ('5944', '24'), ('5945', '24'), ('5946', '24'), ('5947', '24'), ('5948', '24'), ('5949', '24'), ('5950', '24'), ('5951', '24'), ('5956', '24'), ('5957', '24'), ('5958', '24'), ('5959', '24')] 24 2 24 False
Checklist config file "/scratch1/scratchdirs/meneghin/OMFIT/runs/projectID__9844359dab__2015-09-14_13_20_34/EPED-sim1__IPScore-sim1/p0/run01/checklist.conf" could not be found, continuing without.
runspaceInitComponent.init() called
CREATE_RUNSPACE = DONE
RUN_SETUP = DONE
RUN = DONE
runspaceInitComponent.step() called
runspaceInitComponent.finalize() called
Sep 18 2015 07:40:33  About to Create Listener /scratch1/scratchdirs/meneghin/ips_dynamic_37319.tmp
Sep 18 2015 07:40:33  Created Listener /scratch1/scratchdirs/meneghin/ips_dynamic_37319.tmp <multiprocessing.connection.Listener object at 0x2b9edfbcf590>
Sep 18 2015 07:40:34  ips_dakota_dynamic received response from IPS  {'SIMSTATUS': 'ACK'}
Sep 18 2015 07:40:34  Launched DAKOTA
Created <class 'eped_init.eped_init'>
Sep 18 2015 07:40:39  0 ips_dakota_dynamic connecting to IPS dakota bridge <class 'socket.error'> [Errno 2] No such file or directory

 + --------------------------------------------------------------------------
 +        Job name: IPS
 +          Job Id: 3613464.edique02
 +          System: edison
 +     Queued Time: Fri Sep 18 07:39:51 2015
 +      Start Time: Fri Sep 18 07:40:24 2015
 + Completion Time: Fri Sep 18 07:40:47 2015
 +            User: meneghin
 +        MOM Host: nid02434
 +           Queue: debug
 +  Req. Resources: mppnodect=150,mppnppn=24,mppwidth=3600,walltime=00:01:00
 +  Used Resources: cput=00:00:00,energy_used=0,mem=314096kb,vmem=2741268kb,walltime=00:00:20
 +     Acct String: atom
 +   PBS_O_WORKDIR: /scratch1/scratchdirs/meneghin/OMFIT/runs/projectID__9844359dab__2015-09-14_13_20_34/EPED-sim1__IPScore-sim1/p0
 +     Submit Args: qsub_IPS
 + --------------------------------------------------------------------------

Fri Sep 18 07:40:54 PDT 2015
3613464.edique02 IPS meneghin 00:00:00 C debug
-== STD_ERR ==-
Traceback (most recent call last):
  File "/project/projectdirs/atom/atom-install-edison/ips-gnu-sf/bin/ips_dakota_dynamic.py", line 363, in <module>
    sys.exit(main())
  File "/project/projectdirs/atom/atom-install-edison/ips-gnu-sf/bin/ips_dakota_dynamic.py", line 354, in main
    sweep.run()
  File "/project/projectdirs/atom/atom-install-edison/ips-gnu-sf/bin/ips_dakota_dynamic.py", line 285, in run
    conn = Client(sock_address, 'AF_UNIX')
  File "/usr/common/usg/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/usr/common/usg/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
    s.connect(address)
  File "/usr/common/usg/python/2.7.9/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 2] No such file or directory

In the dakota_37319.log file I see:

Sep 18 2015 07:40:45: 4 Failed to connect to /scratch1/scratchdirs/meneghin/ips_dynamic_37319.tmp

which indeed is not there.

Any help from @elwasif and the @ORNL-Fusion/ips-wrapper-developers team would be greatly appreciated.
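The traceback above is what `multiprocessing.connection.Client` raises when the AF_UNIX socket file does not exist. Below is a minimal sketch of a bounded wait around that call. It is written for Python 3, where `socket.error` is an alias of `OSError`, while the thread used Python 2.7. The helper `connect_when_ready` and its `timeout` and `poll` parameters are assumptions for illustration, not how `ips_dakota_dynamic.py` is implemented.

```python
import errno
import time
from multiprocessing.connection import Client


def connect_when_ready(sock_address, timeout=5.0, poll=0.1):
    """Connect to an AF_UNIX listener, retrying until `timeout` seconds pass.

    Hypothetical helper: returns a Connection, or re-raises the last
    OSError once the deadline is reached.
    """
    deadline = time.time() + timeout
    while True:
        try:
            return Client(sock_address, 'AF_UNIX')
        except OSError as exc:
            # ENOENT: the socket file is missing (the error in this thread).
            # ECONNREFUSED: the file exists but nothing is listening on it.
            if exc.errno not in (errno.ENOENT, errno.ECONNREFUSED):
                raise
            if time.time() >= deadline:
                raise
            time.sleep(poll)
```

A wait like this only helps when the listener is slow to start. If the process that owns the socket has already exited, the file never appears and the same error surfaces at the deadline, so that process's own log is the place to look next.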

The error that actually causes the framework to exit and the DAKOTA connection to fail is in IPS.log:
2015-09-18 07:40:39,652 FRAMEWORK ERROR Error in configuration file : NAME = eped_driver SCRIPT = /global/project/projectdirs/atom/users/meneghin/ips-atom/ips-eped//src/eped_driver.py
2015-09-18 07:40:39,652 FRAMEWORK ERROR Error instantiating IPS component eped_driver From eped_driver
Traceback (most recent call last):
  File "/global/project/projectdirs/atom/atom-install-edison/ips-gnu-sf/bin/configurationManager.py", line 695, in _create_component
    module = imp.load_module(script, modFile, pathname, description)
  File "/global/project/projectdirs/atom/users/meneghin/ips-atom/ips-eped//src/eped_driver.py", line 18, in
    from harvest_client.harvest_lib import *
ImportError: No module named harvest_client.harvest_lib
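A quick way to catch this kind of failure before submitting a job is to check that the modules a component imports can be located. The sketch below uses Python 3's `importlib.util.find_spec` (the thread itself ran Python 2.7). The helper name `missing_modules` is hypothetical and not part of the IPS framework.

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names from `names` that cannot be located.

    Hypothetical pre-flight helper; it does not import the target module
    itself, although find_spec does import parent packages.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package (such as harvest_client) is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Running `missing_modules(["harvest_client.harvest_lib"])` with the same `PYTHONPATH` as the batch job would return that name if the package is not on the path, and an empty list otherwise.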

Thank you for pointing me there. It's unclear to me why the IPS-EPED works and the DAKOTA scan doesn't. I'll look into it.

Found the error. I was running from an ips-eped-dev directory in which I had not run:

git submodule init
git submodule update

Works perfectly now. Also, I appreciate that the debug prints got removed. Thank you!!
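For future checkouts, uninitialized submodules can be spotted before a run: `git submodule status` prefixes such entries with `-`. The sketch below parses that output. The function names and the `harvest_client` path in the usage note are assumptions for illustration; paths containing spaces are not handled.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return submodule paths that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format: "-<sha1> <path>"
            parts = line[1:].split()
            if len(parts) >= 2:
                paths.append(parts[1])
    return paths


def check_repo(repo_dir):
    """Run `git submodule status` in repo_dir and list uninitialized paths."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return uninitialized_submodules(result.stdout)
```

If `check_repo` on the working copy returns a non-empty list (for example `['harvest_client']`, a hypothetical path), running `git submodule init` and `git submodule update` as above should clear it.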