open-switch/opx-pas

Running opx-show-env command fails

Closed this issue · 19 comments

I observed this in two different switches

root@OPX:~# opx-show-env
Traceback (most recent call last):
  File "/usr/bin/opx-show-env", line 180, in <module>
    chassis_print()
  File "/usr/bin/opx-show-env", line 59, in chassis_print
    d = cps_chassis_data[0]['data']
IndexError: list index out of range
root@OPX:~# systemctl status opx-pas.service
● opx-pas.service - This PAS service is to initialize platform.
   Loaded: loaded (/lib/systemd/system/opx-pas.service; enabled)
   Active: active (running) since Mon 2017-09-18 20:43:05 UTC; 2 days ago
 Main PID: 30044 (opx_pas_service)
   CGroup: /system.slice/opx-pas.service
           └─30044 /usr/bin/opx_pas_service

Sep 20 22:20:34 OPX opx_pas_service[30044]: [PAS:dn_remote_temp_sensor_poll], CPS API get failed
Sep 20 22:20:34 OPX opx_pas_service[30044]: [PAS:dn_pas_remote_poller_thread], Poll cycle failed
Sep 20 22:20:39 OPX opx_pas_service[30044]: [DSAPI:NS], No connection to NS for 1.36.2359341.2359299.2359302.
Sep 20 22:20:39 OPX opx_pas_service[30044]: [DSAPI:NS], Failed to find owner for 1.36.2359341.2359299.2359302.
Sep 20 22:20:39 OPX opx_pas_service[30044]: [PAS:dn_remote_temp_sensor_poll], CPS API get failed
Sep 20 22:20:39 OPX opx_pas_service[30044]: [PAS:dn_pas_remote_poller_thread], Poll cycle failed
Sep 20 22:20:44 OPX opx_pas_service[30044]: [DSAPI:NS], No connection to NS for 1.36.2359341.2359299.2359302.
Sep 20 22:20:44 OPX opx_pas_service[30044]: [DSAPI:NS], Failed to find owner for 1.36.2359341.2359299.2359302.
Sep 20 22:20:44 OPX opx_pas_service[30044]: [PAS:dn_remote_temp_sensor_poll], CPS API get failed
Sep 20 22:20:44 OPX opx_pas_service[30044]: [PAS:dn_pas_remote_poller_thread], Poll cycle failed
root@OPX:~# opx-show-version 
OS_NAME="OPX"
OS_VERSION="2.1.0"
PLATFORM="S6000-ON"
ARCHITECTURE="x86_64"
INTERNAL_BUILD_ID="OpenSwitch blueprint for Dell 1.0.0"
BUILD_VERSION="2.1.0(0)"
BUILD_DATE="2017-08-04T12:01:19-0700"
INSTALL_DATE="2017-09-17T21:47:42+0000"
SYSTEM_UPTIME= 3 days, 31 minutes
SYSTEM_STATE= degraded
UPGRADED_PACKAGES=no
ALTERED_PACKAGES=no
root@OPX:~# cat /etc/OPX-release-version 
OS_NAME="OPX"
OS_VERSION="2.1.0"
PLATFORM="S6000-ON"
ARCHITECTURE="x86_64"
INTERNAL_BUILD_ID="OpenSwitch blueprint for Dell 1.0.0"
BUILD_VERSION="2.1.0(0)"
BUILD_DATE="2017-08-04T12:01:19-0700"
INSTALL_DATE="2017-09-17T21:47:42+0000"
root@OPX:~# 

On a different switch

root@l1:~# opx-show-env 
Chassis
	Operating status:		Fail
	Fault type:			Configuration error
	Vendor name:		
	Service tag:		5ZMNX42
	PPID:				CN099TJK2829855L0003
	Platform name:			x86_64-dell_s4000_c2338-r0
	Product name:			S4048ON
	Hardware version:		A00
	Number of MAC addresses:	256
	Base MAC address:		34:17:eb:f6:62:c4
Power supplies
	Slot 1
		Present:		Yes
		Operating status:	Up
		Fault type:		OK
		Vendor name:		Traceback (most recent call last):
  File "/usr/bin/opx-show-env", line 181, in <module>
    all_psus_print()
  File "/usr/bin/opx-show-env", line 120, in all_psus_print
    psu_print(entity_data, psu_data)
  File "/usr/bin/opx-show-env", line 99, in psu_print
    if not entity_print(entity_data):
  File "/usr/bin/opx-show-env", line 90, in entity_print
    print '\t\tVendor name:\t\t', cps_attr_get(entity_data, 'base-pas/entity/vendor-name')
  File "/usr/bin/opx-show-env", line 22, in cps_attr_get
    return cps_utils.cps_attr_types_map.from_data(attr, cps_data[attr])
KeyError: 'base-pas/entity/vendor-name'
root@l1:~# 
root@l1:~# 
root@l1:~# 
root@l1:~# opx-show-version 
OS_NAME="OPX"
OS_VERSION="2.1.0"
PLATFORM=""
ARCHITECTURE="x86_64"
INTERNAL_BUILD_ID="OpenSwitch blueprint for Dell 1.0.0"
BUILD_VERSION="2.1.0(0)"
BUILD_DATE="2017-08-04T12:01:19-0700"
INSTALL_DATE="2017-08-25T20:28:04+0000"
SYSTEM_UPTIME= 1 week, 1 day, 17 hours, 33 minutes
SYSTEM_STATE= running
UPGRADED_PACKAGES=no
ALTERED_PACKAGES=no
root@l1:~# systemctl status opx-pas.service
● opx-pas.service - This PAS service is to initialize platform.
   Loaded: loaded (/lib/systemd/system/opx-pas.service; enabled)
   Active: active (running) since Tue 2017-09-12 05:24:06 UTC; 1 weeks 1 days ago
 Main PID: 1345 (opx_pas_service)
   CGroup: /system.slice/opx-pas.service

Sep 20 22:39:05 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 17 size 2 data 0x7fbeeb409b20 is succeeded after 1 retries
Sep 20 22:41:09 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 17 size 2 data 0x7fbeeb409b20 is succeeded after 1 retries
Sep 20 22:43:56 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 10 size 2 data 0x7fbeeb409b50 is succeeded after 1 retries
Sep 20 22:46:22 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 17 size 2 data 0x7fbeeb409b20 is succeeded after 1 retries
Sep 20 22:46:32 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 17 size 2 data 0x7fbeeb409b20 is succeeded after 1 retries
Sep 20 22:46:46 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 17 size 2 data 0x7fbeeb409b20 is succeeded after 1 retries
Sep 20 22:52:18 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 17 size 2 data 0x7fbeeb409b20 is succeeded after 1 retries
Sep 20 22:54:04 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 10 size 2 data 0x7fbeeb409b10 is succeeded after 1 retries
Sep 20 22:54:12 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 10 size 2 data 0x7fbeeb409a30 is succeeded after 1 retries
Sep 20 22:56:06 l1 opx_pas_service[1345]: [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 smbus transaction on i2cdev_fd 6, operation 0 command 10 size 2 data 0x7fbeeb409a30 is succeeded after 1 retries
root@l1:~# cat /etc/OPX-release-version 
OS_NAME="OPX"
OS_VERSION="2.1.0"
PLATFORM=""
ARCHITECTURE="x86_64"
INTERNAL_BUILD_ID="OpenSwitch blueprint for Dell 1.0.0"
BUILD_VERSION="2.1.0(0)"
BUILD_DATE="2017-08-04T12:01:19-0700"
INSTALL_DATE="2017-08-25T20:28:04+0000"

Found this on a new OPX VM today. Giving details here in the hope that they provide additional info. What I noticed is that cps.service was in a failed state, and after manually starting it opx-show-env started working.

root@opx_vm:/usr/sbin# opx-show-env
Traceback (most recent call last):
  File "/usr/bin/opx-show-env", line 180, in <module>
    chassis_print()
  File "/usr/bin/opx-show-env", line 59, in chassis_print
    d = cps_chassis_data[0]['data']
IndexError: list index out of range
root@opx_vm:/usr/sbin# systemctl status opx-pas.service
● opx-pas.service - This PAS service is to initialize platform.
   Loaded: loaded (/lib/systemd/system/opx-pas.service; enabled)
   Active: failed (Result: signal) since Thu 2017-09-28 18:36:48 UTC; 36min ago
 Main PID: 339 (code=killed, signal=SEGV)

Sep 19 05:51:28 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM service tag not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM vendor name not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM product name not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM hardware revision not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM platform name not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM PPID not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM part number not programmed
Sep 19 05:51:30 opx_vm opx_pas_service[339]: [PAS:chassis_resp_set], Chassis EEPROM service tag not programmed
Sep 28 18:36:48 opx_vm systemd[1]: opx-pas.service: main process exited, code=killed, status=11/SEGV
Sep 28 18:36:48 opx_vm systemd[1]: Unit opx-pas.service entered failed state.
root@opx_vm:/usr/sbin# systemctl start opx-pas.service
root@opx_vm:/usr/sbin# systemctl status opx-pas.service
● opx-pas.service - This PAS service is to initialize platform.
   Loaded: loaded (/lib/systemd/system/opx-pas.service; enabled)
   Active: active (running) since Thu 2017-09-28 19:14:26 UTC; 2s ago
 Main PID: 5632 (opx_pas_service)
   CGroup: /system.slice/opx-pas.service
           └─5632 /usr/bin/opx_pas_service

Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_entity_poll], PSU 2 is present
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_type_get], Non-qualified media adapater
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_oir_poll], Optic inserted in front panel port (18), qualified: Yes.
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_type_get], Non-qualified media adapater
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_oir_poll], Optic inserted in front panel port (19), qualified: Yes.
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_type_get], Non-qualified media adapater
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_oir_poll], Optic inserted in front panel port (20), qualified: Yes.
Sep 28 19:14:28 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_type_get], Non-qualified media adapater
Sep 28 19:14:29 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_oir_poll], Optic inserted in front panel port (21), qualified: Yes.
Sep 28 19:14:29 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_type_get], Non-qualified media adapater
root@opx_vm:/usr/sbin# opx-show-env
Chassis
	Operating status:		Fail
	Fault type:			Configuration error
	Vendor name:		
	Service tag:		
	PPID:				
	Platform name:			
	Product name:			
	Hardware version:		
	Number of MAC addresses:	256
	Base MAC address:		52:54:00:23:1e:65
Power supplies
	Slot 1
...

When you have this problem with opx-show-env, do you see the following message repeated in dmesg output?

ismt_smbus 0000:00:13.1: completion wait timed out

Thanks.

Thanks @Ragsboss, @hpersh is looking into it.

Seems like there are two failures:
(1) Immediate failure in opx-show-env when getting Chassis info (S6000, VM). Resolved after bringing CPS back up. (Why CPS went down is unknown though)
(2) Failure in the middle of opx-show-env when getting base-pas/entity/vendor-name (S4048).
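For failure (1), the traceback shows `cps_chassis_data[0]['data']` being indexed when the CPS get returned an empty list. Below is a minimal sketch of a guard for that case. The helper names `first_object_data` and `chassis_summary` are hypothetical and are not taken from opx-show-env; the only assumption about the CPS result is the list-of-dicts shape visible in the traceback.

```python
def first_object_data(cps_result):
    """Return the 'data' dict of the first CPS object, or None if nothing came back."""
    # An empty list is what the IndexError in the traceback implies:
    # the get produced no objects at all.
    if not cps_result:
        return None
    return cps_result[0].get('data')


def chassis_summary(cps_chassis_data):
    """Build a one-line summary instead of raising on an empty result."""
    d = first_object_data(cps_chassis_data)
    if d is None:
        return 'Chassis: no data returned (is opx-pas running?)'
    return 'Chassis: %d attribute(s) returned' % len(d)


print(chassis_summary([]))
print(chassis_summary([{'data': {'base-pas/chassis/num_mac_addresses': 256}}]))
```

The attribute name in the last line is only a placeholder key for the demonstration.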

@hpersh I don't see it.

root@opx_vm:/tmp# opx-show-env
Traceback (most recent call last):
  File "/usr/bin/opx-show-env", line 180, in <module>
    chassis_print()
  File "/usr/bin/opx-show-env", line 59, in chassis_print
    d = cps_chassis_data[0]['data']
IndexError: list index out of range
root@opx_vm:/tmp# dmesg | grep -i wait
root@opx_vm:/tmp# 

@jeff-yin I observe cps keeps getting into failed state - this certainly is a blocking issue for us.

Adding some diagnostic info for reference

syslog contents when pas service crashes

Sep 29 16:32:48 opx_vm systemd[1]: Reloading.
Sep 29 16:32:48 opx_vm systemd[1]: Reloading.
Sep 29 16:32:57 opx_vm kernel: [902554.110739] pas_fuse_handle[5637]: segfault at 0 ip 00007f8ee2a8921a sp 00007f8edea3c058 error 4 in libc-2.19.so[7f8ee2a09000+1a1000]
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/ready 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/pld 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/media-config 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/media-config 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/media-channel 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/media 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/media 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/media 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/temp_threshold 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/temp_threshold 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/temperature 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/temperature 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/temperature 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/display 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/display 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/led 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/fan 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/fan 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/fan 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/card 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/card 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/card 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/fan-tray 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/fan-tray 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/psu 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/psu 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/psu 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/entity 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/entity 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/temp_threshold 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/entity 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/chassis 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/chassis 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for realtime base-pas/media-channel 
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for observed base-pas/chassis 
Sep 29 16:32:57 opx_vm systemd[1]: opx-pas.service: main process exited, code=killed, status=11/SEGV
Sep 29 16:32:57 opx_vm opx_cps_service[336]: [DSAPI:NS], Added registration removed for target base-pas/media-channel 
Sep 29 16:32:57 opx_vm systemd[1]: Unit opx-pas.service entered failed state.
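The kernel segfault line in the syslog above can be decoded a little further. This is a rough sketch, not an official tool: it subtracts the mapping base printed in brackets from the instruction pointer to get the offset inside libc-2.19.so, which is the value one would look up in that library's symbols. The regular expression and the function name `segfault_offset` are my own.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error N in OBJ[BASE+SIZE]" message.
SEGFAULT_RE = re.compile(
    r'segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) '
    r'error (?P<err>\d+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]'
)


def segfault_offset(line):
    """Return (object name, faulting address, ip offset within the object), or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group('ip'), 16)
    base = int(m.group('base'), 16)
    return m.group('obj'), int(m.group('addr'), 16), ip - base


line = ('pas_fuse_handle[5637]: segfault at 0 ip 00007f8ee2a8921a sp 00007f8edea3c058 '
        'error 4 in libc-2.19.so[7f8ee2a09000+1a1000]')
obj, fault_addr, offset = segfault_offset(line)
print(obj, hex(fault_addr), hex(offset))  # libc-2.19.so 0x0 0x8021a
```

A faulting address of 0 with error code 4 (a user-mode read of an unmapped page) suggests a NULL pointer being read inside a libc routine, in the thread named pas_fuse_handle.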

Status

root@opx_vm:/tmp# systemctl status opx-pas.service
● opx-pas.service - This PAS service is to initialize platform.
   Loaded: loaded (/lib/systemd/system/opx-pas.service; enabled)
   Active: failed (Result: signal) since Fri 2017-09-29 16:32:57 UTC; 2min 35s ago
 Main PID: 5632 (code=killed, signal=SEGV)

Sep 28 19:14:29 opx_vm opx_pas_service[5632]: [PAS:dn_pas_media_oir_poll], Optic inserted in front panel port (32), qualified: Yes.
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM vendor name not programmed
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM product name not programmed
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM hardware revision not programmed
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM platform name not programmed
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM PPID not programmed
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM part number not programmed
Sep 28 19:14:32 opx_vm opx_pas_service[5632]: [PAS:chassis_resp_set], Chassis EEPROM service tag not programmed
Sep 29 16:32:57 opx_vm systemd[1]: opx-pas.service: main process exited, code=killed, status=11/SEGV
Sep 29 16:32:57 opx_vm systemd[1]: Unit opx-pas.service entered failed state.

There appear to be a few distinct things happening here:

(1) The "opx-show-env" command for the S4048 above failed because PAS failed to read the EEPROM for PSU 1. PAS is experiencing persistent I2C failures, as shown in the log with lines saying "... [BOARD:sdi_sys_smbus_execute], sdi_sys_smbus_execute:270 ...", which could cause this failure to read the EEPROM.
"opx-show-env" has been fixed to be insensitive to hardware failures: it now skips them instead of crashing with a Python backtrace.
However, the real issue is the I2C failure. I would suggest power-cycling the switch, to clear any hardware failure.
I will see if I can reproduce this on an S4048.
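As an illustration of the "skip instead of crash" behaviour described in (1), here is a minimal sketch of a tolerant attribute getter. It is not the actual change made to opx-show-env. The name `attr_get_or_default` is hypothetical, and the `convert` argument stands in for `cps_utils.cps_attr_types_map.from_data`, which the traceback shows the script calling.

```python
def attr_get_or_default(cps_data, attr, convert, default=''):
    """Return the converted attribute, or `default` if PAS did not supply it."""
    try:
        raw = cps_data[attr]
    except KeyError:
        # e.g. the PSU EEPROM could not be read, so the attribute is absent
        return default
    return convert(attr, raw)


def identity(attr, raw):
    """Stand-in converter for this demonstration only."""
    return raw


entity = {'base-pas/entity/present': 'Yes'}
print('Present:', attr_get_or_default(entity, 'base-pas/entity/present', identity))
print('Vendor name:', attr_get_or_default(entity, 'base-pas/entity/vendor-name', identity))
```

With a getter like this, a missing `base-pas/entity/vendor-name` prints an empty field and the remaining power supply and fan sections are still shown.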

(2) I do not see why the "opx-show-env" command is failing on the S6000. It is a known issue in 2.1.0 that PAS takes about 90 seconds to start up after boot. If you type the "opx-show-env" command too quickly after booting, PAS will not be running, and the command will fail as shown. According to the log, PAS seems to be running fine.
This 90-second issue has been fixed in release 2.1.1.
Also, the same fix for "opx-show-env" being insensitive to hardware failures would prevent the ugly Python backtrace here, as well.
I would suggest trying this again, but wait until PAS is running before typing the "opx-show-env" command.
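One way to script the "wait until PAS is running" advice is to poll systemd before invoking the command. This is a sketch under assumptions: the unit name opx-pas.service comes from the status output in this thread, while the function names and the timeout and interval values are arbitrary choices of mine.

```python
import subprocess
import time


def pas_is_active(unit='opx-pas.service'):
    """True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits with status 0 only when the unit is active.
    return subprocess.call(['systemctl', 'is-active', '--quiet', unit]) == 0


def wait_until(check, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage on a switch (not executed here):
#     if wait_until(pas_is_active):
#         subprocess.call(['opx-show-env'])
```

Note that an "active" unit is a necessary check rather than a sufficient one: in the S6000 output above the service is active while its CPS gets are still failing.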

(3) On the VM, PAS is aborting with a segmentation violation. After PAS is manually restarted, it looks fine. I do not see why this should happen; I will attempt to reproduce it.

(3) above, cont'd
I have done a fresh install on a VM, and cannot reproduce this issue.

Would it be possible to access your VM?

@hpersh unfortunately the VM is on our local network and I have no way to share it. This is currently happening on one VM, and I can share any diagnostic info you may need. Let me know.

BTW when is OPX 2.1.1 going to be available?

Can you send me the resulting corefile? It should be located here: /core.

Sorry for the delay. The previous VM was gone, so I waited for the issue to reproduce on a new one. See attached.
opx-pas-18-core.zip.txt

@Ragsboss Are you doing operations on the PAS FUSE fs? The backtrace from the corefile you provided is showing a crash in the FUSE thread...

Not to my knowledge

I have now moved on to OPX 2.2, and on an OPX VM I'm seeing a similar issue. The only difference is that opx-show-env returns nothing in OPX 2.2, whereas in OPX 2.1 it printed a stack trace.

root@leaf-1-1:/var/log/aos# opx-show-env
root@leaf-1-1:/var/log/aos# systemctl status opx-cps.service
● opx-cps.service - The CPS service is responsible for supporting communcation between applications
   Loaded: loaded (/lib/systemd/system/opx-cps.service; static)
   Active: active (running) since Wed 2018-02-14 20:49:28 UTC; 24h ago
 Main PID: 2176 (opx_cps_service)
   CGroup: /system.slice/opx-cps.service

Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for target base-pas/entity 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for target base-pas/psu 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for observed base-pas/fan-tray 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for observed base-pas/pld 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for realtime base-pas/ready 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for target base-pas/media-config 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for target base-pas/pld 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for realtime base-pas/pld 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for realtime base-pas/display 
Feb 15 21:31:08 leaf-1-1 opx_cps_service[2176]: [DSAPI:NS], Added registration removed for target base-pas/media 
root@leaf-1-1:/var/log/aos# systemctl status opx-pas.service
● opx-pas.service - This PAS service is to initialize platform.
   Loaded: loaded (/lib/systemd/system/opx-pas.service; enabled)
   Active: failed (Result: signal) since Thu 2018-02-15 21:31:08 UTC; 10min ago
 Main PID: 2183 (code=killed, signal=SEGV)

Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM vendor name not programmed
Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM product name not programmed
Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM hardware revision not programmed
Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM platform name not programmed
Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM PPID not programmed
Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM part number not programmed
Feb 15 21:06:59 leaf-1-1 opx_pas_service[2183]: [PAS:chassis_resp_set], Chassis EEPROM service tag not programmed
Feb 15 21:31:08 leaf-1-1 systemd[1]: opx-pas.service: main process exited, code=killed, status=11/SEGV
Feb 15 21:31:08 leaf-1-1 systemd[1]: Unit opx-pas.service entered failed state.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@leaf-1-1:/var/log/aos# cat /etc/OPX-release-version 
OS_NAME="OPX"
OS_VERSION="2.2"
PLATFORM="S6000-VM"
ARCHITECTURE="x86_64"
INTERNAL_BUILD_ID="OpenSwitch blueprint for Dell 1.0.0"
BUILD_VERSION="2.2.0"
BUILD_DATE="2018-01-26T10:03:01-0800"
INSTALL_DATE="2018-02-02T22:50:34+0000"
root@leaf-1-1:/var/log/aos# 

@Ragsboss
I tried this command on two different VMs and couldn't reproduce the issue. Did you install the new version from an installer from bintray? Also, can you please run 'apt list --installed | grep opx' so I can see which packages and versions you have installed?

Thanks.

vnam1 commented

@GarrickHe,

I am seeing the opx-show-env issue on a 2.2.1 VM when the command is invoked early in the boot process.
There was a mention of a problem if it is invoked within the first 90 seconds after boot. Is that solved in 2.2.1?

Thanks

@vnam1 ,

No, I haven't heard of this issue reappearing until now. I'll take a look and get back to you. Thanks.

-Garrick

@vnam1 @Ragsboss
I tried multiple times on VM and still can't reproduce the issue. I am testing it on 2.3.0. Have either one of you seen this issue lately?

Thanks,
Garrick

vnam1 commented

I am not seeing this issue anymore.

Closing the issue for now.