CiscoSecurity/fp-05-microsoft-sentinel-connector

Process subscriberParser is dead

Closed this issue · 26 comments

Hello and good day to all of you,

We have been trying to keep our eStreamer client up and running to ship logs to MS Sentinel, but no matter which version we use (we are now on 5.0.0), we keep receiving the controller message "Process subscriberParser is dead". This error stops the log collector from receiving logs from Cisco Firepower v6.7.0.3. I have searched Cisco's bug reports and done plenty of Google-fu, but I can't find a fix. We are on Ubuntu 21.04 now and have tried CentOS 8 as well. Any help is greatly appreciated. Thank you for your time and support - Joe Stauffer.

[screenshot of the "Process subscriberParser is dead" controller message]

Watching, as I have the exact same issue on RHEL 8.5.

@gmccauley I opened a ticket with Cisco TAC. I will make sure I put the fix in here as soon as I know. This one has been driving me crazy over the last month. Thank you for posting on here too.

FYI - I ran encore.sh in foreground mode for most of the day, hoping to catch an error that never makes it into the logs, and lo and behold...

2022-03-21 14:14:36,907 Monitor      INFO     Running. 1192300 handled; average rate 764.2 ev/sec;
Process Process-1:
Traceback (most recent call last):
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/pipeline.py", line 139, in parse
    parser.parse()
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/adapters/binary.py", line 698, in parse
    self._parse( self.data, self.offset, self.record )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/adapters/binary.py", line 564, in _parse
    offset = self._parseAttributes( data, offset, attributes, record )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/adapters/binary.py", line 479, in _parseAttributes
    offset = self._parseBlock( data, offset, attribute, block )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/adapters/binary.py", line 225, in _parseBlock
    offset = self._parseAttributes( data, offset, blockDefinition, context )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/adapters/binary.py", line 320, in _parseAttributes
    recordType ))
estreamer.exception.ParsingException: _attributes() | offset (1819241145) > length (172) | blockType=167 recordType=94

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/baseproc.py", line 111, in _start
    callback()
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/receiver.py", line 159, in next
    self._parseMessageBundle( message )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/receiver.py", line 111, in _parseMessageBundle
    self._send( message )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/receiver.py", line 143, in _send
    self.callback( message )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/pipeline.py", line 431, in onEvent
    event = parse( message, self.settings )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/pipeline.py", line 155, in parse
    logger.warning( ex )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/crossprocesslogging/baseClient.py", line 94, in warning
    self.log(logging.WARNING, data)
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/crossprocesslogging/baseClient.py", line 69, in log
    data = self.__serialise( data )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/crossprocesslogging/baseClient.py", line 35, in __serialise
    message = data.__class__.__name__ + ': ' + data.message
AttributeError: 'ParsingException' object has no attribute 'message'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/pipeline.py", line 285, in __init__
    outputQueue )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/baseproc.py", line 298, in __init__
    outputQueue )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/baseproc.py", line 136, in __init__
    self.start()
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/pipeline.py", line 302, in start
    self._start( self.receiver.next )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/baseproc.py", line 118, in _start
    self.logger.exception(ex)
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/crossprocesslogging/baseClient.py", line 106, in exception
    data = self.__serialise( data, True )
  File "/sentinel/fp-05-microsoft-sentinel-connector-5.0.0-pvcldsfire01/estreamer/crossprocesslogging/baseClient.py", line 35, in __serialise
    message = data.__class__.__name__ + ': ' + data.message
AttributeError: 'AttributeError' object has no attribute 'message'
2022-03-21 14:16:36,826 Controller   INFO     Process subscriberParser is dead.
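
Side note for anyone reading the stack: the second and third tracebacks look like a logging bug rather than the root cause. Python 3 exceptions have no .message attribute, so the line "message = data.__class__.__name__ + ': ' + data.message" in crossprocesslogging/baseClient.py blows up while trying to report the original ParsingException, and the worker dies without ever logging it. A minimal standalone sketch of the idea (my own patch, not an official fix): build the string with str(exc) instead.

    def serialise(data):
        """Render any object -- including exceptions -- as a log string."""
        if isinstance(data, Exception):
            # str(exc) is safe on Python 3; exc.message does not exist there,
            # which is exactly what raises the secondary AttributeError above.
            return data.__class__.__name__ + ': ' + str(data)
        return str(data)

    try:
        raise ValueError('offset (1819241145) > length (172)')
    except ValueError as ex:
        print(serialise(ex))   # -> ValueError: offset (1819241145) > length (172)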

Thank you, @gmccauley, for posting this information for us. We now have a Microsoft resource working on it for us too. It's now a race to see who blames whom first, Cisco or MS. :-) Fingers crossed for a resolution.

@Joestauf - My MS guy got back to me today too. He said one of his other customers resolved the issue as follows: the client installed Python 2, made it the default python, and then pointed the FMC connector code at the Python 2 version instead of Python 3.

I'm not a fan of that approach.

I was afraid that might be what they would suggest as a "fix". Python 2 went EOL in January 2020. I am with you on this one; they need to figure out getting it to work with 3.6/3.8. Thank you for letting me know.

@Joestauf - Any luck with TAC?

Please excuse the delay; I have been meaning to jump on here and give a quick update. Cisco TAC finally got back to me yesterday after I requested updates several times over the last 15 days. The tech built a CentOS box and can't seem to recreate the problem. He has asked me to change the log level to TRACE in the estreamer.conf file and send him the results when it crashes again, then change the log level to DEBUG and repeat the process, sending in both logs (TRACE and DEBUG). I will work on this today.
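
For anyone else who needs to flip the log level, here is the quick way I'm doing it (a sketch: it assumes estreamer.conf sits in the working directory and that the level lives under a top-level "logging" key, so verify against your own file and back it up first):

    import json

    # Flip the eNcore log level for a TRACE (then DEBUG) capture.
    # Assumptions: estreamer.conf is in the current directory and the
    # level lives under a top-level "logging" key -- check your own file.
    with open('estreamer.conf') as fh:
        conf = json.load(fh)

    conf.setdefault('logging', {})['level'] = 'TRACE'   # then 'DEBUG'

    with open('estreamer.conf', 'w') as fh:
        json.dump(conf, fh, indent=4, sort_keys=True)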

On a side note:

The only way I have found to keep it up and running was to revert to version 3.7.1, which does indeed use Python 2 instead of Python 3. I even had one of my devs run through baseline.py and core.py, where the events are defined. Updating these files did nothing positive.

Today I am going to fire up version 5.0 again, update the logging level as requested, and then upload the logs for his review. More details to follow. Cheers!

Quick update: the TRACE and DEBUG logs have been uploaded to our TAC case. We also uploaded all of our eStreamer files to allow a review of our entire configuration. We are waiting for the findings of their analysis.

Still no word from Cisco yet...

We have asked Cisco for updates several times with zero response. I have also reached out to both team leads and the supervisor on my ticket; not one single person has gotten back to me. Quality service. We also had a scenario where the latest Ubuntu updates broke the connection up to MS Sentinel; we had to remove the MS OMS piece and reinstall it, restoring connectivity to Sentinel. Version 3.7.1 of the eStreamer connector continues to work.

This is what I got back from Cisco TAC. It's been running fine with no errors after excluding event type 94.

Seems like encore is unable to parse event type 94. Please add 94 to the exclude list in estreamer.conf and restart encore client.
        "records": {
            "connections": true,
            "core": true,
            "excl@comment": [
                "These records will be excluded regardless of above (overrides 'include')",
                "e.g. to exclude flow and IPS events use [ 71, 400 ]"
            ],
            "exclude": [94],
            "inc@comment": "These records will be included regardless of above",
            "include": [],
            "intrusion": true,
            "metadata": true,
            "packets": true,
            "rna": true,
            "rua": true
        }

@gmccauley You are my hero! Thank you for taking time out of your busy day to follow up and share. I appreciate you very much! Thank you!

Thanks for sharing this @gmccauley, we'll give it a go and see if it resolves it for us as well.

Do you happen to know what event type 94 actually is? I had a quick look in the documentation and can't find it referenced. It would just be good to know what we are excluding.

Thanks 👍

I've been troubleshooting this issue for a couple of weeks now. After running encore.sh in the foreground with TRACE logging, I was able to determine that the issue was occurring with record types 92, 93, and 94. I added all three to the exclude list in estreamer.conf, but I didn't want to lose those transactions, so I removed them from exclude and instead set "rua": false in the records configuration. This stops eStreamer from trying to process Realtime User Agent information. I have a feeling something is misconfigured in the Realtime User Agent settings of our FMC that is causing this. Since setting "rua": false, the process has been running without fail and we are still able to process record types 92, 93, and 94.

working configuration:

        "records": {
            "connections": true,
            "core": true,
            "excl@comment": [
                "These records will be excluded regardless of above (overrides 'include')",
                "e.g. to exclude flow and IPS events use [ 71, 400 ]"
            ],
            "exclude": [],
            "inc@comment": "These records will be included regardless of above",
            "include": [],
            "intrusion": true,
            "metadata": true,
            "packets": true,
            "rna": true,
            "rua": false
        }`
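
If you want to repeat this triage on your own TRACE/DEBUG capture, a quick tally of which record types are throwing ParsingExceptions does the trick; the recordType=NN token comes straight from the exception text earlier in this thread. A sketch (the log filename is an assumption, so point it at your own log):

    import re
    from collections import Counter

    # Count the record types named in ParsingException lines of a
    # TRACE/DEBUG capture, e.g. "... blockType=167 recordType=94".
    pattern = re.compile(r'recordType=(\d+)')
    counts = Counter()

    with open('estreamer.log') as fh:
        for line in fh:
            if 'ParsingException' in line:
                counts.update(pattern.findall(line))

    print(counts.most_common())   # e.g. [('94', 17), ('93', 9), ('92', 4)]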

@Dooberjolly - I believe you can find the answer if you look at the records.py file, located at:
fp-05-microsoft-sentinel-connector-5.0.0/estreamer/definitions/records.py

Record type 94:

RUA_EVENT_NEW_USER: {
    'name': u'New User Identification Event',
    'attributes': [
        { 'discovery': True },
        { 'block': BLOCK_USER_INFORMATION_VPN_LOGOFF_62, 'name': 'user'}],
    'category': u'RUA' },
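
If you would rather confirm the number-to-name mapping programmatically than read the file, something like this should work (a sketch: it assumes the connector package is importable and that the record-type constants such as RUA_EVENT_NEW_USER are plain integers exported from estreamer.definitions, which I have not verified):

    import estreamer.definitions as definitions

    # Scan for module-level integer constants equal to 94. If this prints
    # nothing, the constants live elsewhere -- fall back to reading
    # records.py directly.
    for name in dir(definitions):
        value = getattr(definitions, name)
        if isinstance(value, int) and value == 94:
            print(name)   # expected: RUA_EVENT_NEW_USER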

Awesome, thanks for pointing that out 👍

You can find the record types in the Firepower eStreamer Integration Guide. 93 is defined as a User Removed Change Event, 94 as a New User Identification Event, and 95 as a User Login Change Event.

Awesome work, @dev-sec7, and thank you for sharing. You all just made my Friday and weekend! I can't thank you guys enough. That trace log is nasty and gets huge fast. I appreciate your commitment to fixing this properly instead of the Cisco way of excluding the events entirely.

With the rua setting set to false, version 5.0.0 worked without issue all weekend. Thank you again, @dev-sec7, for providing the fix. I also noticed that version 5.1.0 is now out, which addresses some of the events in FMC version 7. We updated the Firepower Management Center to version 7.0.1 last Friday, so we are going to give 5.1.0 a go today and hope everything remains happy. Thanks again to everyone for your time and help.

Quick update: version 5.1.0 crashed today, even with RUA set to false and settings identical to my working 5.0.0 install. I haven't had time to look into it yet.

Any updates on what caused this crash? This has still been running reliably for us with no issues.

@dev-sec7 - I loaded version 5.1.0 today and enabled DEBUG logging. It looks like record type 98 is causing the issue. I turned RUA on and off, and the behavior is the same either way. The only way I could get it to stay up and running was to exclude record type 98. Here is a screenshot of the message; I picked this one because it doesn't have a username listed, but I receive the errors with end-user accounts listed as well.
[screenshot of the record type 98 error message]

I also see the following problem with record type 119. However, it doesn't cause any issues like type 98 does.

[screenshot of the record type 119 warning]

Looks like it failed overnight. I believe it was record type 94 this time; I had re-enabled RUA, so that is likely the cause.
[screenshot of the record type 94 error]

Record type 98 appears to be a legacy record type (it contains server information for a host) that is documented in earlier versions of the FMC product. I don't see those record types coming through in our system.

5.1.0 works excellently after excluding event type 98 and setting RUA to false in the estreamer.conf file. Thanks again to everyone for your time and help throughout this process.
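
For anyone landing here later, the end state that worked for us was the records block dev-sec7 posted above plus "exclude": [98]. Here is a one-shot sketch that applies both changes to an existing estreamer.conf (the path and the assumption of a nested "records" block are mine; back up the file before writing):

    import json

    # Apply the two fixes that settled this thread: exclude record type 98
    # and disable RUA. Assumes estreamer.conf is in the current directory.
    with open('estreamer.conf') as fh:
        conf = json.load(fh)

    def walk(node):
        """Yield every nested dict so the 'records' block can be found."""
        if isinstance(node, dict):
            yield node
            for value in node.values():
                yield from walk(value)
        elif isinstance(node, list):
            for value in node:
                yield from walk(value)

    for node in walk(conf):
        if isinstance(node.get('records'), dict):
            records = node['records']
            records['rua'] = False
            records['exclude'] = sorted(set(records.get('exclude', [])) | {98})

    with open('estreamer.conf', 'w') as fh:
        json.dump(conf, fh, indent=4, sort_keys=True)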