microsoft/SDN

Windows Server 2019 SDN - NetworkController upgrade failure

petkodeivn opened this issue

Hello,

We are running Windows Server 2019 SDN, and the NetworkController is failing to upgrade. This is what we see in the Microsoft-ServiceFabric/Operational log:

Log Name: Microsoft-ServiceFabric/Operational
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:20:18 PM
Event ID: 29621
Task Category: CM
Level: Information
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
Application upgrade started: Application = fabric:/NetworkController, Application Type = NetworkController, Target Application Type Version = 12.0.6.0, Upgrade Type = Rolling, Rolling Upgrade Mode = Monitored, Failure Action = Rollback
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>29621</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>115</Task>
    <Opcode>0</Opcode>
    <Keywords>0x4000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:20:18.896735300Z" />
    <EventRecordID>5192</EventRecordID>
    <Correlation />
    <Execution ProcessID="768" ThreadID="5396" />
    <Channel>Microsoft-ServiceFabric/Operational</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="applicationName">fabric:/NetworkController</Data>
    <Data Name="applicationTypeName">NetworkController</Data>
    <Data Name="applicationTypeVersion">12.0.6.0</Data>
    <Data Name="upgradeType">1</Data>
    <Data Name="rollingUpgradeMode">3</Data>
    <Data Name="failureAction">1</Data>
  </EventData>
</Event>

Log Name: Microsoft-ServiceFabric/Operational
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:30:23 PM
Event ID: 29623
Task Category: CM
Level: Information
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
Application rollback start: Application = fabric:/NetworkController, Application Type = NetworkController, Target Application Type Version = 12.0.2.1, Failure Reason = UpgradeDomainTimeout, Overall Elapsed Time = 600085ms
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>29623</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>115</Task>
    <Opcode>0</Opcode>
    <Keywords>0x4000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:30:23.229147400Z" />
    <EventRecordID>5193</EventRecordID>
    <Correlation />
    <Execution ProcessID="768" ThreadID="9544" />
    <Channel>Microsoft-ServiceFabric/Operational</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="applicationName">fabric:/NetworkController</Data>
    <Data Name="applicationTypeName">NetworkController</Data>
    <Data Name="applicationTypeVersion">12.0.2.1</Data>
    <Data Name="failureReason">3</Data>
    <Data Name="overallUpgradeElapsedTime.timespan">600085</Data>
  </EventData>
</Event>

Log Name: Microsoft-ServiceFabric/Operational
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:51:23 PM
Event ID: 29626
Task Category: CM
Level: Information
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
Application upgrade domain completed: Application = fabric:/NetworkController, Application Type = NetworkController, Target Application Type Version = 12.0.2.1, Upgrade State = RollingBack, Upgrade Domains = (NCE-NCVM01.dswe.local), Upgrade Domain Elapsed Time = 1204436ms
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>29626</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>115</Task>
    <Opcode>0</Opcode>
    <Keywords>0x4000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:51:23.426171400Z" />
    <EventRecordID>5194</EventRecordID>
    <Correlation />
    <Execution ProcessID="768" ThreadID="9488" />
    <Channel>Microsoft-ServiceFabric/Operational</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="applicationName">fabric:/NetworkController</Data>
    <Data Name="applicationTypeName">NetworkController</Data>
    <Data Name="applicationTypeVersion">12.0.2.1</Data>
    <Data Name="upgradeState">4</Data>
    <Data Name="upgradeDomains">(NCE-NCVM01.dswe.local)</Data>
    <Data Name="upgradeDomainElapsedTime.timespan">1204436</Data>
  </EventData>
</Event>
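To compare these Operational events more easily, the numeric EventData fields can be decoded back to names. This is a minimal Python sketch, assuming the event XML is copied out of Event Viewer as above. The code-to-name table is inferred only from the Description text of these three events (for example, failureReason 3 appears next to "UpgradeDomainTimeout"), so it is not a complete list of Service Fabric values. `summarize_event` and `KNOWN_CODES` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Default namespace of Windows event XML (from the <Event xmlns=...> attribute).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# ASSUMPTION: mappings inferred only from the events in this issue, where each
# numeric <Data> value is shown next to its text form in the Description.
KNOWN_CODES = {
    "upgradeType": {"1": "Rolling"},
    "rollingUpgradeMode": {"3": "Monitored"},
    "failureAction": {"1": "Rollback"},
    "failureReason": {"3": "UpgradeDomainTimeout"},
    "upgradeState": {"4": "RollingBack"},
}


def summarize_event(event_xml: str) -> dict:
    """Return EventID, timestamp and (where known) decoded EventData fields."""
    root = ET.fromstring(event_xml)
    system = root.find("e:System", NS)
    summary = {
        "EventID": system.findtext("e:EventID", namespaces=NS),
        "TimeCreated": system.find("e:TimeCreated", NS).get("SystemTime"),
    }
    for data in root.iterfind("e:EventData/e:Data", NS):
        name, value = data.get("Name"), (data.text or "").strip()
        # Keep the raw value when the code is not in the inferred table.
        summary[name] = KNOWN_CODES.get(name, {}).get(value, value)
    return summary


# Abbreviated copy of the rollback event (ID 29623) from above.
rollback_xml = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>29623</EventID>
    <TimeCreated SystemTime="2019-05-10T13:30:23.229147400Z" />
  </System>
  <EventData>
    <Data Name="applicationName">fabric:/NetworkController</Data>
    <Data Name="applicationTypeVersion">12.0.2.1</Data>
    <Data Name="failureReason">3</Data>
    <Data Name="overallUpgradeElapsedTime.timespan">600085</Data>
  </EventData>
</Event>"""

print(summarize_event(rollback_xml))
```

Run over the three events in order, this shows the sequence directly: upgrade to 12.0.6.0 started, rollback to 12.0.2.1 because of UpgradeDomainTimeout, then the rollback of the NCE-NCVM01.dswe.local upgrade domain completed.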

The Microsoft-ServiceFabric/Admin log contains many warnings like these:

Log Name: Microsoft-ServiceFabric/Admin
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:20:14 PM
Event ID: 55809
Task Category: FileStoreService
Level: Warning
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
The request failed due to FABRIC_E_FILE_NOT_FOUND. StoreRelativePath:Store\NetworkController\SDNSLBM.Code.12.0.6
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>55809</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>218</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:20:14.112331300Z" />
    <EventRecordID>264055</EventRecordID>
    <Correlation />
    <Execution ProcessID="4232" ThreadID="1680" />
    <Channel>Microsoft-ServiceFabric/Admin</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="id">[(00000000-0000-0000-0000-000000003000:131955729520045166)+d2d515f0-89da-4c80-8d4b-3d9aed5957bb:0]</Data>
    <Data Name="type">ProcessRequestAsyncOperation</Data>
    <Data Name="text">The request failed due to FABRIC_E_FILE_NOT_FOUND. StoreRelativePath:Store\NetworkController\SDNSLBM.Code.12.0.6</Data>
  </EventData>
</Event>

Log Name: Microsoft-ServiceFabric/Admin
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:20:14 PM
Event ID: 55809
Task Category: FileStoreService
Level: Warning
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
End(Delete): Store:Store\NetworkController\SDNSLBM.Code.12.0.6, Error:FABRIC_E_FILE_NOT_FOUND
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>55809</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>218</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:20:14.113151400Z" />
    <EventRecordID>264056</EventRecordID>
    <Correlation />
    <Execution ProcessID="3356" ThreadID="10580" />
    <Channel>Microsoft-ServiceFabric/Admin</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="id"> </Data>
    <Data Name="type">InternalFileStoreClient</Data>
    <Data Name="text">End(Delete): Store:Store\NetworkController\SDNSLBM.Code.12.0.6, Error:FABRIC_E_FILE_NOT_FOUND</Data>
  </EventData>
</Event>

Log Name: Microsoft-ServiceFabric/Admin
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:20:14 PM
Event ID: 55809
Task Category: FileStoreService
Level: Warning
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
Ending StoreTransaction for TranitionToIntermediateState. StoreRelativePath:Store\NetworkController\SDNSLBM.Config.12.0.6, Error:FABRIC_E_FILE_NOT_FOUND
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>55809</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>218</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:20:14.149982500Z" />
    <EventRecordID>264058</EventRecordID>
    <Correlation />
    <Execution ProcessID="4232" ThreadID="3580" />
    <Channel>Microsoft-ServiceFabric/Admin</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="id">[(00000000-0000-0000-0000-000000003000:131955729520045166)+ac108407-981f-4685-9c2c-c735ff773d53:0]</Data>
    <Data Name="type">FileAsyncOperation</Data>
    <Data Name="text">Ending StoreTransaction for TranitionToIntermediateState. StoreRelativePath:Store\NetworkController\SDNSLBM.Config.12.0.6, Error:FABRIC_E_FILE_NOT_FOUND</Data>
  </EventData>
</Event>

Log Name: Microsoft-ServiceFabric/Admin
Source: Microsoft-ServiceFabric
Date: 5/10/2019 4:30:18 PM
Event ID: 29441
Task Category: CM
Level: Warning
Keywords: Default
User: NETWORK SERVICE
Computer: NCE-NCVM01.dswe.local
Description:
[(00000000-0000-0000-0000-000000002000:131955729520045166)+826bd87a-0708-4aa1-8780-3bda3401aea1:0] monitored upgrade timed out (Rollback): persisted[overall=10:00.081 UD=10:00.081 health=00.000] stopwatch[upgrade=00.003 health=00.000] timeouts[overall=1:00:00.000 ud=10:00.000]
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
    <EventID>29441</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>115</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000001</Keywords>
    <TimeCreated SystemTime="2019-05-10T13:30:18.979737900Z" />
    <EventRecordID>264061</EventRecordID>
    <Correlation />
    <Execution ProcessID="768" ThreadID="9544" />
    <Channel>Microsoft-ServiceFabric/Admin</Channel>
    <Computer>NCE-NCVM01.dswe.local</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="id"> </Data>
    <Data Name="type">ApplicationUpgradeContext</Data>
    <Data Name="text">[(00000000-0000-0000-0000-000000002000:131955729520045166)+826bd87a-0708-4aa1-8780-3bda3401aea1:0] monitored upgrade timed out (Rollback): persisted[overall=10:00.081 UD=10:00.081 health=00.000] stopwatch[upgrade=00.003 health=00.000] timeouts[overall=1:00:00.000 ud=10:00.000]</Data>
  </EventData>
</Event>
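The 29441 warning carries the timeout arithmetic behind the rollback: the upgrade domain (UD) timeout is 10:00.000 (600 s) and the overall timeout is 1:00:00.000 (3600 s), while the persisted UD elapsed time is 10:00.081. So it is the UD timeout, not the overall one, that was exceeded, which matches Failure Reason = UpgradeDomainTimeout and the 600085 ms elapsed time in the rollback event. Below is a minimal Python sketch that parses this message, assuming the bracketed `name[key=duration ...]` layout seen in this one warning holds in general (it is not a documented format). `parse_duration` and `parse_upgrade_timeout` are hypothetical helper names.

```python
import re


def parse_duration(text: str) -> float:
    """Convert durations such as '1:00:00.000', '10:00.081' or '00.003' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_upgrade_timeout(message: str) -> dict:
    """Split 'persisted[...] stopwatch[...] timeouts[...]' into nested dicts of seconds.

    ASSUMPTION: layout inferred from a single 29441 warning; keys are
    lower-cased because the message mixes 'UD' and 'ud'.
    """
    result = {}
    for group, body in re.findall(r"(\w+)\[([^\]]*)\]", message):
        result[group] = {
            key.lower(): parse_duration(value)
            for key, value in re.findall(r"(\w+)=([\d:.]+)", body)
        }
    return result


message = (
    "[(00000000-0000-0000-0000-000000002000:131955729520045166)"
    "+826bd87a-0708-4aa1-8780-3bda3401aea1:0] monitored upgrade timed out "
    "(Rollback): persisted[overall=10:00.081 UD=10:00.081 health=00.000] "
    "stopwatch[upgrade=00.003 health=00.000] "
    "timeouts[overall=1:00:00.000 ud=10:00.000]"
)

t = parse_upgrade_timeout(message)
# Compare persisted elapsed time against the configured timeouts.
print("UD timeout exceeded:", t["persisted"]["ud"] > t["timeouts"]["ud"])
print("overall timeout exceeded:", t["persisted"]["overall"] > t["timeouts"]["overall"])
```

The leading `[(...)]` activity ID is skipped by the first regex because no word characters precede its opening bracket.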

The upgrade/rollback cycle then starts again, and the same entries repeat in the logs. The NetworkController is otherwise working (we have had no problems with SDN so far), but because it keeps trying to upgrade and failing, we cannot perform some operations, such as backing up the NC database. Can someone please help?