locka99/opcua

Opcua client fails to connect to Prosys Simulation Server configured with multiple endpoints

Opened this issue · 19 comments

I am trying to connect to Prosys Simulation Server while developing code with this library.

If I connect with SecurityMode None or Sign, it works without issues. But when connecting to the server with the SignAndEncrypt mode and SecurityPolicy Basic256Sha256 it fails with strange errors. I have tried using the simple-client from the examples folder, both with version 0.12 and the latest commit cf2d1bd on master.

simple-client 0.12 output with RUST_OPCUA_LOG=debug

2024-08-07 14:11:00.426 DEBUG opcua::client::message_queue             Request 2 was processed by the server
2024-08-07 14:11:00.438 DEBUG opcua::client::comms::tcp_transport      receive chunk intermediate 1002:272
2024-08-07 14:11:00.482 DEBUG opcua::client::comms::tcp_transport      receive chunk intermediate 1002:273
2024-08-07 14:11:00.482 ERROR opcua::types::service_types::enums       Invalid value 257 for enum MessageSecurityMode
2024-08-07 14:11:00.482 DEBUG opcua::core::comms::chunker              Cannot decode message CreateSessionResponse_Encoding_DefaultBinary, err = StatusCode(IS_ERROR | BadUnexpectedError)
2024-08-07 14:11:00.482 INFO  opcua::client::comms::tcp_transport      ReadState has dropped
2024-08-07 14:11:00.482 INFO  opcua::client::comms::tcp_transport      WriteState has dropped
2024-08-07 14:11:00.482 DEBUG opcua::client::comms::tcp_transport      Closing connection because the read loop terminated
2024-08-07 14:11:10.426 INFO  opcua::client::session::session_state    Timeout waiting for response from server
2024-08-07 14:11:10.426 INFO  opcua::client::message_queue             Request 2 has timed out and any response will be ignored
2024-08-07 14:11:10.426 ERROR opcua::client::client                    Got an error while creating the default session - BadTimeout
2024-08-07 14:11:10.426 DEBUG opcua::client::session::session_state    Session was closed with status = BadServiceUnsupported
2024-08-07 14:11:10.426 INFO  opcua::client::session::session          Session has dropped
2024-08-07 14:11:10.426 INFO  opcua::client::comms::tcp_transport      TcpTransport has dropped
2024-08-07 14:11:10.426 INFO  opcua::client::session::session_state    SessionState has dropped

The message Invalid value 257 for enum MessageSecurityMode changes values to 257, 513 and 769 depending on which Endpoints I have opened on the server. Interestingly the values correspond to 0x0101, 0x0201 and 0x0301.

The only way I can get the simple-client to connect to this server with SignAndEncrypt is if I configure the server with only a single endpoint for SignAndEncrypt and the Basic256Sha256 policy.

I would like to be able to connect to the server with any possible combination of endpoints. If there is anything else I can provide to help debugging this issue I will gladly provide.

Thank you for your time creating this otherwise wonderful library!

This is strange, I wonder if it's a binary layout issue? MessageSecurityMode is quite clearly either 1, 2, or 3. This is presumably in the endpointDescriptions array in create session response. I double checked, and I'm quite sure we're decoding that value correctly, treating it as a 32 bit number.

It would be good to run wireshark and see what it thinks the payload is, but it really does look to me like the server is setting an invalid value for that field, for some reason.

I can't find anything in the standard for modifying that specific enum, at least.

A workaround that I think can be done with the upcoming rewrite of the type generation is to decode invalid values to the INVALID_0 MessageSecurityMode, which might solve the issue, that might be how other clients tolerate this server, though I don't know for sure.

I tried analyzing the network packets with Wireshark, but it apparently does not support decrypting OPC UA.

I have however gotten a bit closer to figuring it out. The data being parsed out when decoding the GetEndpointsResponse contains a null byte at the end of each chunk. So an extra byte of 0x00 is added into the middle of the server certificate, causing MessageSecurityMode::decode() to read the last byte of the certificate and then the first 3 bytes from the MSM.

I have not figured out if this is caused by this library or by the server.

I have found the cause of this issue. It seems that this code does not take into account that there is a paddingSize byte between the message data and the signature. See spec.

This causes each chunk to become corrupt and contain an extra 0 at the end.

I feel like I would need to spend a lot more time reading to figure out if you're right, but this is really good catch if so. I wonder if this applies to the send-side as well?

Trying to investigate this by connecting to a server in the reference SDK and I can't make that work either, which may or may not be related.

Trying to investigate this by connecting to a server in the reference SDK and I can't make that work either, which may or may not be related.

Not related, just a mistake on my end, and a tiny bug I will fix, not related to your problem.

Trying to investigate this by connecting to a server in the reference SDK and I can't make that work either, which may or may not be related.

Not related, just a mistake on my end, and a tiny bug I will fix, not related to your problem.

It occurs for me when there are many endpoints enabled on the server, so that the message gets chunked up.

Managed to reproduce, I think, got a BadDecodingError once I got the server to split the message. Took a while because I found a bug in the reference SDK. Clearly nobody can be trusted to correctly implement this standard.

Different issue than you got, but that's probably just a coincidence, I imagine you would have gotten a similar error if the decoder managed to continue: Decoding error - Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }.

Got it.

if self.security_mode == MessageSecurityMode::SignAndEncrypt {
    Self::remove_signature_and_padding(
        decrypted_data,
        decrypted_size,
        signature_size,
        &self.decoding_options,
    )?
} else {
    Self::update_message_size_and_truncate(
        decrypted_data, 
        decrypted_size - signature_size,
        &self.decoding_options
    )?
}

and

fn remove_signature_and_padding(
    mut data: Vec<u8>,
    decrypted_size: usize,
    signature_size: usize,
    decoding_options: &DecodingOptions
) -> Result<Vec<u8>, StatusCode> {
    let padding_start = decrypted_size - signature_size - 1;
    let padding_size = data[padding_start];
    
    for i in ((padding_start - padding_size as usize)..padding_start).rev() {
        if data[i] != padding_size {
            error!("Failed to verify padding, got {} when looking at byte {}, expected {}",
                data[i], i, padding_size);
            return Err(StatusCode::BadDecodingError);
        }
    }
    // Remove one extra byte for the padding size.
    let message_size = decrypted_size - signature_size - padding_size as usize - 1;
    Self::update_message_size(&mut data, message_size, decoding_options)?;
    data.truncate(message_size);
    Ok(data)
}

I did this deep in my own branch since the compile times are so much better, but I'll see about porting my changes, which includes some other tiny fixes, to the current master branch. I also want to see if I can write a test, though since the server uses the same code it's a bit limited in usefulness.

This also does not do anything about ExtraPaddingSize, I need to make a sample that actually has a key larger than 2048 bytes to test that.

The standard also talks about Authenticated encryption algorithms, which don't have padding? We also need to look into that.

The secure_channel code is a total mess at this point, it could use a major cleanup and some more structure, but I want to put together a set of tests against a reference server before we do that.

Great work! If there is any code you'd like me to try out I'd be glad to help.

Different issue than you got, but that's probably just a coincidence, I imagine you would have gotten a similar error if the decoder managed to continue: Decoding error - Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }.

I get errors relating to decoding of strings. Because the length becomes crazy when the parsed fields are off. Decoding Error with statuscode BadDecodingError. String buf length 151391536 exceeds decoding limit 65535.

It would be great if you could give #377 a try @DOhlsson, it should fix your issue.

I tried it out and it successfully connects to the server and receives the first Publish, but after that it gets an error over and over again every second. This happens with both Security Policy Basic256Sha256 and None.

Here is the log output for the full run of simple-client. I terminated it after two errors.

2024-08-21 09:24:18.414 INFO  opcua::console_logging                   Logging is enabled, use RUST_OPCUA_LOG environment variable to control filtering, logging level
2024-08-21 09:24:18.415 WARN  opcua::client::config                    Endpoint config contains no endpoints
2024-08-21 09:24:18.417 INFO  opcua::client::transport::channel        Connect
2024-08-21 09:24:18.418 INFO  opcua::client::transport::channel        Security policy = None
2024-08-21 09:24:18.418 INFO  opcua::client::transport::channel        Security mode = None
2024-08-21 09:24:18.418 DEBUG opcua::client::transport::tcp            Connecting to 10.10.200.145:53530 with url opc.tcp://internal.domain:53530/OPCUA/SimulationServer
2024-08-21 09:24:18.419 INFO  opcua::client::transport::state          Making secure channel request
2024-08-21 09:24:18.419 INFO  opcua::client::transport::state          security_mode = None
2024-08-21 09:24:18.419 INFO  opcua::client::transport::state          security_policy = None
2024-08-21 09:24:18.420 DEBUG opcua::client::transport::state          Setting transport's security token
2024-08-21 09:24:18.463 DEBUG opcua::client::transport::tcp            Writer is about to send a CloseSecureChannelRequest which means it should close in a moment
2024-08-21 09:24:18.463 DEBUG opcua::client::transport::tcp            Writer is setting the connection state to finished(good)
2024-08-21 09:24:18.468 INFO  opcua::client::transport::channel        Connect
2024-08-21 09:24:18.471 INFO  opcua::client::transport::channel        Security policy = Basic256Sha256
2024-08-21 09:24:18.471 INFO  opcua::client::transport::channel        Security mode = SignAndEncrypt
2024-08-21 09:24:18.472 DEBUG opcua::client::transport::tcp            Connecting to 10.10.200.145:53530 with url opc.tcp://internal.domain:53530/OPCUA/SimulationServer
2024-08-21 09:24:18.477 INFO  opcua::client::transport::state          Making secure channel request
2024-08-21 09:24:18.478 INFO  opcua::client::transport::state          security_mode = SignAndEncrypt
2024-08-21 09:24:18.478 INFO  opcua::client::transport::state          security_policy = Basic256Sha256
2024-08-21 09:24:18.504 DEBUG opcua::client::transport::state          Setting transport's security token
2024-08-21 09:24:18.565 DEBUG opcua::crypto::certificate_store         Validating cert with name on disk SimulationServer@internal.domain [2ced3d75878fb901ec8b548a910e4b31513a69e5].der
2024-08-21 09:24:18.566 INFO  opcua::crypto::x509                      Certificate is valid for this time
2024-08-21 09:24:18.566 INFO  opcua::crypto::x509                      Certificate host name internal.domain is good
2024-08-21 09:24:18.566 INFO  opcua::crypto::x509                      Certificate application uri urn:internal.domain:OPCUA:SimulationServer is good
2024-08-21 09:24:18.573 DEBUG opcua::client::session::services::attributes session:1 read() requested to read nodes [ReadValueId { node_id: NodeId { namespace: 0, identifier: Numeric(2259) }, attribute_id: 13, index_range: UAString { value: None }, data_encoding: QualifiedName { namespace_index: 0, name: UAString { value: None } } }]
2024-08-21 09:24:18.574 DEBUG opcua::client::session::services::subscriptions::event_loop Sending publish due to external trigger
2024-08-21 09:24:18.574 DEBUG opcua::client::session::services::subscriptions::service session:1 create_subscription, created a subscription with id 12
Created a subscription with id = 12
2024-08-21 09:24:18.574 DEBUG opcua::client::session::services::subscriptions::service session:1 create_monitored_items, for subscription 12, 4 items
2024-08-21 09:24:18.574 DEBUG opcua::client::session::services::subscriptions::service publish is acknowledging subscription acknowledgements with sequence nrs []
2024-08-21 09:24:18.575 DEBUG opcua::client::session::services::attributes session:1 read(), success
2024-08-21 09:24:18.576 DEBUG opcua::client::session::services::subscriptions::service session:1 create_monitored_items, 4 items created
2024-08-21 09:24:18.676 DEBUG opcua::client::session::services::subscriptions::service session:1 PublishResponse
Data change from server:
Item "ns=3;i=1001", Value = Int32(13)
Data change from server:
Item "ns=3;i=1002", Value = Double(-1.795174)
Data change from server:
Item "ns=3;i=1003", Value = Double(-0.8)
Data change from server:
Item "ns=3;i=1004", Value = Double(-1.17557)
2024-08-21 09:24:19.576 DEBUG opcua::client::session::services::subscriptions::event_loop Sending publish due to internal tick
2024-08-21 09:24:19.576 DEBUG opcua::client::session::services::subscriptions::service publish is acknowledging subscription acknowledgements with sequence nrs [1]
2024-08-21 09:24:19.580 ERROR opcua::client::session::services::subscriptions::service session:1 publish failed ServiceFault(ServiceFault { response_header: ResponseHeader { timestamp: DateTime { date_time: 2024-08-21T09:24:19.579Z }, request_handle: 8, service_result: StatusCode(IS_ERROR | BadInternalError | BadResourceUnavailable | BadIdentityTokenInvalid), service_diagnostics: DiagnosticInfo { symbolic_id: None, namespace_uri: None, locale: None, localized_text: None, additional_info: None, inner_status_code: None, inner_diagnostic_info: None }, string_table: None, additional_header: ExtensionObject { node_id: NodeId { namespace: 0, identifier: Numeric(0) }, body: None } } })
2024-08-21 09:24:19.580 ERROR opcua::client::session                   Received a service fault of BadSessionClosed for the request
2024-08-21 09:24:19.581 ERROR opcua::client::session::services::subscriptions::event_loop session:1 Publish response indicates session is dead
2024-08-21 09:24:20.578 DEBUG opcua::client::session::services::subscriptions::event_loop Sending publish due to internal tick
2024-08-21 09:24:20.578 DEBUG opcua::client::session::services::subscriptions::service publish is acknowledging subscription acknowledgements with sequence nrs [1]
2024-08-21 09:24:20.582 ERROR opcua::client::session::services::subscriptions::service session:1 publish failed ServiceFault(ServiceFault { response_header: ResponseHeader { timestamp: DateTime { date_time: 2024-08-21T09:24:20.580Z }, request_handle: 9, service_result: StatusCode(IS_ERROR | BadInternalError | BadResourceUnavailable | BadIdentityTokenInvalid), service_diagnostics: DiagnosticInfo { symbolic_id: None, namespace_uri: None, locale: None, localized_text: None, additional_info: None, inner_status_code: None, inner_diagnostic_info: None }, string_table: None, additional_header: ExtensionObject { node_id: NodeId { namespace: 0, identifier: Numeric(0) }, body: None } } })
2024-08-21 09:24:20.582 ERROR opcua::client::session                   Received a service fault of BadSessionClosed for the request
2024-08-21 09:24:20.582 ERROR opcua::client::session::services::subscriptions::event_loop session:1 Publish response indicates session is dead

// TODO: Do something here?, :/

Seems to me like the server has closed the client for whatever reason. I think this is almost certainly a different issue, and unrelated to the matter of padding.

Anything from the server logs to indicate why the session is closed?

The server might be making some weird choices when it comes to default session timeout. Could you try setting session_timeout when building the client?

The server might be making some weird choices when it comes to default session timeout. Could you try setting session_timeout when building the client?

It works if I set session_timeout. This appears to be a change in behavior in the library between 0.12 and master. In 0.12 without session_timeout, the server reports that the client sends a RequestedSessionTimeout of 1.7976931348623157e308. But in your branch and master the client sends 0.0 which leads the server to respond with a timeout of 100.0.

So the behavior has changed in the client, do you know if this is intentional or not?

But anyway, by setting the session_timeout it works fine with the server now. Thank you for your help!

We should probably set a different session timeout, I'll look into that at some point. We should also revise the keep-alive interval based on revised session timeout, maybe. I'll give it some thought.