apache/plc4x

[Bug]: ADS driver option load-symbol-and-data-type-tables=false doesn't resolve the variable

mrwhy-orig opened this issue · 27 comments

What happened?

Connect to a TwinCAT 3 PLC with the option load-symbol-and-data-type-tables set to false.
Try browsing or reading a variable. It doesn't get resolved; instead, an error is thrown (when reading a specific variable):
Couldn't resolve symbolic address, YOUR.VARIABLE_NAME invalid

Version

v0.13.0

Programming Languages

  • plc4j
  • plc4go
  • plc4c
  • plc4net

Protocols

  • AB-Ethernet
  • ADS /AMS
  • BACnet/IP
  • CANopen
  • DeltaV
  • DF1
  • EtherNet/IP
  • Firmata
  • KNXnet/IP
  • Modbus
  • OPC-UA
  • S7

Is there a particular reason for setting the option to false?
I mean ... yes: The resolution of symbolic addresses should still work asynchronously in that case, but browsing would be quite a bit of work and quite stressful on the PLC. I would lean toward not allowing browsing when loading of the tables is disabled.

I had some connection issues with the option set to true: roughly one connection attempt out of ten ran into an exception. So I thought maybe the symbol table is too big. I really only need a very small subset, so I set this option to false.

The size should not matter ... something else might be going wrong. A Wireshark capture could help, but be aware that it would contain the structure of all variables in your PLC ... if that's something you don't want to share with everyone, you might consider sending a link to cdutz@apache.org ... then I could have a look and possibly find out why this 1 in 10 doesn't work.

Other than that ... yes ... the driver should look up the symbolic address if you disable loading on connection.
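To illustrate what on-demand lookup could look like, here is a minimal sketch of a resolver that looks up each symbolic name the first time it is requested and caches the result. It is not the driver's code: LazySymbolResolver, its lookup function, and the fake lookup in main are hypothetical names introduced only for this example, and the real ADS request that would back the lookup is left out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of PLC4X: resolves symbolic names lazily.
final class LazySymbolResolver<A> {

    private final Map<String, CompletableFuture<A>> cache = new ConcurrentHashMap<>();
    // Assumed stand-in for the asynchronous per-symbol request to the PLC.
    private final Function<String, CompletableFuture<A>> lookup;

    LazySymbolResolver(Function<String, CompletableFuture<A>> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached result, or starts one lookup for an unseen name.
    CompletableFuture<A> resolve(String symbolicName) {
        CompletableFuture<A> future = cache.computeIfAbsent(symbolicName, lookup);
        // Drop failed lookups so that a later call can retry them.
        future.whenComplete((value, error) -> {
            if (error != null) {
                cache.remove(symbolicName, future);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        // Fake lookup for demonstration: "resolves" a name to its length.
        LazySymbolResolver<Long> resolver = new LazySymbolResolver<>(
            name -> CompletableFuture.completedFuture((long) name.length()));
        System.out.println(resolver.resolve("MAIN.counter").join());
    }
}
```

With this shape, only the handful of variables actually used would ever be requested, which matches the "very small subset" use case described above.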

I went through the same thought process as mrwhy-orig: maybe the symbol table is too big, so I tried setting « load-symbol-and-data-type-tables » to « false », which ends up in « INVALID_ADDRESS ».

In my case, the connection fails 10 times out of 10.

In the connection string, I set the usual options: target-ams-net-id, target-ams-port, source-ams-net-id and source-ams-port.
I had to increase the timeout and made it bigger than necessary: timeout-request=20000
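For readers who want to reproduce this setup, the sketch below assembles a connection string from the options named above. The option names are taken from this thread; the ads:tcp:// prefix, the host, the AMS net IDs, and the ports are assumptions or placeholders, and AdsConnectionString is a hypothetical helper, not a PLC4X class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of PLC4X: builds an ADS connection string.
final class AdsConnectionString {

    // Joins the options into "ads:tcp://<host>?key=value&key=value".
    // The "ads:tcp://" prefix is an assumption about the URL scheme.
    static String build(String host, Map<String, String> options) {
        String query = options.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("&"));
        return "ads:tcp://" + host + (query.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("target-ams-net-id", "192.168.0.10.1.1"); // placeholder
        options.put("target-ams-port", "851");                 // placeholder
        options.put("source-ams-net-id", "192.168.0.20.1.1");  // placeholder
        options.put("source-ams-port", "65534");               // placeholder
        options.put("timeout-request", "20000");
        options.put("load-symbol-and-data-type-tables", "false");
        System.out.println(build("192.168.0.10", options));    // placeholder host
    }
}
```

The resulting string is what would be passed to getConnection(...) in a test program.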

The program hangs while connecting.
A Wireshark capture is attached:
1626-fdupont.pcapng.gz

The last packet is smaller than the others.

Test program:

PlcDriverManager plcDriver = new DefaultPlcDriverManager();
try (PlcConnection plcConnection = plcDriver.getConnectionManager()
    .getConnection(/*ads...  */)) {
  //...
} catch (Exception ex) {
  //...
}

Two log entries that may be of interest:

{
    "level": "TRACE",
    "threadName": "nioEventLoopGroup-2-1",
    "loggerName": "org.apache.plc4x.java.spi.Plc4xNettyWrapper",
    "message": "Failure while processing payload {} with handler {}",
    "throwable": {
        "className": "java.lang.NullPointerException",
        "message": "Cannot invoke \"org.apache.plc4x.java.ads.readwrite.AdsDataTypeTableEntry.getDataType()\" because \"adsDataTypeTableEntry\" is null",
        "stepArray": [{
                "className": "org.apache.plc4x.java.ads.protocol.AdsProtocolLogic",
                "methodName": "resolveDirectAdsTagForSymbolicNameFromDataType",
                "fileName": "AdsProtocolLogic.java",
                "lineNumber": 1779
            }
        ]
    }
}, {
    "level": "WARN",
    "threadName": "nioEventLoopGroup-2-1",
    "loggerName": "io.netty.channel.DefaultChannelPipeline",
    "message": "An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.",
    "throwable": {
        "className": "io.netty.handler.codec.DecoderException",
        "message": "java.lang.ClassCastException: class org.apache.plc4x.java.ads.readwrite.AdsReadResponse cannot be cast to class org.apache.plc4x.java.ads.readwrite.AmsTCPPacket (org.apache.plc4x.java.ads.readwrite.AdsReadResponse and org.apache.plc4x.java.ads.readwrite.AmsTCPPacket are in unnamed module of loader 'app')",
        "stepArray": [{
                "className": "io.netty.handler.codec.MessageToMessageDecoder",
                "methodName": "channelRead",
                "fileName": "MessageToMessageDecoder.java",
                "lineNumber": 98
            }
        ],
        "cause": {
            "className": "java.lang.ClassCastException",
            "message": "class org.apache.plc4x.java.ads.readwrite.AdsReadResponse cannot be cast to class org.apache.plc4x.java.ads.readwrite.AmsTCPPacket (org.apache.plc4x.java.ads.readwrite.AdsReadResponse and org.apache.plc4x.java.ads.readwrite.AmsTCPPacket are in unnamed module of loader 'app')",
            "stepArray": [{
                    "className": "org.apache.plc4x.java.spi.Plc4xNettyWrapper",
                    "methodName": "decode",
                    "fileName": "Plc4xNettyWrapper.java",
                    "lineNumber": 191
                }, {
                    "className": "io.netty.handler.codec.MessageToMessageCodec$2",
                    "methodName": "decode",
                    "fileName": "MessageToMessageCodec.java",
                    "lineNumber": 81
                }, {
                    "className": "io.netty.handler.codec.MessageToMessageDecoder",
                    "methodName": "channelRead",
                    "fileName": "MessageToMessageDecoder.java",
                    "lineNumber": 88
                }
            ]
        }
    }
}

Nothing at level « error ».

The problem is that I've found out how to load a single symbol, but not how to load the data type definition of only that one symbol.

So currently I would not disable the symbol table loading when relying on symbolic addresses.

I am working on implementing that ... However, there is too much to do, too little time to do it, and no funding to make overtime work appealing.

No problem. We appreciate the work you do.

Should I move the connection problem I have (when load-symbol-and-data-type-tables is true) into a separate issue?
It happens in the production environment only. In my development environment, I reproduced the few variables I access, and it works fine.

In your pcap I can see the PLC sending data to the client at high frequency. I assume this is the symbol table (as that's read first); it seems it doesn't even get to the data type table. I don't think a new issue is required, as we do indeed need to be able to resolve tags dynamically.

However, the problem is that the ADS driver has already become quite complex, and my attempt to refactor it over the last week was super complex. I tried to make the AdsTagHandler correctly resolve ADS tags using the data in the symbol and data type tables. These changes already resulted in very complex refactoring.

I therefore started implementing a new ADS driver, that will use a different approach ... one that would fix my problem and automatically also this issue ... however, as stated in my recent blog post (https://github.com/chrisdutz/blog/blob/main/plc4x/throwing-the-towel.adoc) ... I'm no longer doing big stuff like that for free ... so I will be implementing this, but I'll only donate it back to PLC4X if I get enough donations for this.

I think I had an idea today for how I could probably work around your problem without rewriting the entire driver.
I could probably read the tables in chunks and always wait for each chunk to be loaded ... I guess if I load everything in 1MB chunks, this should probably work. It would not actually fix the issue you reported, but it would make the problem go away ;-)
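The chunking idea can be sketched as follows: split the table into fixed-size ranges and request the next range only after the previous one has arrived. This is an illustration under assumptions, not the driver's implementation. ChunkedTableReader and RangeReader are hypothetical names, and it is assumed (not confirmed in this thread) that the table can be read by offset and length.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not part of PLC4X: reads a large table in chunks.
final class ChunkedTableReader {

    // Assumed stand-in for one read request of `length` bytes at `offset`.
    interface RangeReader {
        CompletableFuture<byte[]> read(int offset, int length);
    }

    static final int ONE_MEGABYTE = 1024 * 1024;

    // Reads totalSize bytes, at most chunkSize bytes per request.
    static byte[] readAll(RangeReader reader, int totalSize, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(totalSize);
        for (int offset = 0; offset < totalSize; offset += chunkSize) {
            int length = Math.min(chunkSize, totalSize - offset);
            // Block until this chunk is complete before asking for the next.
            byte[] chunk = reader.read(offset, length).join();
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Fake 2.5 MB "table" served from memory for demonstration.
        byte[] table = new byte[ONE_MEGABYTE * 5 / 2];
        Arrays.fill(table, (byte) 7);
        RangeReader fake = (offset, length) -> CompletableFuture.completedFuture(
            Arrays.copyOfRange(table, offset, offset + length));
        byte[] result = readAll(fake, table.length, ONE_MEGABYTE);
        System.out.println(Arrays.equals(table, result));
    }
}
```

The blocking join() keeps the sketch short; inside an event-loop based driver, the requests would more likely be chained asynchronously instead.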

Is this something you are planning to try in the coming weeks? Any quick fix is welcome.
In our project, we are at the point of evaluating our options with ADS within the allowed timeframe.
It was a bummer to discover that we got stuck at the connection step in the production environment.