lorenzodonini/ocpp-go

ocpp1.6 - parsing MeterValues from Alfen with invalid TransactionId (9223372036854775807)

qosmotec opened this issue · 18 comments

an excerpt from the log of the Alfen Eve Double Station, connected via ocpp1.6j shows that the current TransactionId has been lost at the station for some reason. Instead of the valid Transaction Id, 9223372036854775807 is sent that is not according to the specification (Transaction Id can be omitted, if it is set, it must be valid).

2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:-> [2,"42179","MeterValues",{"connectorId":1,"transactionId":9223372036854775807
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:,"meterValue":[{"timestamp":"2023-04-15T02:25:11Z","sampledValue":[{"value":"120
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:701.000","context":"Sample.Periodic","measurand":"Energy.Active.Import.Register"
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:,"location":"Outlet","unit":"Wh"},{"value":"0.000","context":"Sample.Periodic","
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:measurand":"Current.Import","location":"Outlet","unit":"A","phase":"L1"},{"value
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:":"0.000","context":"Sample.Periodic","measurand":"Current.Import","location":"O
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:utlet","unit":"A","phase":"L2"},{"value":"5.561","context":"Sample.Periodic","me
2023-04-19T09:52:16.062Z:COM:ocpp_rpc.c:681:asurand":"Current.Import","location":"Outlet","unit":"A","phase":"L3"}]}]}]
2023-04-19T09:52:16.167Z:COM:ocpp_rpc.c:1127:<- [4,"42179","FormationViolation","json: cannot unmarshal number 92233720368547
2023-04-19T09:52:16.167Z:COM:ocpp_rpc.c:1127:76000 into Go struct field MeterValuesRequest.transactionId of type int",null]

As a consequence, a conversion of the large number (9223372036854775807 -> 9.223372036854776e+18 -> 9223372036854776000) leads to a rounding error and ultimately to a parsing error (json: cannot unmarshal number 9223372036854776000), since the value range for 64Bit Integer is exceeded.

This is done by the double unmarchalling

  1. in ocppj/server.go::ocppMessageHandler - ParseRawJsonMessage(data))
    data ist correct, parsedJson contains transactionId:9.223372036854776e+18 (9223372036854776000)
  2. ParseMessage(parsedJson, pending) -> FormationViolation - json: cannot unmarshal number 9223372036854776000 into Go struct field MeterValuesRequest.transactionId of type int

Certainly, Alfen has to make sure in a newer firmware (currently 6.2.0) that a valid TransactionId is not lost and as a consequence a non-valid TransactionId is not sent as MaxInt 64Bit but omitted, but basically MaxInt 64Bit (9223372036854775807) would be a valid TransactionId according to ocpp1.6j.

The following code works (if type INT is known):

type MeterValuesRequest struct {
	ConnectorId   int  `json:"connectorId" validate:"gte=0"`
	TransactionId *int `json:"transactionId,omitempty"`
}
var m MeterValuesRequest = MeterValuesRequest{}
err = json.Unmarshal(jsonMsg, &m)
if err != nil {
log.Fatal(err)
}
logrus.Infof("msg TransactionId: %d", *m.TransactionId)

The following ultimatively leads to the above-mentioned 'error'

var arr []interface{}
err := json.Unmarshal(dataJson, &arr)
if err != nil {
return nil, err
}

In principle, it would be desirable that valid values according to the spec do not lead to a parsing error. Furthermore, it is somehow unsightly that the error message with implementation details is ultimately sent back to the client (and ends up in the Alfen log).

Due to the error when sending the MeterValues, the StopTransaction is not sent in the Alfen firmware in the further course, so the station is basically unusable. This transaction with an invalid Id can no longer be cancelled from the outside (via OCPP).

regards

https://forum.golangbridge.org/t/type-problem-in-json-conversion/19420

--cut--
This probably goes without saying, but if you know your fields ahead of time, you can always provide type in the struct itself rather than using “interface{}”.
--cut--

Maybe we could find a way to use the original string/message for decoding after the type has been determined instead of arr[3]in occpj.go:399 profile.ParseRequest(action, arr[3], parseRawJsonRequest)?

Is it possible to capture the error from within the central system service? (to generate an email to attract attention to nail down the problem)

level=error msg="ocpp message (40): FormationViolation - json: cannot unmarshal number 9223372036854776000 into Go struct field MeterValuesRequest.transactionId of type int"

The ocpp 1.6 spec is actually really unclear about that, since the primitive type is simply defined as integer. Most languages would assume an integer type to be a 32-bit value and not 64-bit, unless explicitly specified. This is ultimately cleared up in the v2 spec (-> int32, signed), but I'm pretty sure it has caused issues for other implementations before already (see steve-community/steve#280).

The library was actually designed with a 32-bit value in mind. The fact that int in Go has a variable size depending on the underlying architecture is just a "lucky coincidence". Such a transaction ID wouldn't work at all if you deployed this on an ARM32 charge point.

Regardless, I opened a PR with a potential fix for your immediate issue. I also added the max int case within a test to verify this, but feel free to have a look.

I have tested your fix and found, that it still does not work due to the fact that ParseRawJsonRequest would also need to be patched. Nevertheless this seems not to be the right way to go, right? Along with this I found out that most of our production systems run on 32Bit hardware so even if the patch would work, I would not be able to receive that large Integer value to handle appropriately.

I would like to have the possibility within the CentralSystem to catch that kind of an error (that this malformed message arrived) and act accordingly. At least I would like to inform the admin of the system about this inconsistency and best would be to be able to override some kind of a workaround for Alfen Charger. I also want to avoid that this error is sent back to the charger.

So best would be to try to unmarshall the MeterValuesRequest{} and and in case of that error (TransactionId cannot be correctly determined, reset TransactionId to be nilto be spec-conform). Then the application can handle the missing TransactionId if necessary/possible. Same is needed for StopTransactionRequest{} though that would not allow a missing TransactionId by specification... (Alfen also sends StopTransaction with invalid ID and if not properly confirmed the MeterValuesRequest{}/StopTransactionRequest{} will remain in a dead loop...)

Generally it would be great to be informed of incoming erroneous messages to be able to handle them in some generic and/or specific way. In the mentioned case I would like to handle all incoming messages with confirmation to avoid a dead loop at Alfen side.... Since that results in a dead charger after a couple of failed requests. They loose connection to the central system and never connect again. A hard power cycle is needed to get them back up (and that requires someone to be present at the remote charging site...)

Please let me know if I can help, I would be happy to do that.

it still does not work due to the fact that ParseRawJsonRequest would also need to be patched

What do you mean? For a 64-bit system, the incoming value should be parsed correctly (confirmed by the test I added). For 32-bit systems this won't work indeed, but again the value is assumed to be an int32. Other implementations do the same, see ChargeTimeEU.

So best would be to try to unmarshall the MeterValuesRequest{} and and in case of that error (TransactionId cannot be correctly determined, reset TransactionId to be nilto be spec-conform). Then the application can handle the missing TransactionId if necessary/possible.

This sounds cumbersome. The protocol specs state that upon receiving a malformed/incorrect message a respective error shall be returned. The library abstracts this behavior so that implementations don't have to handle every inconsistency introduced by non-conform endpoints. Furthermore I'm not sure how to achieve this, since the built-in json.Unmarshal functionality will simply throw an error, when an invalid data type is found: when this happens, the resulting data structure is empty and it would be very hard to arbitrarily fill only some fields and leave others empty.

Generally it would be great to be informed of incoming erroneous messages to be able to handle them in some generic and/or specific way.

This one makes sense, although it will increase the complexity a bit. We could introduce InvalidRequestHandler callbacks, that get invoked here when a parsing error is encountered. The callback could accept the encountered error and the raw payload. It would be up to the implementation to log/handle these messages internally somehow. Does this make sense?

I would, however, insist on returning an error message, because invalid payloads should return errors according to the specs.

@qosmotec, do you have a way to reproduce the station sending an invalid transaction id? Or perhaps you can still obtain the logging from that time period using the ACE Service Installer?

it still does not work due to the fact that ParseRawJsonRequest would also need to be patched

What do you mean? For a 64-bit system, the incoming value should be parsed correctly (confirmed by the test I added). For 32-bit systems this won't work indeed, but again the value is assumed to be an int32. Other implementations do the same, see ChargeTimeEU.

So best would be to try to unmarshall the MeterValuesRequest{} and and in case of that error (TransactionId cannot be correctly determined, reset TransactionId to be nilto be spec-conform). Then the application can handle the missing TransactionId if necessary/possible.

This sounds cumbersome. The protocol specs state that upon receiving a malformed/incorrect message a respective error shall be returned. The library abstracts this behavior so that implementations don't have to handle every inconsistency introduced by non-conform endpoints. Furthermore I'm not sure how to achieve this, since the built-in json.Unmarshal functionality will simply throw an error, when an invalid data type is found: when this happens, the resulting data structure is empty and it would be very hard to arbitrarily fill only some fields and leave others empty.

Generally it would be great to be informed of incoming erroneous messages to be able to handle them in some generic and/or specific way.

This one makes sense, although it will increase the complexity a bit. We could introduce InvalidRequestHandler callbacks, that get invoked here when a parsing error is encountered. The callback could accept the encountered error and the raw payload. It would be up to the implementation to log/handle these messages internally somehow. Does this make sense?

Exactly. This is what was intended. That way application can handle (at least log or otherwise inform/alert the user) the malformed messages.

I would, however, insist on returning an error message, because invalid payloads should return errors according to the specs.

Yes, definitely - error should be returned.

@qosmotec, do you have a way to reproduce the station sending an invalid transaction id? Or perhaps you can still obtain the logging from that time period using the ACE Service Installer?

Reproducing is hard, logfiles can be provided. Are you an Alfen representative? If so we could share open ticket information, if not, let me know how to share logfiles with you.

@qosmotec, do you have a way to reproduce the station sending an invalid transaction id? Or perhaps you can still obtain the logging from that time period using the ACE Service Installer?

Reproducing is hard, logfiles can be provided. Are you an Alfen representative? If so we could share open ticket information, if not, let me know how to share logfiles with you.

Representative no, employee working on the software that stumbled upon this Github repo, yes. Feel free to share any information you feel comfortable sharing, whether that's an open ticket you have with our after sales department or a log file.

I would like to get in direct contact with you, in order to get this sorted. We have two major issues open with Alfen After Sales. The biggest issue that is very annoying is CST-104472, the second one is CST-103823. Please get in contact with me here. I would like to share more details in direct contact in order to keep this issue clean here.

I would like to get in direct contact with you, in order to get this sorted. We have two major issues open with Alfen After Sales. The biggest issue that is very annoying is CST-104472, the second one is CST-103823. Please get in contact with me here. I would like to share more details in direct contact in order to keep this issue clean here.

Hi @qosmotec, the first ticket (CST-104472) definitely has some strange behavior: assertion failed and not responding to reboot command, that will be escalated to the R&D department, but I can't say anything about when this will be resolved. However the core of that issue seems to be the back office connection issues. How did you verify it's not on the back office side? The station shows errno 103, which is connection aborted. The station doesn't try to abort any connection, so it feels like this is coming from back office side. Especially since in some cases in your log file, after about 12 hours the station is all of a sudden able to connect again.

For CST-103823, a bug report has been made at R&D as well. In some cases, this seems to be triggered by an invalid idTag. To validate that the root cause is the same for you, I would need logging that shows the start of this issue. As for a workaround, you should be able to set TransactionMessageAttempts to 5 or so, then after 5 times of sending it the charging station will give up and skip that message. Same will be done for all messages that the back office responds to with some type of error or invalid reply.

@qosmotec just fyi, I added a hook for basically intercepting invalid incoming messages, see #215

@qosmotec, do you have a way to reproduce the station sending an invalid transaction id? Or perhaps you can still obtain the logging from that time period using the ACE Service Installer?

@lohralfen I stumbled upon this issue because I encountered invalid transaction identifiers as well. I don't use ocpp-go, but I can share what I have seen that might help you reproduce this issue, if still necessary.

I'm using an Alfen Eve Single Pro, running firmware 5.8 connected to some CPO backend via modem. The meter values update every 10 seconds (for testing purposes). The session started with a regular transaction ID that was a valid 32-bit number.

During the session, this number suddenly changed to 9223372036854775807 (2^63 - 1), and I could not stop the session, unless I physically stopped the charger and removed the cable. Every next session would still use this transaction identifier, until I finally cleared the transaction database (+/- 4100 items were in there). The session ran fine for +/- 2 days, then failed and I eventually stopped it at +/- 5 days (I did not notice it until then).

This makes me believe that this is an overflow situation.

@qosmotec, do you have a way to reproduce the station sending an invalid transaction id? Or perhaps you can still obtain the logging from that time period using the ACE Service Installer?

@lohralfen I stumbled upon this issue because I encountered invalid transaction identifiers as well. I don't use ocpp-go, but I can share what I have seen that might help you reproduce this issue, if still necessary.

I'm using an Alfen Eve Single Pro, running firmware 5.8 connected to some CPO backend via modem. The meter values update every 10 seconds (for testing purposes). The session started with a regular transaction ID that was a valid 32-bit number.

During the session, this number suddenly changed to 9223372036854775807 (2^63 - 1), and I could not stop the session, unless I physically stopped the charger and removed the cable. Every next session would still use this transaction identifier, until I finally cleared the transaction database (+/- 4100 items were in there). The session ran fine for +/- 2 days, then failed and I eventually stopped it at +/- 5 days (I did not notice it until then).

This makes me believe that this is an overflow situation.

This issue is already fixed - IIRC in firmware version 6.0 support for OCPP 2.0.1 was introduced, and also the bug that caused this was fixed. However you could still have invalid entries in the database of your CS, so in firmware version 6.5 it was fixed so that these invalid entries are not transmitted to the CSMS. Please let me know if you still experience this issue with FW version 6.5 or later.

@lohralfen Unfortunately, I cannot update this charger due to a broken update. It updated the bootloader, but did not update the firmware. Now the firmware is stuck on 5.8 because it does not know the new 6.x bootloader. That's not for this issue to discuss, maybe I can provide you more details directly.

@lohralfen Unfortunately, I cannot update this charger due to a broken update. It updated the bootloader, but did not update the firmware. Now the firmware is stuck on 5.8 because it does not know the new 6.x bootloader. That's not for this issue to discuss, maybe I can provide you more details directly.

Sorry to hear. Such specific cases are best handled by the service portal, because a bricked bootloader will probably require a controller board replacement.

@qosmotec, do you have a way to reproduce the station sending an invalid transaction id? Or perhaps you can still obtain the logging from that time period using the ACE Service Installer?

@lohralfen I stumbled upon this issue because I encountered invalid transaction identifiers as well. I don't use ocpp-go, but I can share what I have seen that might help you reproduce this issue, if still necessary.
I'm using an Alfen Eve Single Pro, running firmware 5.8 connected to some CPO backend via modem. The meter values update every 10 seconds (for testing purposes). The session started with a regular transaction ID that was a valid 32-bit number.
During the session, this number suddenly changed to 9223372036854775807 (2^63 - 1), and I could not stop the session, unless I physically stopped the charger and removed the cable. Every next session would still use this transaction identifier, until I finally cleared the transaction database (+/- 4100 items were in there). The session ran fine for +/- 2 days, then failed and I eventually stopped it at +/- 5 days (I did not notice it until then).
This makes me believe that this is an overflow situation.

This issue is already fixed - IIRC in firmware version 6.0 support for OCPP 2.0.1 was introduced, and also the bug that caused this was fixed. However you could still have invalid entries in the database of your CS, so in firmware version 6.5 it was fixed so that these invalid entries are not transmitted to the CSMS. Please let me know if you still experience this issue with FW version 6.5 or later.

Herewith I confirm the fix with firmware 6.5.0.

I assume that the problem was setting a charge profile of purpose TxProfile when the transaction is not yet started and/or setting a charge profile of purpose ChargePointMaxProfile with no TXId specified.

Since no ID specified (that is conform with spec), charger sets null thus resulting in MaxInt internally? With 6.5.0 the SetChargeprofile purpose TxProfile is rejected when the transaction is not yet started, this seems to prevent the problem.