Figure53/F53OSC

Chinese characters in string arguments not parsed properly

Closed this issue · 8 comments

Hello

When I send (from QLab) the following OSC message:

/zoom/username/rename "MAIL ROOM 郵政" "Philip"

the message parser only sees the contents of the first argument; the second comes through empty:

2021-02-25 21:14:25.904303+0400 ZoomOSC[85681:2610642] Incoming OSC message:
2021-02-25 21:14:25.904330+0400 ZoomOSC[85681:2610642] /zoom/username/rename
2021-02-25 21:14:25.904343+0400 ZoomOSC[85681:2610642] arguments:
2021-02-25 21:14:25.904361+0400 ZoomOSC[85681:2610642] string: "MAIL ROOM 郵政"
2021-02-25 21:14:25.904374+0400 ZoomOSC[85681:2610642] string: ""

If I strip off the final Chinese character, so changing the OSC message to:
/zoom/username/rename "MAIL ROOM 郵" "Philip"

The message is seen as invalid by ZoomOSC:
2021-02-25 21:16:01.562494+0400 ZoomOSC[85681:2610642] Incoming OSC message:
2021-02-25 21:16:01.562523+0400 ZoomOSC[85681:2610642] /zoom/username/rename
2021-02-25 21:16:01.562535+0400 ZoomOSC[85681:2610642] arguments:
2021-02-25 21:16:01.562550+0400 ZoomOSC[85681:2610642] string: "MAIL ROOM 郵"
2021-02-25 21:16:01.562559+0400 ZoomOSC[85681:2610642] Error: Unable to parse string argument for OSC method /zoom/username/rename

I haven't dug into the F53OSC code yet, but will do.

OSC itself does not support non-ASCII characters, so this would be a bit of an uphill battle.

Hi Richard -

As @samkusnetz observed, I would also not expect non-ASCII characters to behave correctly across all devices since the OSC spec officially supports ASCII only.

In practice, neither F53OSC nor QLab necessarily filters out non-ASCII characters, so I was curious to verify how they do get handled. After some testing, I am fairly confident that the parsing error you are seeing with ZoomOSC is not occurring in either F53OSC or QLab.

+[F53OSCMessage messageWithString:] is what converts a string representation of a message into an outgoing OSC message. Using the source string /zoom/username/rename "MAIL ROOM 郵政" "Philip", I found that +messageWithString: does create a message with two arguments as expected, with string values "MAIL ROOM 郵政" and "Philip".
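As a rough illustration of that string-to-message step, the sketch below splits the same source string into an address and two string arguments. It uses Python's standard `shlex` module as a stand-in tokenizer; this is an assumption for illustration only and is not how `+messageWithString:` is implemented.

```python
import shlex

source = '/zoom/username/rename "MAIL ROOM 郵政" "Philip"'

# shlex.split honours double quotes, so the quoted name stays one token
# even though it contains spaces and non-ASCII characters.
address, *arguments = shlex.split(source)

print(address)    # /zoom/username/rename
print(arguments)  # ['MAIL ROOM 郵政', 'Philip']
```

At this stage the arguments are still ordinary strings; no byte encoding has happened yet, so nothing here depends on whether the characters are ASCII.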

In QLab 4, we can test this in a different way using the /setLight OSC method, which happens to take 2 string arguments. In workspace "A", I created a Network cue with message /workspace/B/cue/1/setLight "MAIL ROOM 郵政" "home". In workspace "B", I created a single light instrument named "MAIL ROOM 郵政" and added a Light cue with the command "MAIL ROOM 郵政 = 50". When I ran the Network cue, the Workspace Status window for workspace B logged the incoming message as expected with two arguments, and the message updated the command in Light cue 1 to the "home" value also as expected.

[Screenshot: Screen Shot 2021-02-26 at 10 06 38 AM]

@richardwilliamson - If you discover new information, please feel free to reopen this issue!

Thanks Brent - interesting to hear that this seems to work in QLab. I have tested on my end and am seeing the same behaviour you mention, and it also works if I change the name to "MAIL ROOM 郵"

However, working with the current master branch of the F53OSC library, I am still seeing that this message is not being parsed as expected - with logging enabled, the first message still loses the contents of its second argument, while the second message (with one Chinese character removed from the first argument) still fails to parse properly.

I assume that you have a newer branch of the library that you are using in QLab? If not, I'm very confused.

(I don't think I am able to re-open this issue but hopefully you will see the reply anyway!)

Hi Richard --

My mistake! I realized the flaw in the QLab /setLight test I did earlier today... My Network patch was sending to the default "localhost", which makes QLab bypass the network stack in the interest of speed. This obscured the issue for me, because QLab handles the F53OSCMessage arguments array directly, without the arguments ever being encoded into a data stream. Now if I send a message to the F53OSCMonitor app, I can reproduce what you are seeing and trace it more directly.
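To make "encoded into a data stream" concrete, here is a minimal Python sketch of how a message with string arguments might be laid out on the wire. The OSC 1.0 spec defines an OSC-string as characters followed by a null terminator, null-padded to a multiple of 4 bytes. Encoding the text as UTF-8 is an assumption (the spec only covers ASCII), and the helper names `osc_string` and `osc_message` are hypothetical, not part of F53OSC.

```python
def osc_string(text: str) -> bytes:
    """Encode text as an OSC-string: bytes, a null terminator, then null
    padding up to a multiple of 4 bytes. UTF-8 is an assumption here."""
    raw = text.encode("utf-8") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_message(address: str, *args: str) -> bytes:
    """Build a message whose arguments are all strings (type tag 's')."""
    type_tags = "," + "s" * len(args)
    body = b"".join(osc_string(a) for a in args)
    return osc_string(address) + osc_string(type_tags) + body


name = "MAIL ROOM 郵政"
packet = osc_message("/zoom/username/rename", name, "Philip")

print(len(name))                  # 12 characters
print(len(name.encode("utf-8")))  # 16 bytes: 10 ASCII + 2 three-byte characters
print(len(osc_string(name)))      # 20 bytes on the wire (16 + null, padded)
print(len(packet))                # 56 bytes in total
```

The point to notice is that the character count (12) and the byte count (16) differ as soon as a non-ASCII character appears. A message that is passed around as an in-memory arguments array never exposes that difference, which is consistent with the localhost test not showing the problem.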

I think the issue is on the unpacking side, in F53OSCParser, as it reassembles a message from the socket data. Possibly the decoding of the non-ASCII characters is causing the buffer to fall out of alignment (or maybe terminate prematurely?), resulting in an incomplete message, roughly in or around here: https://github.com/Figure53/F53OSC/blob/1.1.0/Sources/F53OSC/F53OSCParser.m#L192
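The misalignment hypothesis can be checked with a small simulation. The Python sketch below is a hypothetical model, not the F53OSC parser: it assumes UTF-8 on the wire and a reader that advances the buffer by the decoded character count instead of the byte count. Under those assumptions it produces both symptoms from the logs above.

```python
def pad4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def osc_string(text: str) -> bytes:
    """UTF-8 bytes plus a null terminator, null-padded to 4 bytes."""
    raw = text.encode("utf-8") + b"\x00"
    return raw.ljust(pad4(len(raw)), b"\x00")


def read_by_character_count(buf: bytes, offset: int):
    """Hypothetical buggy reader: advances by the character count."""
    end = buf.index(b"\x00", offset)
    text = buf[offset:end].decode("utf-8")  # raises if it starts mid-character
    return text, offset + pad4(len(text) + 1)


def read_two_strings(first: str, second: str):
    buf = osc_string(first) + osc_string(second)
    a, offset = read_by_character_count(buf, 0)
    try:
        b, _ = read_by_character_count(buf, offset)
    except UnicodeDecodeError:
        b = None  # analogous to "Unable to parse string argument"
    return a, b


print(read_two_strings("MAIL ROOM 郵政", "Philip"))  # ('MAIL ROOM 郵政', '')
print(read_two_strings("MAIL ROOM 郵", "Philip"))    # ('MAIL ROOM 郵', None)
print(read_two_strings("MAIL ROOM", "Philip"))       # ('MAIL ROOM', 'Philip')
```

In the first case the reader advances 16 bytes instead of 20 and lands exactly on the null terminator of the first string, so the second argument reads as empty. In the second case it advances 12 bytes instead of 16 and lands on the last byte of 郵, which is not valid UTF-8 on its own, so decoding fails. The pure-ASCII case works because character count and byte count coincide.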

We will give this some thought. This being somewhat outside of the OSC spec, I can't say whether it's something we will decide to address directly. But I will leave this issue open in case a solution presents itself. Thanks for bringing this to our attention!

So I think this PR solves the issue. In the original code, the library was finding the first occurrence of a 0 in the buffer, but rather than using that position as the length of the string, it was using the length of the string returned by "stringWithUTF8String", which I don't think is very reliable.
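The idea described here, using the position of the null terminator as the string's byte length, can be sketched in Python as follows. This is an illustration of the approach under stated assumptions, not the code from the PR: the helper names are hypothetical, UTF-8 is assumed on the wire, and substituting a replacement character for undecodable bytes is a choice made for this sketch.

```python
def pad4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def osc_string(text: str) -> bytes:
    """UTF-8 bytes plus a null terminator, null-padded to 4 bytes."""
    raw = text.encode("utf-8") + b"\x00"
    return raw.ljust(pad4(len(raw)), b"\x00")


def read_by_terminator(buf: bytes, offset: int):
    """Advance by the byte length measured up to the null terminator."""
    end = buf.index(b"\x00", offset)
    byte_length = end - offset
    # errors="replace" is this sketch's assumption: bad bytes render as
    # U+FFFD instead of aborting the whole message.
    text = buf[offset:end].decode("utf-8", errors="replace")
    return text, offset + pad4(byte_length + 1)


def read_strings(buf: bytes, count: int):
    offset, out = 0, []
    for _ in range(count):
        text, offset = read_by_terminator(buf, offset)
        out.append(text)
    return out


buf = osc_string("MAIL ROOM 郵政") + osc_string("Philip")
print(read_strings(buf, 2))  # ['MAIL ROOM 郵政', 'Philip']

buf = osc_string("MAIL ROOM 郵") + osc_string("Philip")
print(read_strings(buf, 2))  # ['MAIL ROOM 郵', 'Philip']
```

Because the offset now depends only on bytes in the buffer, the reader stays aligned regardless of how the text decodes, so a badly rendered character affects only its own argument.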

While this solution still risks non-ASCII characters not rendering properly, it does make the library fail a little more gracefully, I think.

Thanks

Richard