How does your plugin handle non-ASCII content?

Question

How does your plugin handle non-ASCII content?

As you might guess from my avatar, I code for another MUD client, and we have had a request to support MCP. We support a range of Server encodings as well as ASCII and UTF-8. However, from reading the spec for MCP 2.1 at https://www.moo.mud.org/mcp/mcp2.html it is clear that it is an ASCII-only protocol. So how do you handle non-ASCII content?

Answer 1 · 2022-12-21T15:54:33.000Z

It should be safe to just send whatever you get. The MOO server will ignore characters it doesn't support (which, at the moment, is anything isgraph() rejects).

That will also make it easy when I add UTF-8 support to the server, as clients will 'just work'. Which is always nice.

Answer 2 · 2022-12-22T13:33:37.000Z

🤔 Are you really sure about that? Most sensible encodings do use the same byte values for the characters in the ASCII range (the exceptions being EBCDIC and one or two graphemes in some Far-Eastern encodings, e.g. Shift-JIS has the Yen sign '¥' where the '\' would be and an overline '‾' where the '~' would be), but any byte with the 8th bit set is going to be interpreted in a Server-encoding-dependent manner. Whilst you might design your plug-in to work with UTF-8 (and, to be honest, that would be a sensible choice), it is by no means certain that the Server will be using that encoding.
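To make the "8th bit set" point concrete, here is a minimal sketch of a check a client could run before sending a line. The function name `is_seven_bit_clean` is made up for illustration and is not part of any MUD client or of MCP. Note that it only tells you whether the bytes stay in the ASCII range; it cannot tell you that the Server will render 0x5C or 0x7E as '\' and '~'.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when every byte in buf has its
 * 8th bit clear, i.e. the data stays within the 7-bit ASCII range.
 * Any byte that fails this test will be interpreted according to
 * whatever encoding the receiving side happens to assume. */
bool is_seven_bit_clean(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] & 0x80) {
            return false;   /* high bit set: not plain ASCII */
        }
    }
    return true;
}
```

A plain command such as "look north" passes this check, while the UTF-8 bytes for "café" do not, because 'é' is encoded as the two bytes 0xC3 0xA9.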

OTOH, the handling of that is down to the Server and the Client, but it is not clear to me that Servers are necessarily going to be set up to talk anything other than ASCII (which is what the baseline Telnet Network Virtual Terminal MUST support, for RFC values of MUST).

Answer 3 · 2022-12-22T18:01:58.000Z

Sorry, I don't know what to tell you. I'm not an expert in character encoding or computer science, I'm just a hobbyist in a US bubble.

All I know is that the LambdaMOO server doesn't explicitly set a locale, so the C library functions (at least on GNU/Linux systems) should default to the C / POSIX locale, which is limited to the 7-bit US-ASCII character set. The server doesn't adhere to the telnet protocol or conventions, so it's not going to negotiate a character encoding with your client. All it does is read data from the network into a char buffer. When that buffer is filled, it goes byte by byte and keeps each byte that either passes the C isgraph() function or is a space, a tab, or a newline. Any other input is unceremoniously ignored. When it receives a newline, it passes whatever has been received since the last newline straight into the command parser. (https://github.com/wrog/lambdamoo/blob/master/net_multi.c#L259)
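Roughly, the filtering described above behaves like the sketch below. This is not the actual LambdaMOO code: `filter_line` and its parameters are names invented for illustration, and it assumes the process is running in the default C locale, where isgraph() rejects every byte above 0x7E.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of the behaviour described above, not the real
 * server code. Copies bytes from in to out, keeping only those that
 * pass isgraph() or are a space or tab, and stops at the first
 * newline. Everything else (control bytes, bytes with the 8th bit
 * set in the C locale) is silently dropped. Returns the length of
 * the NUL-terminated line written to out. */
size_t filter_line(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t n = 0;

    if (out_cap == 0) {
        return 0;
    }
    for (size_t i = 0; i < in_len; i++) {
        /* Cast first: passing a negative char to isgraph() is undefined. */
        unsigned char c = (unsigned char)in[i];

        if (c == '\n') {
            break;              /* end of line: hand off to the parser */
        }
        if ((isgraph(c) || c == ' ' || c == '\t') && n + 1 < out_cap) {
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Under those assumptions, a client that sends the UTF-8 bytes for "café ok" followed by CR LF would have the two bytes of 'é' and the carriage return dropped, so the command parser would see "caf ok".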