NeuronRobotics/nrjavaserial

Buffer Overflows in Native Serial Implementation with Long Device Names

Closed this issue ยท 7 comments

I am currently troubeshooting an issue that occurs in nrjavaserial-5.2.1 when connecting to a certain serial port on an openHAB system. For reference, there is also a thread in the openHAB community about this, but so far no one had an idea about how it could be fixed. So I hoped maybe you guys could help me identify and fix the issue.

Platform information: Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-122-generic x86_64)

Background: I use two different serial devices connected via USB. My system sometimes changes the assignments of the USB ports to tty devices, i.e. sometimes a certain device will be mounted at /dev/ttyUSB0 and sometimes at /dev/ttyUSB1. The assignments can change upon every reboot and the behavior seems to be non-deterministic. Luckily there are some symlinks with stable names located at /dev/serial/by-id:

grafik

I am trying to use those symlinks for my openHAB devices. The issue is that one of those symlinks can be used without any problems, while the other crashes the complete JVM when connected to.

This one works without any issues:

/dev/serial/by-id/usb-SHK_NANO_CUL_433-if00-port0

while this one crashes the JVM:

 /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0

I debugged the issue remotely and found out that the crash happens when calling the native open() method in class gnu.io.RXTXPort. The call is at line 173:

fd = open( name );

the native method called is:

private native synchronized int open( String name ) throws PortInUseException;

First I suspected a threading issue, but even when deactivating the NanoCUL device the issue still occurs, so concurrency does not seem to be the problem.

Do you guys have any other idea why this could happen? The only reason I can think of right now is the length of the device names. The NanoCUL device name has 50 characters, while the other one (it is a Zigbee USB stick) has 66 characters. Are you aware of any limitations regarding the device name length? Or do you have any other ideas why the crash might occur?

Here are the two stack traces of the calls:

From the openHAB Serial Binding with device name /dev/serial/by-id/usb-SHK_NANO_CUL_433-if00-port0 (this call works):

Thread [OH-safeCall-1] (Suspended (breakpoint at line 173 in RXTXPort))	
	RXTXPort.<init>(String) line: 173	
	RXTXCommDriver.getCommPort(String, int) line: 983	
	CommPortIdentifier.open(String, int) line: 467	
	SerialPortIdentifierImpl.open(String, int) line: 53	
	SerialBridgeHandler.initialize() line: 138	
	GeneratedMethodAccessor91.invoke(Object, Object[]) line: not available	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43	
	Method.invoke(Object, Object...) line: 566	
	InvocationHandlerSync<T>(AbstractInvocationHandler<T>).invokeDirect(Invocation) line: 154	
	Invocation.call() line: 52	
	FutureTask<V>.run() line: 264	
	QueueingThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1128	
	ThreadPoolExecutor$Worker.run() line: 628	
	Thread.run() line: 829

From the Zigbee binding with device name /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0 (this one crashes the JVM):

Daemon Thread [OH-thingHandler-5] (Suspended (breakpoint at line 173 in RXTXPort))	
	owns: EmberHandler  (id=705)	
	RXTXPort.<init>(String) line: 173	
	RXTXCommDriver.getCommPort(String, int) line: 983	
	CommPortIdentifier.open(String, int) line: 467	
	SerialPortIdentifierImpl.open(String, int) line: 53	
	ZigBeeSerialPort.open(int, ZigBeePort$FlowControl) line: 157	
	ZigBeeSerialPort.open() line: 123	
	ZigBeeDongleEzsp.initialiseEzspProtocol() line: 1289	
	ZigBeeDongleEzsp.initialize() line: 432	
	ZigBeeNetworkManager.initialize() line: 418	
	EmberHandler(ZigBeeCoordinatorHandler).initialiseZigBee() line: 431	
	EmberHandler(ZigBeeCoordinatorHandler).lambda$2() line: 557	
	354614459.run() line: not available	
	Executors$RunnableAdapter<T>.call() line: 515	
	ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(FutureTask<V>).run() line: 264	
	ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 304	
	WrappedScheduledExecutorService(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1128	
	ThreadPoolExecutor$Worker.run() line: 628	
	Thread.run() line: 829

I could not find any JVM crash logs or core dumps. Please let me know if I can provide any further information.

I would be happy to contribute a fix if the issue turns out to be in the Java code (as opposed to the native code).

There are indeed several unbounded sprintf writes to a logging message buffer of 80 bytes containing the device name: https://github.com/NeuronRobotics/nrjavaserial/blob/master/src/main/c/src/SerialImp.c#L669 and following.

This could be the issue indeed. While looking through the C code I found the following issues:

  • RXTXPort.Initialize: buffer message[80] declared in line 265 seems never to be written anywhere but is passed to report() in line 311
  • RXTXPort.open: buffer message[80] declared in line 669 will overflow for long file names like in my case, effective messages below
  • RXTXPort.open: buffer is filled and never used in line 710 (this most likely causes my crash)
  • RXTXCommDriver.testRead (test code, not sure if relevant): buffer message[80] in line 4378 will overflow for long file names (example below)
  • lib_lock_dev_lock: buffer message[80] declared in line 5391 will overflow for long file names (example below)
  • uucp_lock: empty message buffer is reported in line 5551
  • uucp_lock: not sure whether filename was intended instead of name in line 5566. Anyways if my device name would be inserted the buffer would overflow (example below)
  • uucp_lock: not sure if the lock file name also includes the original device name, but if so there might be more overflows in line 5579, 5588 and 5596.
  • uucp_unlock: potential overflows if the lock file name contains the original device file name

Effective error messages that cause buffer overflows when using my device name as example:

open: locking has failed for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (95 characters)
open: locking worked for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (91 characters)
open: exclusive access denied for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (100 characters)
testRead: /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0 is good!\n (85 characters)
RXTX fhs_lock() Error: creating lock file for: /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0: %s\n (117+ characters)
uucp_lock: device was /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (88 characters)

I could try to provide a pull request, but I have to declare that while I'm an experienced Java programmer, I don't know much about memory management in C. So I don't know whether just raising the buffer sizes would solve the problem, plus I don't know how to test this properly. Would anyone else who is more familiar with the code and knows how this can be tested be willing to fix this?

Thanks for the research. Most points should be very easy to fix by replacing sprintf by snprintf and adding sizeof(message) as the additional second argument. The risk of breaking anything by these changes should be low.

Printing an uninitialized buffer should simply be removed I guess.

Created PR #230, it would be great if you could review and test it as I don't have the required toolchains installed.

Nice, thanks for merging this ๐Ÿ‘

@MrDOS do you know whether a nrjavaserial release is planned eventually?

@fwolter do you know the steps required to get a new nrjavaserial version into openHAB and can I assist in any way?

wborn commented

Thanks for fixing these bugs @david-pace!

I'd be happy to integrate a new nrjavaserial version in openHAB when it's released. ๐Ÿ™‚

It would also allow us to use an official release again, since we've been using our own "5.2.1.OH1" build to be able to use the fix for the annoying #111 bug ๐Ÿ› ๐Ÿ”จ

So all changes up to 7aa21d1 are well tested since they are part of OH 3.3.0 which was released ~6 months ago.

Would it possible to create a new stable release of nrjavaserial @madhephaestus @crea-doo @kaikreuzer @MrDOS, and can we do anything to help?