linux-can/can-doc

Issue with bus.send_periodic

YuBer0 opened this issue · 54 comments

YuBer0 commented

Hi, I'm trying to send CAN messages with the function send_periodic. However, I get this error:

can.exceptions.CanOperationError: Couldn't send CAN BCM frame due to OS Error: Invalid argument You are probably referring to a non-existing frame. [Error Code 22]

The code I used is:

import can
import time

try:
    bus = can.interface.Bus(channel = 'can0',
                            bustype = 'socketcan',
                            bitrate = 500000)
except OSError as e:
    print(e)

message = can.Message(arbitration_id=0x37A, data=[0x0A, 0x00, 0x3B, 0x00, 0xFF, 0x0B, 0x00, 0x00], is_extended_id= False)
message1 = can.Message(arbitration_id=0x379, data=[0x0C, 0x00, 0x0A, 0x00, 0xFF, 0x00, 0x0A, 0x00], is_extended_id= False)
message2 = can.Message(arbitration_id=0x372, data=[0x00, 0xD0, 0x50, 0x80, 0xCC, 0x00, 0xAA, 0xB0], is_extended_id= False)

period = 0.1

while True:
    bus.send_periodic(msgs = message, period = 0.1)
    bus.send_periodic(msgs = message1, period = 0.1)
    bus.send_periodic(msgs = message2, period = 0.1)

When I send the messages via bus.send instead, it seems to work:

import can
import time

bus = can.interface.Bus(channel='can0', bustype='socketcan')
    
message = can.Message(arbitration_id=0x37A, data=[0x0A, 0x00, 0x3B, 0x00, 0xFF, 0x0B, 0x00, 0x00], is_extended_id= False)
message1 = can.Message(arbitration_id=0x379, data=[0x0C, 0x00, 0x0A, 0x00, 0xFF, 0x00, 0x0A, 0x00], is_extended_id= False)
message2 = can.Message(arbitration_id=0x372, data=[0x00, 0xD0, 0x50, 0x80, 0xCC, 0x00, 0xAA, 0xB0], is_extended_id= False)


period = 0.1

while True:
    bus.send(message)
    time.sleep(period)
    bus.send(message1)
    time.sleep(period)
    bus.send(message2)
    time.sleep(period)

Here are some of the configurations that my CAN device is working on
RPI-4B 8GB Ram,
kernel version : 6.1.19-v8+
CAN transceiver device, MCP2515 (modified, changed VP230 for TJA1050)

ip -d -s link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0 
    can state ERROR-ACTIVE restart-ms 100 
    
lsmod | grep spi
spidev                 20480  2
spi_bcm2835            20480  0

 ifconfig can0
can0: flags=193<UP,RUNNING,NOARP>  mtu 16
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 10  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

/boot/config.txt
dtparam=spi=on
dtoverlay=mcp2515, spi0-0, interrupt=25,oscillator=8000000
dtoverlay=spi-dma
dtoverlay=spi-bcm2835

On boot, the RPi reported that it was unable to load spi-dma and spi-bcm2835:

failed to load dtoverlay=spi-dma
failed to load dtoverlay=spi-bcm2835

I'm pretty new to RPI and CAN devices as well as posting issues on github, so any advice would definitely help! Thanks

This repository is about datasheets of CAN IP cores, not about general support.

I'm not sure if the BCM is compiled into the raspi Linux kernel by default. Please check if it's loaded: after booting, check the kernel log with dmesg | grep -i can.

YuBer0 commented

Hi, thank you for the prompt reply.
How should I move the issue to general support?

As for the kernel log, the output of dmesg | grep -i can is:

CAN device driver interface
MCP251x spi0.0 can0: MCP2515 successfully initialized.
CAN: controller area network core
NET: Registered PF_CAN protocol family
IPv6: ADDRCONF(NETDEV_CHANGE): can0: link becomes ready

It doesn't show can: broadcast manager protocol, so BCM is not compiled into the kernel. You have to recompile your kernel with CAN_BCM enabled.

I'm sure you'll find a lot of documentation if you search for "how to compile kernel for raspberry pi".

YuBer0 commented

Hi, I managed to recompile my kernel with CAN_BCM enabled, using the resources at this link.

I checked the kernel log with dmesg | grep -i can and the output I get is:

CAN device driver interface
MCP251x spi0.0 can0: MCP2515 successfully initialized.
CAN: controller area network core
NET: Registered PF_CAN protocol family
CAN: raw protocol
IPv6: ADDRCONF(NETDEV_CHANGE): can0: link becomes ready
CAN: broadcast manager protocol

However, the same error persists when I try to use the function send_periodic. Also, when I reboot the system, the CAN: broadcast manager protocol line is no longer shown when I run dmesg | grep -i can.

If the BCM is a module it's only loaded if needed, so after you run your test program the first time.

Try running as root, if this doesn't work, maybe @hartkopp can help you.
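
One way to check from Python whether the broadcast manager is usable, without reading dmesg, is to try opening a CAN_BCM socket. On kernels where BCM is built as a module, this should also trigger the on-demand load described above. This is a minimal sketch that assumes Linux and the standard socket module; the helper name bcm_available is made up for this example.

```python
import socket


def bcm_available() -> bool:
    """Return True if the kernel lets us open a CAN_BCM socket."""
    # AF_CAN and CAN_BCM are only defined in the socket module on Linux.
    if not hasattr(socket, "AF_CAN") or not hasattr(socket, "CAN_BCM"):
        return False
    try:
        # Equivalent to socket(AF_CAN, SOCK_DGRAM, CAN_BCM) in C.
        sock = socket.socket(socket.AF_CAN, socket.SOCK_DGRAM, socket.CAN_BCM)
    except OSError:
        # Raised when the protocol is unavailable or access is denied.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("CAN_BCM available:", bcm_available())
```

A True result only says that the socket could be created; it says nothing about whether a later TX_SETUP on that socket will be accepted.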

YuBer0 commented

I see, this is helpful.
I can now see the CAN: broadcast manager protocol line after running dmesg | grep -i can. However, I still get the same error message:

can.exceptions.CanOperationError: Couldn't send CAN BCM frame due to OS Error: Invalid argument You are probably referring to a non-existing frame. [Error Code 22]

I tried the original code from #2 (comment) and only changed can0 to vcan0. That resulted in a working setup.

But candump any -td shows that the gap between the CAN frames is only a few microseconds:

 (000.000030)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000045)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000036)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000015)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
 (000.000040)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000044)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000044)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0

This would be definitely too fast for a real CAN bus.

Additionally this looks wrong:

while True:
    bus.send_periodic(msgs = message, period = 0.1)
    bus.send_periodic(msgs = message1, period = 0.1)
    bus.send_periodic(msgs = message2, period = 0.1)

You are creating a busy loop which continuously overwrites the current CAN_BCM setting to establish a periodic send job!

You likely wanted to have

bus.send_periodic(msgs = message, period = 0.1)
bus.send_periodic(msgs = message1, period = 0.1)
bus.send_periodic(msgs = message2, period = 0.1)

while True:
    time.sleep(1)

which leads to this output of candump any -td :

 (000.100444)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000030)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000019)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
 (000.100024)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000095)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000015)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
 (000.100278)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000019)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000013)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
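
Because candump -td prints the time since the previous frame, the 0.1 s period of each ID is not directly visible in a single column. The sketch below sums the deltas and reports the interval between consecutive frames of the same ID, using the sample above as input. The function name and the regular expression are illustrative and only handle the line format shown here.

```python
import re
from collections import defaultdict

# Sample copied from the candump -td output above (delta timestamps).
SAMPLE = """\
 (000.100444)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000030)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000019)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
 (000.100024)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000095)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000015)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
 (000.100278)  vcan0  37A   [8]  0A 00 3B 00 FF 0B 00 00
 (000.000019)  vcan0  379   [8]  0C 00 0A 00 FF 00 0A 00
 (000.000013)  vcan0  372   [8]  00 D0 50 80 CC 00 AA B0
"""

# Matches "(delta)  interface  ID  [" at the start of a candump line.
LINE_RE = re.compile(r"\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)\s+\[")


def per_id_periods(log: str) -> dict:
    """Map CAN ID -> intervals in seconds between consecutive frames of that ID."""
    now = 0.0
    last_seen = {}
    periods = defaultdict(list)
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        delta, _iface, can_id = match.groups()
        now += float(delta)  # accumulate deltas into a running timestamp
        if can_id in last_seen:
            periods[can_id].append(now - last_seen[can_id])
        last_seen[can_id] = now
    return dict(periods)


if __name__ == "__main__":
    for can_id, values in per_id_periods(SAMPLE).items():
        print(can_id, [f"{value:.6f}" for value in values])
```

For this sample every interval comes out within a millisecond of the requested 0.1 s.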
YuBer0 commented

Hello Hartkopp, thank you for the reply!
Yes, you are right regarding the while True loop. I changed the loop portion of the code to what you suggested, but I still get the same error code.
I'm not sure whether I have to download or install any other libraries in order to use BCM on the RPi with a physical CAN device.

Can you please check if it works with a virtual CAN interface in your setup (as I showed above) and check if the CAN traffic is analogous to my candump example?
We have to figure out if it is a CAN driver or BCM problem.

YuBer0 commented

Sure, I just tried with your changes, and I get the same error message.
I'm curious: since bus.send works, could it be that the CAN driver has an issue?

Can you please post the output of lsmod | grep can and ip -d -s link show vcan0?

YuBer0 commented

Sure!
[screenshot]

lsmod looks good.

But there are no RX/TX packets on the vcan0 interface.

When I run your program on vcan0 instead of can0 it looks like this:

$ ip -d -s link show vcan0
3: vcan0: <NOARP,UP,LOWER_UP> mtu 72 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/can  promiscuity 0  allmulti 0 minmtu 0 maxmtu 0 
    vcan numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 
    RX:  bytes packets errors dropped  missed   mcast           
         29936    3742      0       0       0       0 
    TX:  bytes packets errors dropped carrier collsns           
         29936    3742      0       0       0       0

YuBer0 commented

I'm unable to run the code: when I try to run it, error code 22 comes up.
Are there any other settings or configuration that I might have missed?

Please show the code you're trying to run. This works for me:

import can
import time

try:
    bus = can.interface.Bus(channel = 'vcan0',
                            bustype = 'socketcan',
                            bitrate = 500000)
except OSError as e:
    print(e)

message = can.Message(arbitration_id=0x37A, data=[0x0A, 0x00, 0x3B, 0x00, 0xFF, 0x0B, 0x00, 0x00], is_extended_id= False)
message1 = can.Message(arbitration_id=0x379, data=[0x0C, 0x00, 0x0A, 0x00, 0xFF, 0x00, 0x0A, 0x00], is_extended_id= False)
message2 = can.Message(arbitration_id=0x372, data=[0x00, 0xD0, 0x50, 0x80, 0xCC, 0x00, 0xAA, 0xB0], is_extended_id= False)

period = 0.1

bus.send_periodic(msgs = message, period = 0.1)
bus.send_periodic(msgs = message1, period = 0.1)
bus.send_periodic(msgs = message2, period = 0.1)

while True:
    time.sleep(1)

YuBer0 commented

Sure, here it is, with the terminal message
[screenshot]

But if this would have been started, why are there no RX/TX packets for vcan0 visible?

YuBer0 commented

I don't think the code managed to start because of the error, so there are no RX/TX packets on vcan0.

Can you create a log with strace:

strace -o log python3 test.py

Mine looks like this:

ioctl(3, SIOCGIFINDEX, {ifr_name="vcan0", ifr_ifindex=6}) = 0
bind(3, {sa_family=AF_CAN, sa_data="\225U\6\0\0\0\213\352P\0\0\0\0\0\240\362.\1\0\0\0\0"}, 24) = 0
setsockopt(3, SOL_CAN_RAW, CAN_RAW_FILTER, "\0\0\0\0\0\0\0\0", 8) = 0
socket(AF_CAN, SOCK_DGRAM|SOCK_CLOEXEC, CAN_BCM) = 4
ioctl(4, SIOCGIFINDEX, {ifr_name="vcan0", ifr_ifindex=6}) = 0
connect(4, {sa_family=AF_CAN, sa_data="\225U\6\0\0\0\213\352P\0\0\0\0\0 Jq\1\0\0\0\0"}, 24) = 0
sendto(4, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 56, 0, NULL, 0) = -1 EINVAL (Invalid argument)
sendto(4, "\1\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 72, 0, NULL, 0) = 72
sendto(4, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 56, 0, NULL, 0) = -1 EINVAL (Invalid argument)
sendto(4, "\1\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 72, 0, NULL, 0) = 72
sendto(4, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 56, 0, NULL, 0) = -1 EINVAL (Invalid argument)
sendto(4, "\1\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 72, 0, NULL, 0) = 72
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=413541, tv_nsec=40898648}, NULL) = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=413542, tv_nsec=40984424}, NULL) = 0

Interestingly there are EINVAL errors, too, but my python seems to retry with a larger length.

Which python-can version are you using? Try apt-cache policy python3-can.

BTW: please copy/paste from your terminal, no need for screenshots.

YuBer0 commented

Sure,
for strace -o log python3 test.py,

strace -o log python3 /home/Test_code/zzzz.py
Traceback (most recent call last):
  File "/home/.local/lib/python3.9/site-packages/can/interfaces/socketcan/socketcan.py", line 280, in send_bcm
    return bcm_socket.send(data)
OSError: [Errno 22] Invalid argument

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/Test_code/zzzz.py", line 15, in <module>
    bus.send_periodic(msgs = message, period = 0.1)
  File "/home/.local/lib/python3.9/site-packages/can/bus.py", line 242, in send_periodic
    self._send_periodic_internal(msgs, period, duration),
  File "/home/.local/lib/python3.9/site-packages/can/interfaces/socketcan/socketcan.py", line 838, in _send_periodic_internal
    task = CyclicSendTask(bcm_socket, task_id, msgs, period, duration)
  File "/home/.local/lib/python3.9/site-packages/can/interfaces/socketcan/socketcan.py", line 350, in __init__
    self._tx_setup(self.messages)
  File "/home/.local/lib/python3.9/site-packages/can/interfaces/socketcan/socketcan.py", line 377, in _tx_setup
    send_bcm(self.bcm_socket, header + body)
  File "/home/.local/lib/python3.9/site-packages/can/interfaces/socketcan/socketcan.py", line 293, in send_bcm
    raise can.CanOperationError(base + specific_message, error.errno) from error
can.exceptions.CanOperationError: Couldn't send CAN BCM frame due to OS Error: Invalid argument You are probably referring to a non-existing frame. [Error Code 22]

As for python-can, apt-cache policy python3-can gives:

python3-can:
  Installed: (none)
  Candidate: 3.3.2.final~github-2
  Version table:
     3.3.2.final~github-2 500
        500 http://raspbian.raspberrypi.org/raspbian bullseye/main armhf Packages

but pip show python-can gives:

Name: python-can
Version: 4.1.0
Summary: Controller Area Network interface module for Python
Home-page: https://github.com/hardbyte/python-can
Author: python-can contributors
Author-email: None
License: LGPL v3
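
Since apt reports no installed package while pip reports 4.1.0, it can help to ask the interpreter that actually runs the script which version it sees. A small sketch using only the standard library follows; the helper name installed_version is an assumption for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "python-can") -> Optional[str]:
    """Return the version of a distribution visible to this interpreter, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("python-can:", installed_version() or "not installed for this interpreter")
```

Running this with the same python3 that runs the test script shows which python-can that interpreter would import.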

Interestingly there are EINVAL errors, too, but my python seems to retry with a larger length.

python-can obviously does a TX_READ (0x03) first and then does a TX_SETUP (0x01).
Don't know why. But when you read a non-existing element (in bcm_read_op() in bcm.c), you get -EINVAL ... which is correct.
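
The opcode is the first 32-bit word of each sendto() payload in the log, followed by the flags word. The sketch below decodes those first eight bytes, assuming a little-endian host. The opcode numbers follow the enum in linux/can/bcm.h, and only the two flag bits that appear in the log are named; the function name describe is made up for this example.

```python
import struct

# Opcode values from the enum in <linux/can/bcm.h>.
OPCODES = {
    1: "TX_SETUP", 2: "TX_DELETE", 3: "TX_READ", 4: "TX_SEND",
    5: "RX_SETUP", 6: "RX_DELETE", 7: "RX_READ", 8: "TX_STATUS",
    9: "TX_EXPIRED", 10: "RX_STATUS", 11: "RX_TIMEOUT", 12: "RX_CHANGED",
}
# Only the flag bits seen in the strace log are listed here.
FLAGS = {0x0001: "SETTIMER", 0x0002: "STARTTIMER"}


def describe(prefix: bytes) -> str:
    """Decode opcode and flags from the first 8 bytes of a BCM message."""
    opcode, flags = struct.unpack_from("<II", prefix)
    names = [name for bit, name in FLAGS.items() if flags & bit]
    return f"{OPCODES.get(opcode, hex(opcode))} flags={'|'.join(names) or '-'}"


if __name__ == "__main__":
    # "\3\0\0\0\0\0\0\0" and "\1\0\0\0\3\0\0\0" from the log, written as hex.
    print(describe(bytes.fromhex("0300000000000000")))
    print(describe(bytes.fromhex("0100000003000000")))
```

The second message therefore sets the timer values and starts the timer in one step.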

Sure, for strace -o log python3 test.py, strace -o log python3 /home/continental/Test_code/zzzz.py

The strace log has to look like the example from @marckleinebudde

YuBer0 commented

How can I go about it? Since we both use the same code, could the difference in Python library versions or the OS affect it?

Please send a strace output as posted by @marckleinebudde here #2 (comment)

YuBer0 commented

I'm not sure if this is the output you are looking for, as the strace log is quite long.
I have attached the log:
log.txt
The terminal output is in my earlier reply.

The interesting part is here:

socket(AF_CAN, SOCK_RAW|SOCK_CLOEXEC, CAN_RAW) = 3
setsockopt(3, SOL_CAN_RAW, 3, [1], 4)   = 0
setsockopt(3, SOL_CAN_RAW, 4, [0], 4)   = 0
setsockopt(3, SOL_CAN_RAW, 2, [536870911], 4) = 0
setsockopt(3, SOL_SOCKET, SO_TIMESTAMPNS_OLD, [1], 4) = 0
ioctl(3, SIOCGIFINDEX, {ifr_name="vcan0", }) = 0
bind(3, {sa_family=AF_CAN, sa_data="\24\367\5\0\0\0\0\0\0\0\224\354G\0\224\271\"\367\2\0\0\0"}, 24) = 0
setsockopt(3, SOL_CAN_RAW, 1, "\0\0\0\0\0\0\0\0", 8) = 0
socket(AF_CAN, SOCK_DGRAM|SOCK_CLOEXEC, CAN_BCM) = 4
ioctl(4, SIOCGIFINDEX, {ifr_name="vcan0", }) = 0
connect(4, {sa_family=AF_CAN, sa_data="-\367\5\0\0\0\314\276I\0\0\0\0\0\34\317\7\0\350^\v\0"}, 24) = 0
send(4, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0"..., 40, 0) = -1 EINVAL (Invalid argument)
send(4, "\1\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\240\206\1\0\1\0\0\0"..., 56, 0) = -1 EINVAL (Invalid argument)

This is the relevant part of the strace log:

socket(AF_CAN, SOCK_DGRAM|SOCK_CLOEXEC, CAN_BCM) = 4
ioctl(4, SIOCGIFINDEX, {ifr_name="vcan0", }) = 0
connect(4, {sa_family=AF_CAN, sa_data="-\367\5\0\0\0\314\276I\0\0\0\0\0\34\317\7\0\350^\v\0"}, 24) = 0
send(4, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0"..., 40, 0) = -1 EINVAL (Invalid argument)
send(4, "\1\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\240\206\1\0\1\0\0\0"..., 56, 0) = -1 EINVAL (Invalid argument)

As struct bcm_msg_head is 56 bytes (as you can see in Marc's log), I wonder why your setup sends 40/56 bytes instead of 56/72 ...

It's a 32 bit user space....
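
The 32 bytes that strace shows for the failing 56-byte send() parse cleanly if long is assumed to be 4 bytes. The sketch below is an illustration under that assumption: it uses a little-endian layout of three __u32, two timevals of two 4-byte longs each, and a canid_t, and the bytes are transcribed by hand from the octal escapes in the log.

```python
import struct

# First 32 bytes of the failing 56-byte send(), transcribed from the
# strace line (octal \240\206\1\0 is hex a0 86 01 00).
VISIBLE = bytes.fromhex(
    "01000000" "03000000" "00000000"  # opcode, flags, count
    "00000000" "00000000"             # ival1: tv_sec, tv_usec
    "00000000" "a0860100"             # ival2: tv_sec, tv_usec
    "01000000"                        # can_id
)

# Assumed 32-bit layout: with "<", both "I" and "l" are 4 bytes wide.
FIELDS_32 = struct.Struct("<IIIllllI")
NAMES = ("opcode", "flags", "count",
         "ival1_sec", "ival1_usec", "ival2_sec", "ival2_usec", "can_id")


def decode(data: bytes) -> dict:
    """Interpret the start of a BCM message using the 32-bit layout."""
    return dict(zip(NAMES, FIELDS_32.unpack_from(data)))


if __name__ == "__main__":
    for name, value in decode(VISIBLE).items():
        print(f"{name:>10}: {value}")
```

Read this way, ival2 is 100000 microseconds, which matches the period of 0.1 in the test script. The can_id of 1 in the header is presumably the task id that python-can passes to CyclicSendTask rather than the arbitration ID of the frame.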

@YuBer0 What's the output of uname -a

YuBer0 commented

Linux pi 6.1.19-v8+ #1637 SMP PREEMPT Tue Mar 14 11:11:47 GMT 2023 aarch64 GNU/Linux
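
uname only describes the kernel (aarch64 here). To see the word size of the user space that runs the script, the interpreter itself can be asked. A minimal sketch follows; the set of 64-bit machine names is illustrative rather than exhaustive, and the helper names are made up.

```python
import platform
import struct

# Machine names reported by 64-bit kernels; illustrative, not exhaustive.
KERNEL_64BIT_MACHINES = {"aarch64", "x86_64"}


def userland_bits() -> int:
    """Pointer width of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


def c_long_bytes() -> int:
    """Native size of a C long as seen by this interpreter."""
    return struct.calcsize("l")


def mixed_word_size() -> bool:
    """True if a 32-bit interpreter appears to run on a 64-bit kernel."""
    return platform.machine() in KERNEL_64BIT_MACHINES and userland_bits() == 32


if __name__ == "__main__":
    print("kernel machine:", platform.machine())
    print("userland bits :", userland_bits())
    print("sizeof(long)  :", c_long_bytes())
    print("mixed setup   :", mixed_word_size())
```

On a 32 bit Raspberry Pi OS image booted with a 64 bit kernel this would be expected to report a 4-byte long together with an aarch64 machine name.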

my pahole says:

struct bcm_msg_head {
        __u32                      opcode;               /*     0     4 */
        __u32                      flags;                /*     4     4 */
        __u32                      count;                /*     8     4 */

        /* XXX 4 bytes hole, try to pack */

        struct bcm_timeval         ival1;                /*    16    16 */
        struct bcm_timeval         ival2;                /*    32    16 */
        canid_t                    can_id;               /*    48     4 */
        __u32                      nframes;              /*    52     4 */
        struct can_frame           frames[];             /*    56     0 */

        /* size: 56, cachelines: 1, members: 8 */
        /* sum members: 52, holes: 1, sum holes: 4 */
        /* last cacheline: 56 bytes */
};

uname -a
Linux box 6.4.0-rc2 #2 SMP PREEMPT_DYNAMIC Wed May 17 17:12:02 CEST 2023 x86_64 GNU/Linux

@hartkopp 64 bit kernel with a 32 bit user space. Another issue due to MSG_CMSG_COMPAT?

Hm, might be.
Arnd added this patch, which I hoped would fix things up ...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba61a8d9d7809

A 32 bit ARM kernel has the following layout:
See difference in struct bcm_timeval

struct bcm_msg_head {
        __u32                      opcode;               /*     0     4 */
        __u32                      flags;                /*     4     4 */
        __u32                      count;                /*     8     4 */
        struct bcm_timeval         ival1;                /*    12     8 */
        struct bcm_timeval         ival2;                /*    20     8 */
        canid_t                    can_id;               /*    28     4 */
        __u32                      nframes;              /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct can_frame           frames[] __attribute__((__aligned__(8))); /*    40     0 */

        /* size: 40, cachelines: 1, members: 8 */
        /* sum members: 36, holes: 1, sum holes: 4 */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 4 */
        /* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));

struct bcm_timeval {
        long tv_sec;
        long tv_usec;
};

and long is 32 bit on 32 bit ARM.
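
The two header sizes, and the 40/56 versus 56/72 byte counts in the strace logs, follow directly from the two pahole layouts. The sketch below reproduces them with Python's struct module, spelling out the padding holes explicitly and assuming a classic 16-byte struct can_frame.

```python
import struct

# "=" selects standard sizes with no implicit padding, so the holes that
# pahole reports are written out as "x" pad bytes.
HEAD_64 = struct.Struct("=III4xqqqqII")  # long is 8 bytes, hole after count
HEAD_32 = struct.Struct("=IIIllllII4x")  # long is 4 bytes, hole before frames[]
CAN_FRAME = struct.Struct("=IB3x8s")     # can_id, length, 3 pad bytes, 8 data bytes

if __name__ == "__main__":
    for name, head in (("64-bit", HEAD_64), ("32-bit", HEAD_32)):
        print(f"{name}: header {head.size} bytes, "
              f"header + 1 frame {head.size + CAN_FRAME.size} bytes")
```

A bare header corresponds to the TX_READ message and header plus one frame to the TX_SETUP message, which gives 56 and 72 bytes for a 64 bit user space and 40 and 56 bytes for a 32 bit one.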

Hm, might be.
Arnd added this patch, which I hoped would fix things up ...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba61a8d9d7809

Seems it's time to pick up that old thread.

It seems we have to evaluate if MSG_CMSG_COMPAT is set and then treat the received message differently.

@YuBer0 the instant fix for you would be to compile your kernel as a 32 bit kernel, as you have a 32 bit user space on your system.

@hartkopp something like in: https://elixir.bootlin.com/linux/v6.3/source/net/bluetooth/hci_sock.c#L1422

@marckleinebudde I'm not sure if this helps as the COMPAT stuff is intended for CMSG handling.

But I would be fine to add some analogous code if it doesn't make the situation worse ;-D

YuBer0 commented

@YuBer0 the instant fix for you would be to compile your kernel as a 32 bit kernel, as you have a 32 bit user space on your system.

awesome!

@hartkopp @marckleinebudde
thank you so much for working on this. really appreciate it!

I have some prototype code, will test tomorrow. gn8

YuBer0 commented

Hey! I just tested it, and it's working perfectly. Should I mark this as closed?
I ask because I noticed that you want to keep working on this.

Please leave this open, I want to try to fix 32 bit userspace on 64 bit kernels.

It doesn't work, as send() gets no compat flag... 😞

For the send() path we can check the length and the consistency and can probably decide for one or the other. For the recv() path we're out of luck. An ioctl() interface would have been better, as there is a dedicated compat_ioctl() callback in the struct proto_ops.

Thanks for the investigation!

The question is if it helps to introduce a compat_ioctl() when nobody knows and cares about it.
This issue was the first feedback on this potential problem after years - and to me it was some kind of an accident when compiling a 64 bit kernel on a 32 bit system and userland.

The good thing about your investigation:
There is always a send or sendmsg or sendto before something can be received on that socket.
So we could check for the bcm_msg_head size and then switch the entire socket (session) to 32/64 bit.
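
The length-and-consistency idea can be modelled in user space. The sketch below is not kernel code and not a proposed patch: it is a hypothetical Python illustration that tests a message against both header layouts and keeps those where the length equals header size plus nframes times 16. It assumes little-endian data and classic CAN frames only.

```python
import struct

HEAD_32 = struct.Struct("<IIIllllII4x")  # 40-byte header, 4-byte long
HEAD_64 = struct.Struct("<III4xqqqqII")  # 56-byte header, 8-byte long
CAN_FRAME = struct.Struct("<IB3x8s")     # 16-byte classic CAN frame

TX_SETUP = 1
SETTIMER_STARTTIMER = 0x0003


def build_tx_setup(head, can_id, data, period_us=100_000):
    """Build a TX_SETUP message with one frame using the given header layout."""
    header = head.pack(TX_SETUP, SETTIMER_STARTTIMER, 0,
                       0, 0, 0, period_us, can_id, 1)
    return header + CAN_FRAME.pack(can_id, len(data), data)


def consistent_layouts(msg: bytes) -> list:
    """Return the layouts ("32", "64") under which msg is self-consistent."""
    matches = []
    for name, head in (("32", HEAD_32), ("64", HEAD_64)):
        if len(msg) < head.size:
            continue  # too short to hold this header at all
        nframes = head.unpack_from(msg)[-1]  # nframes is the last field
        if len(msg) == head.size + nframes * CAN_FRAME.size:
            matches.append(name)
    return matches


if __name__ == "__main__":
    data = bytes([0x0A, 0x00, 0x3B, 0x00, 0xFF, 0x0B, 0x00, 0x00])
    for head in (HEAD_32, HEAD_64):
        msg = build_tx_setup(head, 0x37A, data)
        print(len(msg), "bytes ->", consistent_layouts(msg))
```

The function returns a list because the length check alone can be ambiguous: a 56-byte message from a 32 bit sender whose last four data bytes are zero also looks like a bare 64-bit header with nframes equal to 0, so an actual implementation would need further checks, for example on the opcode.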

ps. I still wonder if it would be worth the effort or if we better add some documentation to describe this potential problem ...

Filedescriptors can be passed from one process to another :) So it's not a 100% solution.

I still wonder if it would be worth the effort or if we better add some documentation to describe this potential problem ...

Adding documentation is always good, but this is not a potential problem, this is a very real problem on 32 bit userspace on 64 bit kernels.

Filedescriptors can be passed from one process to another :) So it's not a 100% solution.

But then you would need to pass it to another process that has a different 32/64 architecture - is this a valid problem?

I still wonder if it would be worth the effort or if we better add some documentation to describe this potential problem ...

Adding documentation is always good, but this is not a potential problem, this is a very real problem on 32 bit userspace on 64 bit kernels.

Is it? When I get a RasPi or Debian OS image, then I get a consistent kernel with a consistent user land. And when I install additional packages or compile new stuff they share the identical word size.

So it would just lead to problems, if someone copies binaries from a different user land installation, right?

YuBer0 commented

My RasPi image was originally downloaded with the RasPi imager, and I chose the 32 bit OS. However, somehow I ended up with a 64 bit kernel.

I assume that you built a 64 bit Kernel when following your referenced process here: #2 (comment)

We have a $CUSTOMER using a 32 bit user land on a 64 bit kernel (though they are not using CAN, but 100G Ethernet). It's a real world use case!

Hm, wasn't aware of such use-case.
I will take a look at how to handle auto-detecting 32/64 bit sized bcm_msg_head structures.

The if (msg->msg_flags & MSG_CMSG_COMPAT) must be replaced by some function that copies the compat header and checks its integrity.