Netlink (neli) code for the CAN interfaces
fpagliughi opened this issue · 18 comments
It would be great to have full coverage of the Netlink API to interact with the CAN interfaces. Or even something better than what is already there. I don't have any experience with the Netlink CAN interfaces, so if anyone has some existing code to get it started, that would be helpful.
Personally, I have never used it directly. Of course, I have used iptools to create and manipulate CAN interfaces, but never used the netlink API directly. Would be willing to investigate though, given some free time. Just can't give any authoritative answers.
The (only) must-have use cases based on my work:
- create/delete
- bring up/down
- set baud rate
- enable FD/set secondary baud rate
Reading through the socketcan docs once again, it does support quite a lot of stuff which would certainly be useful, though.
Thanks, @jreppnow . Setting the baud rate and secondary (FD) baud rate would be a great next step.
The code to bring the interface up and down is in there, and to my knowledge works. It's probably a good example to get started on the rest.
If you were to get started on it, let us know, so that the work is not duplicated.
I'll have some time to look at this in detail over the next weekend, although I'll presumably have to do quite a bit of research first. If that's fine in terms of delay, I'd gladly take a swing.
Sounds like a plan. Over the weekend I will get back to looking at timestamps.
So I started working on the bitrate stuff in order to use it as an example for a guide, but it's pretty hairy tbh. The main issue is that a lot of constants and structs are missing and it's overall very fiddly. You can see my current progress here: https://github.com/jreppnow/socketcan-rs/tree/netlink-bitrate
I am still going to go ahead and describe my basic approach so that others can follow along. Recommended reading: https://man7.org/linux/man-pages/man7/netlink.7.html and specifically https://man7.org/linux/man-pages/man7/rtnetlink.7.html.
For cross-referencing, I recommend looking at the iproute2 source code at https://github.com/shemminger/iproute2.
Important things to note
Short collection of things that can trip you up badly.
- Interface name length is limited on Linux: the kernel buffer is IFNAMSIZ = 16 bytes including the terminating NUL, so a name can be at most 15 characters.
- All numbers used in netlink/netlink route are NATIVE endianness, not NETWORK endianness.
- If you are like me and you are wondering why the Rtattr types in neli get serialized as c_ushorts, even though they are c_uints in the kernel headers - the actual rtnetlink attribute struct rtattr (https://man7.org/linux/man-pages/man7/rtnetlink.7.html) only has an unsigned short type field, and it determines serialization in the protocol. As to why the kernel header has explicit // u32 comments behind some of their enum variants - I honestly don't know.
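The points above can be sketched with the standard library alone. This is only an illustration: is_valid_ifname and rtattr_header are made-up helper names, not part of neli or socketcan.

```rust
/// Size of the kernel's interface-name buffer (IFNAMSIZ in <linux/if.h>).
/// It includes the terminating NUL byte.
const IFNAMSIZ: usize = 16;

/// Returns true if `name` fits into an IFNAMSIZ buffer together with its NUL.
/// (Hypothetical helper, for illustration only.)
fn is_valid_ifname(name: &str) -> bool {
    !name.is_empty() && name.len() < IFNAMSIZ && !name.bytes().any(|b| b == 0)
}

/// Encodes the 4-byte `struct rtattr` header: two unsigned shorts
/// (length, then type), both in the host's native byte order.
/// (Hypothetical helper, for illustration only.)
fn rtattr_header(payload_len: usize, rta_type: u16) -> [u8; 4] {
    // The length covers header plus payload, but not the alignment padding.
    let rta_len = (4 + payload_len) as u16;
    let mut out = [0u8; 4];
    out[..2].copy_from_slice(&rta_len.to_ne_bytes());
    out[2..].copy_from_slice(&rta_type.to_ne_bytes());
    out
}
```

Note the use of to_ne_bytes rather than to_be_bytes: nothing here is converted to network byte order.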
Basic concepts
- we are using Netlink sockets to talk to the routing subservice of the Linux kernel, which also handles configuration of network devices
- the messages we exchange always have the same format:
  - Netlink header (in neli, this is Nlmsghdr)
  - Netlink route header (Ifinfomsg)
  - Netlink route attributes
    - In general, these are added by addattr_l(...) and similar calls in the C code. They are put into the attributes field of Ifinfomsg in neli.
    - They can be nested (possibly multiple times)! In iproute2, you will see something like addattr_nest(&req.n, sizeof(req), iflatype) in the code, which opens a nesting scope, and addattr_nest_end(...), which ends a nesting scope. This is modeled in neli via the add_nested_attribute(...) method on the Rtattr struct. The data content of the Rtattr that accepts such nested attributes should be Vec<u8> resp. the neli::Buffer wrapper.
// `index`, `name` and `kind` are provided by the surrounding function.
let info = Ifinfomsg::new(
    RtAddrFamily::Unspecified,
    Arphrd::Netrom,
    index.unwrap_or(0) as c_int,
    IffFlags::empty(),
    IffFlags::empty(), // The documentation says this should always be 0xFF..FF, but that does not work!
    {
        let mut buffer = RtBuffer::new();
        // Adding an attribute.
        buffer.push(Rtattr::new(None, Ifla::Ifname, name)?);
        // Adding an attribute with nested attributes inside.
        let mut linkinfo = Rtattr::new(None, Ifla::Linkinfo, Vec::<u8>::new())?;
        // Adding the nested attribute itself.
        linkinfo.add_nested_attribute(&Rtattr::new(None, IflaInfo::Kind, kind)?)?;
        buffer.push(linkinfo);
        buffer
    },
);
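To make the nesting less abstract, here is a rough byte-level sketch of what a nested attribute looks like on the wire, written without neli. The helpers push_attr and link_info are hypothetical names; the attribute ids are the values from <linux/if_link.h>. Treat it as an illustration of the layout, not as how neli does it internally.

```rust
/// Rounds `len` up to the 4-byte boundary that netlink attributes are aligned to.
fn align4(len: usize) -> usize {
    (len + 3) & !3
}

/// Appends one attribute (header, payload, zero padding) to `buf`.
/// Assumes `buf` is empty or already 4-byte aligned.
fn push_attr(buf: &mut Vec<u8>, attr_type: u16, payload: &[u8]) {
    let len = (4 + payload.len()) as u16; // the length field excludes the padding
    buf.extend_from_slice(&len.to_ne_bytes());
    buf.extend_from_slice(&attr_type.to_ne_bytes());
    buf.extend_from_slice(payload);
    let padded = align4(buf.len());
    buf.resize(padded, 0);
}

// Attribute ids from <linux/if_link.h>.
const IFLA_LINKINFO: u16 = 18;
const IFLA_INFO_KIND: u16 = 1;
const IFLA_INFO_DATA: u16 = 2;

/// Builds IFLA_LINKINFO { IFLA_INFO_KIND = kind, IFLA_INFO_DATA = data }.
/// A nested attribute is simply an attribute whose payload is a sequence
/// of further attributes.
fn link_info(kind: &str, data: &[u8]) -> Vec<u8> {
    let mut nested = Vec::new();
    push_attr(&mut nested, IFLA_INFO_KIND, kind.as_bytes());
    push_attr(&mut nested, IFLA_INFO_DATA, data);

    let mut out = Vec::new();
    push_attr(&mut out, IFLA_LINKINFO, &nested);
    out
}
```

With kind "can" and 36 bytes of data, this yields lengths of 7, 40 and 52 for the three headers, which are the same numbers that show up in the strace output further down.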
Concrete example
As mentioned before, I am trying to get bitrate setting to work at the moment. Here is how I approach the problem:
- Figure out what the correct iproute2 command is. In this case it's:
  ip link set <dev name> type can bitrate <bitrate> [sample-point <sample point>]
  A good source for these is https://www.kernel.org/doc/Documentation/networking/can.txt.
- Use strace to figure out what the command is actually doing:
  sudo strace ip link set <dev name> type can bitrate <bitrate> [sample-point <sample point>]
- This gives us (among other things):
  sendmsg(3, {
    msg_name = { sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000 },
    msg_namelen=12,
    msg_iov = [
      {
        iov_base= [
          { // This is the netlink header (Nlmsghdr)
            nlmsg_len=84,
            nlmsg_type=RTM_NEWLINK,
            nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK,
            nlmsg_seq=1671295701,
            nlmsg_pid=0
          },
          { // This is the netlink route header (Ifinfomsg)
            ifi_family=AF_UNSPEC,
            ifi_type=ARPHRD_NETROM,
            ifi_index=if_nametoindex("vcan0"),
            ifi_flags=0,
            ifi_change=0
          },
          [ // Attributes start here
            { // Parent for the following nested attributes
              nla_len=52,
              nla_type=IFLA_LINKINFO
            },
            [
              [
                { // First nested attribute, this is just a (c) string
                  nla_len=7,
                  nla_type=IFLA_INFO_KIND
                },
                "can"...
              ],
              [
                { // Second nested attribute, this is actually another parent attribute, i.e. it has further children
                  nla_len=40,
                  nla_type=IFLA_INFO_DATA
                },
                "\x24\x00\x01\x00\xb8\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"... // See below
              ]
            ]
          ]
        ],
        iov_len=84
      }
    ],
    msg_iovlen=1,
    msg_controllen=0,
    msg_flags=0
  }, 0) = 84
  Notably, CAN-specific data is not decoded, which brings us to the next step:
- Look around in iproute2/blob/main/ip/iplink_can.c to see what's actually being done. In this case:
  static int can_parse_opt(struct link_util *lu, int argc, char **argv, struct nlmsghdr *n)
  {
      struct can_bittiming bt = {}, dbt = {};
      // snip ..
      while (argc > 0) {
          if (matches(*argv, "bitrate") == 0) {
              NEXT_ARG();
              if (get_u32(&bt.bitrate, *argv, 0))
                  invarg("invalid \"bitrate\" value\n", *argv);
          } else if (matches(*argv, "sample-point") == 0) {
              // snip ..
          }
          argc--, argv++;
      }
      if (bt.bitrate || bt.tq)
          addattr_l(n, 1024, IFLA_CAN_BITTIMING, &bt, sizeof(bt));
      // snip ..
  }
  What we see here is that another attribute is added (the nesting attribute IFLA_INFO_DATA has already been added), which is the bytes of the C struct can_bittiming (linux/can/netlink.h) with the bit timing set accordingly.
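The opaque bytes that strace printed for IFLA_INFO_DATA can be decoded by hand. A small sketch, assuming the trace was taken on a little-endian machine (the byte order of the length field suggests so); TRACE_PREFIX and decode_prefix are names invented for this example.

```rust
/// The first 8 bytes of the IFLA_INFO_DATA payload from the strace output.
/// All further bytes shown in the trace are zero.
const TRACE_PREFIX: [u8; 8] = [0x24, 0x00, 0x01, 0x00, 0xb8, 0x0b, 0x00, 0x00];

/// Decodes (attribute length, attribute type, first u32 of the payload).
/// from_le_bytes is used because the trace is assumed to be little-endian;
/// code talking to the local kernel would use from_ne_bytes instead.
fn decode_prefix(bytes: &[u8; 8]) -> (u16, u16, u32) {
    let len = u16::from_le_bytes([bytes[0], bytes[1]]);
    let attr_type = u16::from_le_bytes([bytes[2], bytes[3]]);
    let first_field = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    (len, attr_type, first_field)
}
```

decode_prefix(&TRACE_PREFIX) gives (36, 1, 3000): a 36-byte attribute (4-byte header plus the 32-byte can_bittiming), of type 1 (IFLA_CAN_BITTIMING), whose first field, the bitrate, is 3000.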
Unfortunately, this struct and the constant used here (IFLA_CAN_BITTIMING) are available neither in libc nor in neli, so we have to add them manually for now. What we get is something like this:
let info = Ifinfomsg::new(
    RtAddrFamily::Unspecified,
    Arphrd::Netrom,
    self.if_index as c_int,
    IffFlags::empty(),
    IffFlags::empty(),
    {
        let mut buffer = RtBuffer::new();
        let mut link_info = Rtattr::new(None, Ifla::Linkinfo, Buffer::new())?;
        link_info.add_nested_attribute(&Rtattr::new(None, IflaInfo::Kind, "can")?)?;
        let mut data = Rtattr::new(None, IflaInfo::Data, Buffer::new())?;
        let timing = can_bittiming {
            bitrate,
            sample_point: sample_point.unwrap_or(0) as u32,
            tq: 0,
            prop_seg: 0,
            phase_seg1: 0,
            phase_seg2: 0,
            sjw: 0,
            brp: 0,
        };
        // The payload is the raw bytes of the C struct.
        data.add_nested_attribute(&Rtattr::new(None, rt::IflaCan::BitTiming, unsafe {
            std::slice::from_raw_parts::<'_, u8>(
                &timing as *const can_bittiming as *const u8,
                size_of::<can_bittiming>(),
            )
        })?)?;
        // Attach IFLA_INFO_DATA to its parent, otherwise the bit timing is never sent.
        link_info.add_nested_attribute(&data)?;
        buffer.push(link_info);
        buffer
    },
);
What I am stuck on right now is that IflaCan does not implement the traits required by neli, and adding them is a bit of a pain. Feel free to look at the branch yourself.
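As an aside, the unsafe slice cast can be avoided by serializing the struct field by field. A minimal sketch, assuming the field order of struct can_bittiming in linux/can/netlink.h (eight u32 values); CanBitTiming and its methods are illustrative names, not the types from the branch.

```rust
/// Mirror of `struct can_bittiming` from <linux/can/netlink.h>: eight u32 fields.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct CanBitTiming {
    bitrate: u32,      // bit rate in bits/second
    sample_point: u32, // sample point in one-tenth of a percent
    tq: u32,           // time quanta in nanoseconds
    prop_seg: u32,
    phase_seg1: u32,
    phase_seg2: u32,
    sjw: u32,
    brp: u32,
}

impl CanBitTiming {
    /// Serializes the fields in declaration order, in native endianness.
    fn to_ne_bytes(&self) -> [u8; 32] {
        let fields = [
            self.bitrate, self.sample_point, self.tq, self.prop_seg,
            self.phase_seg1, self.phase_seg2, self.sjw, self.brp,
        ];
        let mut out = [0u8; 32];
        for (chunk, field) in out.chunks_exact_mut(4).zip(fields.iter()) {
            chunk.copy_from_slice(&field.to_ne_bytes());
        }
        out
    }

    /// Inverse of `to_ne_bytes`, useful when reading the struct back from the kernel.
    fn from_ne_bytes(bytes: &[u8; 32]) -> Self {
        let mut f = [0u32; 8];
        for (field, chunk) in f.iter_mut().zip(bytes.chunks_exact(4)) {
            *field = u32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        }
        CanBitTiming {
            bitrate: f[0], sample_point: f[1], tq: f[2], prop_seg: f[3],
            phase_seg1: f[4], phase_seg2: f[5], sjw: f[6], brp: f[7],
        }
    }
}
```

The resulting array can be passed as the attribute payload in place of the from_raw_parts slice. Since all fields are u32, there is no padding to worry about either way.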
I will update these instructions as I obtain more information or based on feedback.
Thanks for keeping at this and all the updates. My guess is that there's too much upstream work to try to get in before this is workable? So, it sounds like it won't make it in to the upcoming v2.0 release. But it is something we can target for a followup v2.1 release?
Yeah, I think 2.0 should not wait for the Netlink stuff - making things like FD support and the layout fixes available to people seems more important.
I'll personally keep pecking at this topic, implementing a function/request or two when I have the time. Not sure if we need a specific milestone like a version 2.1 for this tbh.
Unfortunately, I didn't get much done on this over the holidays, but I'm starting back on it now to get a release out. I began adding some support for the new Netlink features to the optional utility app, which I renamed to rcan. This is currently in the develop branch. So far, it looks good, but could use some better error messages.
Diving into the Netlink stuff, it definitely looks like we would want to get some stuff added upstream into the neli and libc crates to support this. Probably the contents of the can/netlink.h header:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/can/netlink.h
I opened a new issue in the neli crate for some guidance on how best to proceed.
We'll go with what's currently in there for v2.0, and keep this issue open to add some more neli features in v2.1
Thanks to @jackyzjk for adding the required structs and enums to implement set_bitrate() much the way @jreppnow showed it, above.
I then kept inertia going to implement most of the other netlink CAN structs etc. Just as I was finishing, it occurred to me that I could have just used bindgen, like:
$ bindgen /usr/include/linux/can/netlink.h -o bindings_can_netlink.rs
But, I did use the output to adjust some of the declarations.
After that I just did some copy-pasta to implement a few more setter functions and commands, like set_restart_ms() and restart().
There are still more to go, but this is a good start. And it would be good to consolidate the common part of the code.
For completeness, I'll add this...
The API definitions for communicating with the kernel about CAN are in the Linux kernel sources here:
include/uapi/linux/can/netlink.h
And although looking at an existing client like the CAN code in iproute2 is quite helpful, it may also be useful to see the kernel code that receives and processes the netlink requests for CAN. The code is fairly small and readable:
drivers/net/can/dev/netlink.c
I found the top of that file particularly interesting where it maps the requests to the data type expected for each:
https://github.com/torvalds/linux/blob/1c8b86a3799f7e5be903c3f49fcdaee29fd385b5/drivers/net/can/dev/netlink.c#L11-L25
The dev.c file in there has additional implementation of the commands, which helps to show the state the interface is expected to be in to process particular commands, and the errors returned if it is not in that state:
drivers/net/can/dev/dev.c
There is also a libsocketcan C library, with a few forks. It primarily deals with the Netlink interface.
A number of important netlink commands are shipping with v3.1, including setting bitrate and FD data bitrate, setting control modes, manually restarting the interface, and setting an automatic restart delay time.
But the implementation is still far from complete, particularly in regard to reading status and parameters back from the interface. The setter functions are also for individual parameters only, and currently there is no way to set multiple parameters in a single netlink call. Setting multiple parameters requires making a separate call for each. It would be nice to add a builder pattern, or something like that to create a single request packet for multiple parameters and send them in one call.
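To illustrate the idea, here is a rough sketch of what such a builder could look like at the byte level. Everything in it is hypothetical (CanParamsBuilder does not exist in the crate); the attribute ids are the values of IFLA_CAN_BITTIMING and IFLA_CAN_RESTART_MS from the enum in linux/can/netlink.h. The output would be the payload of the IFLA_INFO_DATA attribute.

```rust
// Attribute ids from the IFLA_CAN_* enum in <linux/can/netlink.h>.
const IFLA_CAN_BITTIMING: u16 = 1;
const IFLA_CAN_RESTART_MS: u16 = 6;

/// Appends one attribute (header, payload, zero padding to 4 bytes) to `buf`.
fn push_attr(buf: &mut Vec<u8>, attr_type: u16, payload: &[u8]) {
    let len = (4 + payload.len()) as u16;
    buf.extend_from_slice(&len.to_ne_bytes());
    buf.extend_from_slice(&attr_type.to_ne_bytes());
    buf.extend_from_slice(payload);
    let padded = (buf.len() + 3) & !3;
    buf.resize(padded, 0);
}

/// Hypothetical builder that collects several CAN parameters and emits them
/// as one IFLA_INFO_DATA payload, so they can travel in a single request.
#[derive(Default)]
struct CanParamsBuilder {
    bitrate: Option<u32>,
    restart_ms: Option<u32>,
}

impl CanParamsBuilder {
    fn bitrate(mut self, bps: u32) -> Self {
        self.bitrate = Some(bps);
        self
    }

    fn restart_ms(mut self, ms: u32) -> Self {
        self.restart_ms = Some(ms);
        self
    }

    /// Concatenates one attribute for each parameter that was set.
    fn build(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        if let Some(bps) = self.bitrate {
            // can_bittiming: bitrate first, the other seven u32 fields left at 0.
            let mut bt = [0u8; 32];
            bt[..4].copy_from_slice(&bps.to_ne_bytes());
            push_attr(&mut buf, IFLA_CAN_BITTIMING, &bt);
        }
        if let Some(ms) = self.restart_ms {
            push_attr(&mut buf, IFLA_CAN_RESTART_MS, &ms.to_ne_bytes());
        }
        buf
    }
}
```

Usage would be along the lines of CanParamsBuilder::default().bitrate(500_000).restart_ms(100).build(), with the resulting bytes nested under IFLA_LINKINFO / IFLA_INFO_DATA exactly like the single-parameter case. This mirrors what ip link set does when several options are given on one command line.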
As usual, a PR for any of this would be appreciated! Hopefully we can get some more of this implemented in v3.2.
Does anyone know how to extract the CAN-specific parameters out of the Ifla::Linkinfo attribute that is returned from the kernel? I added a match in the nl::CanInterface::details() function to get the link info, but can't figure out how to parse out the nested attributes.
@jreppnow , did you ever figure this out?
@fpagliughi Can you share the bytes that you are trying to decode? I don't own a physical CAN device (privately) that I can use for testing and getting traces, but presumably the messages should have the same format as the one you use for configuration of bitrate etc. I did something similar (proprietarily) for the rust-netlink project.
Hey @jreppnow. Thanks for the quick reply.
I got completely stuck on this for a day, but I think I'm starting to get it now... When requesting the CAN interface details, the CAN-specific parameters like can_bittiming, etc, are in the response message in the LinkInfo attribute, like you described above. I've been trying to figure out how to extract it, like here:
Lines 531 to 543 in d065f83
First I was just trying to figure out how to parse down into nested attributes.
Then I realized that the attributes are nested a little deeper than I thought.
Then it became obvious that for the final collection, we really do want the IflaCan enum that we created pushed up and integrated into neli data types, with the proper traits defined, for seamless extraction. And implementing FromBytes for the structs is pretty helpful, too.
I think I may be able to get it this evening. I'll post in the morning if I was able to get it working.
Thanks.
I personally really liked the way the rust-netlink project does it, including the tests which serve as examples and documentation as well: https://github.com/rust-netlink/netlink-packet-route/blob/main/src/rtnl/link/nlas/link_infos.rs#L2672
In general, all of these libraries will have a generic way to parse an attribute (NLA), which should give you the bytes (length + data) and the id, which you need to compare to the relevant constants - and then you need to know how to parse the contents (recursive NLAs, structs, values, enums, ...). And then they have a way to define specific types within the library that do this parsing step for you as well and give you the above contents as Rust types - the link above does it with enums and a couple of traits (Emitable and NlaParse I think?). That should be the goal if you are developing this for a library. I've had open-sourcing our CAN extension for rust-netlink on my list for a while, but I have been otherwise occupied unfortunately.
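The generic parsing step could look roughly like the sketch below. It uses only the standard library and is not the API of neli or rust-netlink; Attr and parse_attrs are names made up for the example. The mask removes the NLA_F_NESTED and NLA_F_NET_BYTEORDER flag bits that the kernel may set in the type field.

```rust
/// Mask that strips the NLA_F_NESTED (1 << 15) and NLA_F_NET_BYTEORDER (1 << 14)
/// flag bits from an attribute type.
const NLA_TYPE_MASK: u16 = 0x3fff;

/// One parsed netlink attribute: its type id and a borrowed payload.
#[derive(Debug, PartialEq)]
struct Attr<'a> {
    attr_type: u16,
    payload: &'a [u8],
}

/// Walks a buffer of back-to-back attributes. Returns None if a length
/// field is inconsistent with the buffer.
fn parse_attrs<'a>(mut buf: &'a [u8]) -> Option<Vec<Attr<'a>>> {
    let mut attrs = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 4 {
            return None; // not even room for a header
        }
        let len = u16::from_ne_bytes([buf[0], buf[1]]) as usize;
        let attr_type = u16::from_ne_bytes([buf[2], buf[3]]) & NLA_TYPE_MASK;
        if len < 4 || len > buf.len() {
            return None;
        }
        attrs.push(Attr { attr_type, payload: &buf[4..len] });
        // Skip the padding that aligns the next attribute to 4 bytes.
        let next = ((len + 3) & !3).min(buf.len());
        buf = &buf[next..];
    }
    Some(attrs)
}
```

For nested attributes, the same function is simply called again on the payload: once on the payload of IFLA_LINKINFO, then on the payload of IFLA_INFO_DATA, at which point the ids are the IFLA_CAN_* values and the payloads are structs like can_bittiming or plain u32 values.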
OK. I got it. There's a really messy initial implementation up in the develop branch, and it depends on my fork of neli with an upstreamed IflaCan enumeration. But at least, for now, I figured out the parsing.
After the better part of a year, this has progressed enough to close the issue! Thanks to everyone who helped on this, especially @jreppnow and @jackyzjk.
With v3.2, most of the CAN interface parameters can be set or queried. There are a few minor ones that are missing; those will be added eventually as needed, or just for completeness.
The low-level constants and struct bindings were kept in this crate to speed up the release, but at some point these will be pushed upstream to the neli and libc crates.