linklayer/gs_usb_fd

Upstream this driver

Closed this issue Β· 78 comments

We would like to get this driver into mainline linux kernel. Has anybody already done work in that direction? @ericevenchick or somebody else?
I think that it won't be accepted as a new driver as it is right now, because it's largely a copy of what's already there in gs_usb.c.

In my opinion the best way would be to adapt the upstream gs_usb.c in a way that makes it work with the original gs_usb devices and autodetect the newer fd devices. Opinions, thoughts?

I also saw @marckleinebudde and @hartkopp around here having likely the most experience with mainline patch submitting....what do you think? Would you keep it a separate driver?

We also have some improvements ready like @moechr shared the errata bug at linklayer/cantact#19

This would solve #1, too.

We also have some improvements ready like @moechr shared the errata bug at linklayer/cantact#19

Those "improvements" are workarounds, the right place to fix that bug is in https://github.com/linklayer/cantact-pro-fw instead, not on gs_usb, IMO. It'd be nice to see FD support upstreamed, that's for sure.

the right place to fix that bug is in https://github.com/linklayer/cantact-pro-fw instead

Well, NXPs solution for this errata is to send NAK. If the driver is still sending messages in the format of the length 4 + N*16 and there is no change in the firmware the problem would still exist.
We made the decision as a reason of performance to add 1 byte to the host frame instead of sending a NAK transfer without any informational data.

There are other things to even worry more about in the firmware. Take a closer look at the can.c RAM assignment. One RX FIFO element uses 64 bytes data + 8 bytes of additional RAM per element = 72 bytes
72 bytes * 16 elements = 0x480 bytes
The difference between CAN0_TX_BUFFER_OFS = 0x20 and CAN1_STD_FILTER_OFS = 0x100 is only 0x80 ...

Ah, I see, thanks for explaining @moechr, much appreciated!

I've been quite disappointed by the overall CANtact "Pro" quality lately. I guess what upsets me the most is @ericevenchick not showing up for support after the @crowdsupply campaign (/cc @storborg @lifton?)... despite multiple bugs piling up for userspace tooling, documentation and firmware sides for the affected cantact repositories.

At this point I wouldn't recommend this dongle for any production use until the firmware, tooling and docs don't see thorough testing, bug squashing and benchmarking so that folks know the actual limits of this thing... it's definitely not worth 140eur at this point in time, IMO.

@brainstorm I totally agree with you. Simple "loop-tests" as a verification of the performance of a CAN coupler are way to less.
The big ones like PEAK Systems perform a whole bunch of tests on their CAN products.

There is a lot to to on the Cantact Pro project. As @ericevenchick mentioned - his CAN products are for hacking - maybe the hacking starts on the product itself ;)

@pfink-christ as discussed, I've moved our discussion here (responding to your comment in #3 )

The candlelight firmware support is already there so why not add further support for a candlelight version with FD. I don't see a problem with that. The alternative would be huge code duplication or splitting the driver in a gs_usb_common.c and two separate versions of gs_usb.c /gs_usb_fd.c where only the differences are handled. That's another possible solution I have to give some additional thought, because the current status is not perfect, still hacky and not quite mainline quality yet (also comfirmed by hartkopp

Yep, this makes sense, I agree. As for the hack, I assume you're talking about u8 errata_dummy; added to the gs_host_frame struct to fix the NXP errata issue. Bit of a funny story - when first starting to develop my CAN bus hardware I had chosen the LPC54628. After a month or so I got to the point of stress testing the USB and quickly found this issue. I spent a few weeks trying to fix it but after that issue and a few others in the errata I ended up changing to an STM32H7. Sucks to see people creating similar devices running into the same issues. I'll go back over my notes and see where I got to with that issue as it'd be much nicer to solve the issue in the device firmware rather than the gs_usb driver.

What's this candlelight group you both talked about? I don't think there is actually a group in the sense that they sit together and discuss what to do next. I guess there is a maintainer, a small number of contributers and a bunch of users. But let's find out and ask. I guess that might be the main repository? -> https://github.com/candle-usb/candleLight_fw So Hubert Denkmair seems to be the initial author and fenugrec seems to currently maintain the repository.
We could add a comment here that bit 8 is used as FD feature flag or something like that, then it would be documented and nobody could complain -> https://github.com/candle-usb/candleLight_fw/blob/e6b972441b767b8952d9cd8c78cd74a43be091a9/src/usbd_gs_can.c#L267

Yeah spot on, I believe https://github.com/candle-usb/candleLight_fw is the main repository. I'll try to contact them over the next week or so in an attempt to get them involved in this. I think that as long as they're aware of it, we'll be fine. As far as discussion goes, can we open up a discussion board on git (https://docs.github.com/en/discussions/quickstart) even if that's in your repo? Or would you prefer to keep discussion here?

Above all testing is always welcome. But I think first priority should be to find another way of implementing the "switch" between classic can and FD. Because that is the one thing currently keeping me from sending the patches to ML for review and my time is rather limited at the moment.

I've got the same problem with time at the moment but will try to take a look at this over the weekend. I'll fork your repo, add my VID/PID, tidy a couple of things up and will submit a PR. Then we can discuss further on how to proceed.

Cheers

Yeah spot on, I believe https://github.com/candle-usb/candleLight_fw is the main repository. I'll try to contact them over the next week or so in an attempt to get them involved in this.

The easiest would be to create a PR where you add the various bits for CAN_FD support.

Yep, this makes sense, I agree. As for the hack, I assume you're talking about u8 errata_dummy; added to the gs_host_frame struct to fix the NXP errata issue.

No, I'm talking about the length of the transmitted gs_host_frame. Because the data part has to have

  • 8 data bytes in case of classic can device
  • 64 data bytes in case of can fd device
  • 65 (64 data bytes + 1 byte to circumvent data corruption on usb) in case of fd with the affected nxp mcu(s).

In my previous implementation the struct had the maximum size and only the necessary length was transmitted via usb. The struct was cut off so to speak. That was the hacky part where I wanted to find a better solution.

it'd be much nicer to solve the issue in the device firmware rather than the gs_usb driver.

USB high-speed device in endpoint TX data corruption on NXP LPC546xx controllers (Errata sheet LPC546xx / USB.15)
This corruption occurs when the following conditions are met:
-> A TX (IN) transfer happens after a RX (OUT) transfer.
-> The RX (OUT) transfer length is 4 + N*16 (N>=0) bytes.

IMHO not easily possible. The suggested workaround by NXP would require to add some usb rx/tx control and detection when to send a NAKed TX transfer. I don't think that's useful. Sending one byte more in this case isn't too hard on the performance, whereas NAKing a transfer and causing complete dummy transfers would sum up in a bigger total performance dip. At least that is how I see it in theory. As a benefit it would keep the complexity on the firmware and kernel driver side as it is.

I reworked the code and found a - in my eyes - quite good solution without changing too much.

@marckleinebudde Could you help me out on this macro here? Do I really need to supply data[0] to sizeof_field, which looks a bit weird here or is there another clever kernel macro that I missed to calculate the length of such a flex array? The compiler doesn't complain anymore, but I have no access to the hardware right now. I will test my reworked code on Friday, then I'll be able to share more.

struct gs_host_frame {
	u32 echo_id;
	__le32 can_id;

	u8 can_dlc;
	u8 channel;
	u8 flags;
	u8 reserved;

	DECLARE_FLEX_ARRAY(u8, data);
} __packed;

#define GS_HOSTFRAME_SIZE(data_len) (sizeof(struct gs_host_frame) + sizeof_field(struct gs_host_frame, data[0]) * data_len)

The easiest would be to create a PR where you add the various bits for CAN_FD support.

That was also my line of thinking. Should we also list/define the feature bits in the kernel driver even if not used? 6 and 7 are not used at the moment but are defined in the candlelight firmware. Currently I had them removed, but I guess it wouldn't hurt adding them back.

after that issue and a few others in the errata

@BennyEvans are there any other errata relevant for us (NXP LPC... users) to additionally keep in mind or were they only relevant on your custom hardware? Because I don't think we are aware of any others.
And right now would be a good time to workaround all existing and known bugs and kinks.

That was also my line of thinking. Should we also list/define the feature bits in the kernel driver even if not used? 6 and 7 are not used at the moment but are defined in the candlelight firmware. Currently I had them removed, but I guess it wouldn't hurt adding them back.

I have already some patches that add these bits to the driver. Are you subscribed to the linux-can mailing list? I'll send these patches around.

@marckleinebudde Could you help me out on this macro here? Do I really need to supply data[0] to sizeof_field, which looks a bit weird here or is there another clever kernel macro that I missed to calculate the length of such a flex array? The compiler doesn't complain anymore, but I have no access to the hardware right now. I will test my reworked code on Friday, then I'll be able to share more.

Have a look at struct_size https://elixir.bootlin.com/linux/v5.15/source/include/linux/overflow.h#L192

I have already some patches that add these bits to the driver. Are you subscribed to the linux-can mailing list? I'll send these patches around.

Yes, I am subscribed.

Have a look at struct_size https://elixir.bootlin.com/linux/v5.15/source/include/linux/overflow.h#L192

Thanks. That works for the instances where I have a pointer to gs_host_frame available, but not in gs_can_open, where gs_host_frame is not used, but only its size is relevant e.g. for allocation of usb rx buffer with usb_alloc_coherent. I could add a pointer to gs_host_frame, but it would only be used to calculate the size with struct_size.

old: #define GS_HOSTFRAME_SIZE(data_len) (sizeof(struct gs_host_frame) + sizeof_field(struct gs_host_frame, data[0]) * data_len)
new: #define GS_HOSTFRAME_SIZE(p, data_len) struct_size(p, data, data_len)
Apart from that the macro utilizing struct_size needs two parameters. I think I will rather discard GS_HOSTFRAME_SIZE macro altogether and use struct_size directly in every occurrence (7 in total), because it's saving only one parameter and not really improving readability.

Thanks. That works for the instances where I have a pointer to gs_host_frame available, but not in gs_can_open, where gs_host_frame is not used, but only its size is relevant e.g. for allocation of usb rx buffer with usb_alloc_coherent. I could add a pointer to gs_host_frame, but it would only be used to calculate the size with struct_size.

Make it so.

old: #define GS_HOSTFRAME_SIZE(data_len) (sizeof(struct gs_host_frame) + sizeof_field(struct gs_host_frame, data[0]) * data_len)
new: #define GS_HOSTFRAME_SIZE(p, data_len) struct_size(p, data, data_len)
Apart from that the macro utilizing struct_size needs two parameters. I think I will rather discard GS_HOSTFRAME_SIZE macro altogether and use struct_size directly in every occurrence (7 in total), because it's saving only one parameter and not really improving readability.

Sounds good.

after that issue and a few others in the errata

@BennyEvans are there any other errata relevant for us (NXP LPC... users) to additionally keep in mind or were they only relevant on your custom hardware? Because I don't think we are aware of any others. And right now would be a good time to workaround all existing and known bugs and kinks.

I've just gone over my notes. Errata "3.22 USB.15: USB high-speed device in endpoint TX data corruption" is the only relevant issue that I'm aware of.

@pfink-christ may you please provide a link to the repository you're working out of? I submitted a PR to the branch and repository I thought you were working out of but you then deleted the branch about 10 minutes after which closed the PR. Just want to all be on the same page here.

Hi all,

I've created a fork of this repo and implemented compatibility in the driver for both existing classic CAN devices and newer FD devices. I'd appreciate if you could have a look over it and test the functionality. Unfortunately, I don't currently have access to an FD device so I have only confirmed that's it's working with a Canable device running the candlelight firmware.

Link to the repo: https://github.com/BennyEvans/gs_usb_fd

I took @pfink-christ version when it was available and have updated the gs_device_bt_const and gs_host_frame structs to be more compatible with both Classic and FD CAN simultaneously. The bit timing request now initially request an original classic can structure (FD devices will just truncate and send a classic structure) and only after checking for FD capability after the initial timing request will it request the full FD bit timing structure. I've also removed the VID gates that were in @pfink-christ revision and now if a device reports FD capability, the driver will treat it as an FD device.

@marckleinebudde as you've had some experience modifying this driver, I'd appreciate any feedback.

@pfink-christ Apologies if you were working out of another repository. If you were and can share it, perhaps we can analyse each other's versions and move forward towards something we can submit to mainline. I've kept both our devices VID/PID in my version.

Edit: Feel free to submit any PRs.

Cheers

Hey @BennyEvans,

@marckleinebudde as you've had some experience modifying this driver, I'd appreciate any feedback.

Make sure to have a clean patch stack. Only change one thing at a time. Feel free to use https://github.com/marckleinebudde/linux/commits/gs_usb_fd as a base. It already adds the missing bit definitions to the driver. Speaking of clean patches, I myself have to add proper patch descriptions to all patches.

Thanks @marckleinebudde

I've not pushed code into mainline before so it's all a bit of a learning curve. I've forked the repository you've linked and will commit changes to that in smaller chunks tomorrow. Is this what's expected?

However, this being said, I'll still require someone to test my version of the driver with a CAN FD device as I currently don't have access to one.

Cheers

I've not pushed code into mainline before so it's all a bit of a learning curve. I've forked the repository you've linked and will commit changes to that in smaller chunks tomorrow. Is this what's expected?

ACK

However, this being said, I'll still require someone to test my version of the driver with a CAN FD device as I currently don't have access to one.

I have a board here.

@BennyEvans my current working repository is not online because I want to test my code properly before I publish something. That's also the reason I deleted my old working branch, so that nobody uses or works on this not anymore up to date codebase.
I don't quite understand why you want to do the same work again that I already did. Either you do it or me. Not both please. I suggest I will upload my code later today, then you can try to improve or add things.

Uff... @marckleinebudde I rebased my changes on your repository, but I realized that your commit marckleinebudde/linux@86f7bf6, which adds the timestamp after the data field in the struct, kinda killed my whole idea of the flex array. What do you think about the idea of including the timestamp in the data array or is something like the following better? The size calculation would have to be adapted as well (decrease the flex array length by the size of the struct classic_can_data). This would assume there is no timestap data after the 64 data bytes in the CAN FD case or it would get ugly again :D

struct gs_host_frame {
        u32 echo_id;
        __le32 can_id;

        u8 can_dlc;
        u8 channel;
        u8 flags;
        u8 reserved;

        union {
                struct classic_can_data {
                        u8 data_c8[8];
                        u32 timestamp_us;
                } __packed;
                DECLARE_FLEX_ARRAY(u8, data);
        } __packed;
} __packed;

Although time stamps are not supported in the Linux driver yet, we should keep this in mind.

What do you think about the idea of including the timestamp in the data array or is something like the following better?

I'd like to have the timestamp as a separate u32. This looks good, given the compiler likes this :)

struct gs_host_frame {
        u32 echo_id;
        __le32 can_id;

        u8 can_dlc;
        u8 channel;
        u8 flags;
        u8 reserved;

        union {
                struct {

I think you can use an anonymous struct.

                        u8 data_cc[8];

Let's call it data_cc for Classical CAN.

                        u32 timestamp_us;
                } __packed;
                DECLARE_FLEX_ARRAY(u8, data);
        } __packed;
} __packed;

@BennyEvans my current working repository is not online because I want to test my code properly before I publish something. That's also the reason I deleted my old working branch, so that nobody uses or works on this not anymore up to date codebase. I don't quite understand why you want to do the same work again that I already did. Either you do it or me. Not both please. I suggest I will upload my code later today, then you can try to improve or add things.

@pfink-christ no problem, don't want to step on any toes, just trying to help. I'll wait until you've made your code public before proceeding any further. Thanks for your work on all this. Please keep me updated.

Hi everybody, I hope this doesn't come across as offtopic, but after relatively long intensive runs of gs_usb_fd with CANtact Pro, I'm getting those OOM messages fairly consistently (fails at around the 4-5th day):

[632928.354876] gs_usb_fd 1-1.3:1.0: Configuring for 2 interfaces
[632928.394207] usbcore: registered new interface driver gs_usb_fd
[632929.568473] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12
[632929.568538] gs_usb_fd 1-1.3:1.0 can0: No memory left for USB buffer
[632940.440915] usbcore: deregistering interface driver gs_usb_fd
[632940.804206] gs_usb_fd 1-1.3:1.0: Configuring for 2 interfaces
[632940.829411] usbcore: registered new interface driver gs_usb_fd
[632942.007201] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12
[632942.007273] gs_usb_fd 1-1.3:1.0 can0: No memory left for USB buffer

Possible driver memory leak?

$ free
              total        used        free      shared  buff/cache   available
Mem:         440412      103808       29892       22432      306712      260840
Swap:        102396         256      102140

$ cat /proc/cpuinfo
processor	: 0
model name	: ARMv6-compatible processor rev 7 (v6l)
BogoMIPS	: 697.95
Features	: half thumb fastmult vfp edsp java tls
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xb76
CPU revision	: 7

Hardware	: BCM2835
Revision	: 000e
Serial		: 000000007c696cc1
Model		: Raspberry Pi Model B Rev 2

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        15G  6.7G  7.1G  49% /
devtmpfs        183M     0  183M   0% /dev
tmpfs           216M     0  216M   0% /dev/shm
tmpfs           216M   22M  194M  11% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           216M     0  216M   0% /sys/fs/cgroup
/dev/mmcblk0p1  253M   49M  204M  20% /boot
tmpfs            44M     0   44M   0% /run/user/1000

$ uname -a
Linux raspberrypi 5.10.17+ #1421 Thu May 27 13:58:02 BST 2021 armv6l GNU/Linux

How could I debug this further?

Restarting the interface is ineffective, it needs a full reboot to be back into working shape.

you can try to increase the CMA size: https://elixir.bootlin.com/linux/v5.15/source/Documentation/admin-guide/kernel-parameters.txt#L638

Thanks! What'd you put as a reasonable value given the hardware outlined?

Look at your kernel log and search for CMA, double that size:

dmesg | grep -i cma

Thanks! That explains things, before rebooting:

$ dmesg | grep -i cma
[633678.406776] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12
[633690.968344] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12
[633703.442953] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12
[633715.949471] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -12
(...)

After reboot:

$ dmesg | grep cma
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] Memory: 374456K/458752K available (8638K kernel code, 1324K rwdata, 2816K rodata, 420K init, 837K bss, 18760K reserved, 65536K cma-reserved)
[   19.635959] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
[   19.657850] bcm2835_vc_sm_cma_probe: Videocore shared memory driver

So I bumped it to cma=128M via /boot/cmdline.txt:

[    0.000000] Kernel command line: coherent_pool=1M snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 bcm2708_fb.fbwidth=640 bcm2708_fb.fbheight=480 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000  console=ttyAMA0,115200 console=tty1 root=PARTUUID=4c9fbae7-02 rootfstype=ext4 elevator=deadline fsck.repair=yes cma=128M rootwait
[    0.000000] Memory: 308792K/458752K available (8638K kernel code, 1324K rwdata, 2816K rodata, 420K init, 837K bss, 18888K reserved, 131072K cma-reserved)

I'll report back if it fails again (not entirely sure why gs_usb_fd would require more than 64MB contiguous alloc though?), thanks for the suggestion in any case ;)

I think the memory gets fragmented over time....

I'd like to have the timestamp as a separate u32. This looks good, given the compiler likes this :)

My compiler is currently fine with it, but the setup from my colleague had some issues with DECLARE_FLEX_ARRAY on Friday. I think some additional tests would be good once the last issue below is cleared.

I think you can use an anonymous struct.

The idea was to be able to do something like this:

struct gs_host_frame {
        u32 echo_id;
        __le32 can_id;

        u8 can_dlc;
        u8 channel;
        u8 flags;
        u8 reserved;

        union {
                struct classic_can {
                        u8 data_cc[8];
                        u32 timestamp_us;
                } __packed;
                DECLARE_FLEX_ARRAY(u8, data);
        } __packed;
} __packed;

...

gs_hf_size = struct_size(hf, data, hf_data_len) - sizeof(struct classic_can);

Because now struct_size reports size = 88 when hf_data_len = 64. If my math is correct it's exactly sizeof(struct classic can) = 12 too long. Using a macro again would be a practical solution or reducing hf_data_len by 12, which I don't like for obvious reasons.

Use pahole /path/to/gs_usb.o to find out how the compiler placed the struct in memory.

struct gs_host_frame {
        u32                        echo_id;              /*     0     4 */
        __le32                     can_id;               /*     4     4 */
        u8                         can_dlc;              /*     8     1 */
        u8                         channel;              /*     9     1 */
        u8                         flags;                /*    10     1 */
        u8                         reserved;             /*    11     1 */
        union {
                struct {
                        u8         data_cc[8];           /*    12     8 */
                        u32        timestamp_us;         /*    20     4 */
                };                                       /*          12 */
                struct {
                        struct {
                        } __empty_data;                  /*    12     0 */
                        u8         data[0];              /*    12     0 */
                };                                       /*           0 */
        };                                               /*    12    12 */

        /* size: 24, cachelines: 1, members: 7 */
        /* last cacheline: 24 bytes */
};

If I interpret it correctly the union has a size of 12 even though data has a length of zero.

How does it look, if you remove the __packed from the union?

That comment makes no sense....

It would look the same.

Here is my current working branch for more context -> pfink-christ/net-next@e28e72b

What about this:

struct gs_host_frame {
	u32 echo_id;
	__le32 can_id;

	u8 can_dlc;
	u8 channel;
	u8 flags;
	u8 reserved;

	union {
		DECLARE_FLEX_ARRAY(
				   struct classic_can {
					   u8 data[8];
				   }, classic_can);
		DECLARE_FLEX_ARRAY(
				   struct classic_can_ts {
					   u8 data[8];
					   u32 timestamp;
				   }, classic_can_ts);
		DECLARE_FLEX_ARRAY(
				   struct canfd {
					   u8 data[64];
				   }, canfd);
	};
} __packed;

Edit: removed the u32 timestamp for canfd.

To make it less ugly, declare the new structs outside of the struct gs_host_frame. There are already some in tree-users that put several DECLARE_FLEX_ARRAY() in an union at the and of a struct.

struct gs_host_frame {
        u32                        echo_id;              /*     0     4 */
        __le32                     can_id;               /*     4     4 */
        u8                         can_dlc;              /*     8     1 */
        u8                         channel;              /*     9     1 */
        u8                         flags;                /*    10     1 */
        u8                         reserved;             /*    11     1 */
        union {
                struct {
                        struct {
                        } __empty_classic_can;           /*    12     0 */
                        struct classic_can classic_can[0]; /*    12     0 */
                };                                       /*    12     0 */
                struct {
                        struct {
                        } __empty_classic_can_ts;        /*    12     0 */
                        struct classic_can_ts classic_can_ts[0]; /*    12     0 */
                };                                       /*    12     0 */
                struct {
                        struct {
                        } __empty_canfd;                 /*    12     0 */
                        struct canfd canfd[0];           /*    12     0 */
                };                                       /*    12     0 */
        };                                               /*    12     0 */

        /* size: 12, cachelines: 1, members: 7 */
        /* last cacheline: 12 bytes */
};

For classical CAN you would use:
gs_hf_size = struct_size(hf, classic_can, 1)
for CAN-FD:
gs_hf_size = struct_size(hf, canfd, 1)

I haven't tested if struct_size() gives the correct size, though.

Thank you @marckleinebudde for your help!

I feel stupid πŸ€¦β€β™‚οΈ It took me two more different implementations to to run into different problems and fully understand the big picture of your proposal πŸ™„ But I got it now and I must say that I'm fairly satisfied with the following:

struct classic_can {
        u8 data[8];
} __packed;

struct classic_can_ts {
        u8 data[8];
        u32 timestamp;
} __packed;

struct canfd {
        u8 data[64];
} __packed;

struct canfd_quirk {
        u8 data[65];
} __packed;

struct gs_host_frame {
        u32 echo_id;
        __le32 can_id;

        u8 can_dlc;
        u8 channel;
        u8 flags;
        u8 reserved;

        union {
                DECLARE_FLEX_ARRAY(struct classic_can, classic_can);
                DECLARE_FLEX_ARRAY(struct classic_can_ts, classic_can_ts);
                DECLARE_FLEX_ARRAY(struct canfd, canfd);
                DECLARE_FLEX_ARRAY(struct canfd_quirk, canfd_quirk);
        };

} __packed;

Resulting in:

struct gs_host_frame {
        u32                        echo_id;              /*     0     4 */
        __le32                     can_id;               /*     4     4 */
        u8                         can_dlc;              /*     8     1 */
        u8                         channel;              /*     9     1 */
        u8                         flags;                /*    10     1 */
        u8                         reserved;             /*    11     1 */
        union {
                struct {
                        struct {
                        } __empty_classic_can;           /*    12     0 */
                        struct classic_can classic_can[0]; /*    12     0 */
                };                                       /*           0 */
                struct {
                        struct {
                        } __empty_classic_can_ts;        /*    12     0 */
                        struct classic_can_ts classic_can_ts[0]; /*    12     0 */
                };                                       /*           0 */
                struct {
                        struct {
                        } __empty_canfd;                 /*    12     0 */
                        struct canfd canfd[0];           /*    12     0 */
                };                                       /*           0 */
                struct {
                        struct {
                        } __empty_canfd_quirk;           /*    12     0 */
                        struct canfd_quirk canfd_quirk[0]; /*    12     0 */
                };                                       /*           0 */

        /* size: 12, cachelines: 1, members: 7 */
        /* last cacheline: 12 bytes */
};

struct_size reports 77 (with canfd_quirk), so that seems to be ok, too.

Full code -> pfink-christ/net-next@db0f695

checkpatch.pl is also satisfied. Do you want me to send it directly to linux-can or do you want to take a quick look first? BTW do you want to improve the description of your patches yourself or should I give it a try?

The last commit of gs_usb_fd-rc8 is up for discussion whether it's better to implement extended bitrate constants as additional feature bit or if a VID/PID switch is better here. Either we squash it together with the second to last commit or we delete it. CANtact Pro does not implement this feature right now, but our hardware does. My personal vote would be for the feature bit variant.

Right now code is only tested with our CANext FD hardware. I can test with classic candlelight and CANtact Pro again on Friday. Or @BennyEvans and @brainstorm can do some testing as well with pfink-christ/net-next@49bec00?

I feel stupid οΏΌ It took me two more different implementations to to run into different problems and fully understand the big picture of your proposal οΏΌ

Finding out things yourself usually means you've learned more than blindly re-using things.

But I got it now and I must say that I'm fairly satisfied with the following

Looks good.

Please split pfink-christ/net-next@db0f695 into separate patches:

  • remove the u32 timestamp_us, we'll squash that into my patch, where it was introduced. So that my patch will only adds the BIT. We can add proper timestamp support in a separate patch - It needs additional work anyways.
  • add the union with only the DECLARE_FLEX_ARRAY(struct classic_can, classic_can)
  • add CAN-FD support
  • add support for the canfd_quirk

I'm not entirely sure, if the quirk should only be based on a feature bit, as the CANtact pro's firmware doesn't set that bit. We might have to do a compare on the Product string.

The last commit of gs_usb_fd-rc8 is up for discussion whether it's better to implement extended bitrate constants as additional feature bit or if a VID/PID switch is better here.

IMHO should be a feature.

Either we squash it together with the second to last commit or we delete it. CANtact Pro does not implement this feature right now, but our hardware does. My personal vote would be for the feature bit variant.

ACK

The last commit of gs_usb_fd-rc8 is up for discussion whether it's better to implement extended bitrate constants as additional feature bit or if a VID/PID switch is better here. Either we squash it together with the second to last commit or we delete it. CANtact Pro does not implement this feature right now, but our hardware does. My personal vote would be for the feature bit variant.

I also think a feature would be a better idea.

Right now code is only tested with our CANext FD hardware. I can test with classic candlelight and CANtact Pro again on Friday. Or @BennyEvans and @brainstorm can do some testing as well with pfink-christ/net-next@49bec00?

Thanks @pfink-christ I'll test now with a classic CAN canable and update you with the results of my testing later tonight.

When would you be happy for me to submit a PR to add my VID/PID? To what branch would you like me to submit a PR?

#define USB_CAN_DEBUGGER_VENDOR_ID   0x16d0
#define USB_CAN_DEBUGGER_PRODUCT_ID  0x10b8

Thanks again for your work. Appreciate it.

  • remove the u32 timestamp_us

done

BTW do you want to improve the description of your patches yourself or should I give it a try?

done

I've force pushed https://github.com/marckleinebudde/linux/tree/gs_usb_fd

My compiler is currently fine with it, but the setup from my colleague had some issues with DECLARE_FLEX_ARRAY on Friday. I think some additional tests would be good once the last issue below is cleared.

I'm also having an issue compiling this.

gs_usb.c:213:17: error: expected specifier-qualifier-list before β€˜DECLARE_FLEX_ARRAY’

Is this the same issue your colleague was having?

DECLARE_FLEX_ARRAY has only recently been added to the kernel, you need >= v5.16.

DECLARE_FLEX_ARRAY has only recently been added to the kernel, you need >= v5.16.
@marckleinebudde yep, thankyou. Just figured this out as you replied.

@pfink-christ I've performed some very basic testing with a canable device running the candlelight FW and everything seems to be ok. It also seems to work correctly with my device. I'll perform some more in depth testing once we decide on the extended bitrate stuff and after I've submitted the PR to add my VID/PID.

Please keep me updated on when you'd like me to submit the PR and to what branch.

Cheers

I'm not entirely sure, if the quirk should only be based on a feature bit, as the CANtact pro's firmware doesn't set that bit. We might have to do a compare on the Product string.

I guess it is reasonable to also allow CANtact pro users to benefit from the workaround without firmware update. I will prepare a patch following this proposal. But CANtact pro has additional firmware instability problems (especially at higher speeds or at high load) and I don't see @ericevenchick around supporting his hardware right now. I don't know if there will be any other community effort or somebody willing to invest time to improve the firmware. Maybe fully supporting CANtact pro with the next firmware version (with usb quirk feature set) would also be an option?
Or thinking aloud - we add the quirk explicitly for CANtact pro and this firmware version (without the feature set) and also add an instability warning to prevent users from reporting the same problems to mailing lists, forums and here on github again and again? Then users could benefit from the workaround and use it as far as the current firmware allows it and are notified that there might be other problems.

remove the u32 timestamp_us, we'll squash that into my patch, where it was introduced. So that my patch will only adds the BIT. We can add proper timestamp support in a separate patch - It needs additional work anyways.

I'm going to split the introduction of the struct classic_can_ts into a separate commit. Then you can keep it in this patch series or put it in your queue until timestamp support is completed. At least that's how I understood this one. The rest is clear. I hope I will find some spare time in the next days and rework the commits asap.

@pfink-christ would you like me to submit a PR or would you prefer to add my VID/PID (from #2 (comment)) in the same commit/patch that you add yours? As I've not pushed code into mainline before I'm not sure what the best approach is.

@marckleinebudde may be able to comment on the best approach here.

Cheers

I guess it is reasonable to also allow CANtact pro users to benefit from the workaround without firmware update. I will prepare a patch following this proposal.

Use something like:

strcmp(udev->manufacturer, "LinkLayer Labs") && strcmp(udev->product, "CANtact Pro")`

to detect if the device is a CANtact pro.

But CANtact pro has additional firmware instability problems (especially at higher speeds or at high load) and I don't see @ericevenchick around supporting his hardware right now. I don't know if there will be any other community effort or somebody willing to invest time to improve the firmware.

I've seen missing CAN frames under high load, but don't remember if it was in the RX or TX path... I have a CANtact pro here, hopefully I'll find some time to look into the firmware.

Maybe fully supporting CANtact pro with the next firmware version (with usb quirk feature set) would also be an option?

A new firmware should have a proper vendor and product ID set, too. If the CANtact is Open Hardware it's relatively easy to get official IDs. I contacted @ericevenchick on this, but he hasn't answered, yet.

Or thinking aloud - we add the quirk explicitly for CANtact pro and this firmware version (without the feature set) and also add an instability warning to prevent users from reporting the same problems to mailing lists, forums and here on github again and again?

I haven't checked the CANtact firmware git, but if all existing versions use "LinkLayer Labs/CANtact Pro" and dVendor=1d50, idProduct=606f it's quite easy to detect them and issue a known stability problems warning.

Then users could benefit from the workaround and use it as far as the current firmware allows it and are notified that there might be other problems.

Hey @BennyEvans,

if you only have to add VID/PID to the driver, you might use https://github.com/pfink-christ/net-next/tree/gs_usb_fd-rc8 as the base.

I think @pfink-christ is fluent enough in git to take the patch cherry pick it into his current development branch. While patches are shuffled around, split and patch descriptions updated, all for mainline inclusion, a "normal" github merge doesn't work.

I'm going to split the introduction of the struct classic_can_ts into a separate commit. Then you can keep it in this patch series or put it in your queue until timestamp support is completed.

Sounds good.

At least that's how I understood this one. The rest is clear. I hope I will find some spare time in the next days and rework the commits asap.

Can you push your current work in progress to github, so that @BennyEvans can send a PR for the new VID/PID.

I've seen missing CAN frames under high load, but don't remember if it was in the RX or TX path... I have a CANtact pro here, hopefully I'll find some time to look into the firmware.

I personally like the idea of open hard and software, but I don't quite like @ericevenchick's behavior at the moment leaving his customers alone in the dark with...well, quite everything including a buggy firmware. And it shouldn't be on the task list of the maintainer to look into firmware issues πŸ˜‰

So, in the interest of all users I talked to our head of development to release our firmware improvements, which should also apply to CANtact pro and he agreed. But I don't think it will happen before early next year. If you need it before that contact me by mail and we will find a solution...maybe I could supply at least the compiled firmware binary until then, but let's get the driver ready first.

@BennyEvans create a PR on the existing branch with proper description and all and I will pick it up. It's none of my business, but I would consider again if you really just want to name your device CAN_DEBUGGER...

I personally like the idea of open hard and software, but I don't quite like @ericevenchick's behavior at the moment leaving his customers alone in the dark with...

ACK 😞

So, in the interest of all users I talked to our head of development to release our firmware improvements, which should also apply to CANtact pro and he agreed.

Nice! 🌈

But I don't think it will happen before early next year. If you need it before that contact me by mail and we will find a solution...maybe I could supply at least the compiled firmware binary until then, but let's get the driver ready first.

ACK - let's focus on the Linux driver first.

@BennyEvans create a PR on the existing branch with proper description and all and I will pick it up. It's none of my business, but I would consider again if you really just want to name your device CAN_DEBUGGER...

If you have a proper a VID/PID, names are just sound and smoke (as we say in German) πŸ˜„

Granted:

#define USB_CAN_DEBUGGER_VENDOR_ID   0x16d0
#define USB_CAN_DEBUGGER_PRODUCT_ID  0x10b8

is a bit too generic.

@BennyEvans create a PR on the existing branch with proper description and all and I will pick it up. It's none of my business, but I would consider again if you really just want to name your device CAN_DEBUGGER...

Thanks @pfink-christ, I'll change the name and send a PR on the weekend to https://github.com/pfink-christ/net-next/tree/gs_usb_fd-rc8. I'll also have some time this weekend to perform some further testing.

EDIT: Managed to find the time to submit the PR this afternoon. Fairly sure the descriptions are all OK but let me know if you think I need to change something. Have changed the name to "USB_ABE_CANDEBUGGER_FD_". I'll continue testing the new version on the weekend. Cheers.

Just found some interesting back groud information to the 0x16d0 USB VID: It has been obsoleted by USB, cannot be re-used and thus stays unique:

https://www.mcselec.com/index.php?page=shop.product_details&product_id=92&option=com_phpshop

Just found some interesting back groud information to the 0x16d0 USB VID: It has been obsoleted by USB, cannot be re-used and thus stays unique:

https://www.mcselec.com/index.php?page=shop.product_details&product_id=92&option=com_phpshop

Yep, so I've purchased a PID from them. This isn't an issue submitting to mainline is it?

No problem at all, just found the "dispute" between MSC and USB.ORG interesting. There are some other mentions of 0x16d0 devices in the kernel:

https://elixir.bootlin.com/linux/v5.15/source/sound/usb/quirks.c#L1621

BTW: If you need a USB VID/PID for open hardware you can apply here: https://github.com/openmoko/openmoko-usb-oui

Hi @marckleinebudde, I finished reworking the commits.

-> https://github.com/pfink-christ/net-next/tree/gs_usb_fd-rc9

I moved the timestamp commit after the commit introducing the union/flex array, otherwise it would have gotten a bit complicated.

Except the last commit introducing the exception for CANtact Pro, no further changes were made.

I decided comparing manufacturer, product and firmware version would suffice detecting the devices requiring the exception. Then CANtact Pro can set the quirk request in the next firmware version. Or do you prefer adding checks for vid/pid as well?

Hey @pfink-christ!

Thanks for your work. I've done some changes to your https://github.com/pfink-christ/net-next/tree/gs_usb_fd-rc9 and pushed the resulting tree to https://github.com/marckleinebudde/linux/tree/gs_usb_fd.

The changes are:

  • rebased to latest net-next/master + my commits with enhanced patch descriptions
  • removed struct classic_can_ts as it's not implemented yet
  • renamed GS_CAN_FEATURE_REQ_USB_QUIRK to GS_CAN_FEATURE_REQ_USB_QUIRK_LPC546XX
  • don't use data[65] in struct canfd_quirk but an extra u8 dummy
  • renamed struct gs_can::gs_hf_size to gs_can::hf_size
  • calculate gs_can::hf_size based of whether CAN-FD is requested by user or not
  • added a struct classic_can_quirk
  • don't use union gs_device_bt_const but separate structs
  • make request/setting of GS_CAN_FEATURE_BT_CONST_EXT dependent on GS_CAN_FEATURE_FD
  • assign missing name to data_bt_const
  • only assign data_bittiming_const and do_set_data_bittiming if GS_CAN_FEATURE_FD

Does your device support the struct gs_host_frame with struct classic_can for classical CAN mode? The CANtact Pro does, but it needs a quirk byte at the end of the struct gs_host_frame though.

If I don't hammer the device too hard sending works almost without loosing frames:

cangen can1 -Di -L1 -I2 -p10 -g 0.0001

and on my laptop with a different CAN adapter:

cansequence -rv can0

I see lost frames every 5 ... 10 seconds.

With -g0 the first lost frame is after sending less then 256. 😞

@pfink-christ, please add your S-o-b for marckleinebudde/linux@ebe669a. As I got this patch went through your hands. It's Ok if you give it me here in this issue.

@pfink-christ @ericevenchick,

I added @ericevenchick as Co-developed-by: Eric Evenchick <eric@evenchick.com> to the patch: marckleinebudde/linux@acc79ad. @ericevenchick, please give us your S-o-b, so that we can bring this code to mainline.

@pfink-christ, as you've added hardware for your employer's hardware, what about adding you to the MAINTAINERS file in the kernel. This means when a patch targets this driver, you'll get an email to review and test the changes.

Thanks for your work. I've done some changes

You are welcome. =) ...quite some more changes πŸ˜„ Awesome!

We tried to reproduce your setup, where you are losing frames, but not even CANtact Pro lost frames with these commands in our setup (I was told that still the original CANtact Pro firmware is on our device):

ip link set can0 type can bitrate 1000000 dbitrate 4000000 fd on
cangen can0 -Di -L1 -I2 -p10 -g0

(Receiving system with peak usb can fd adapter):
cansequence -rv can0

What bitrate did you use? When omitting -p10 we could exhaust the buffer and cangen aborts (write: No buffer space available), but apart from that no lost frames were reported.

I missed that yesterday evening at first and it seems you pulled it faster than I was able to force push my reword (or you changed it back? - I looked it up, because I wasn't sure myself) -> marckleinebudde/linux@f2c7dab "an union" -> "a union"

as you've added hardware for your employer's hardware, what about adding you to the MAINTAINERS file in the kernel. This means when a patch targets this driver, you'll get an email to review and test the changes.

I have no objections. You can add me or I can do it myself in a follow up commit.

  • rebased to latest net-next/master + my commits with enhanced patch descriptions
  • removed struct classic_can_ts as it's not implemented yet
  • renamed GS_CAN_FEATURE_REQ_USB_QUIRK to GS_CAN_FEATURE_REQ_USB_QUIRK_LPC546XX

Works for me.

  • don't use data[65] in struct canfd_quirk but an extra u8 dummy

Makes sense. I think this survived from a former implementation before the union and I was too focused on changing as little as possible.

  • renamed struct gs_can::gs_hf_size to gs_can::hf_size

ACK

  • calculate gs_can::hf_size based of whether CAN-FD is requested by user or not

Wasn't completely aware of classic_can being used if the device supports CAN-FD, but you are of course right.

  • added a struct classic_can_quirk

I thought about it and because we didn't see any corruption there I later forgot about that possibility. Or maybe we were just focused too much on testing FD, that we did't notice it. Thanks for adding it.

  • don't use union gs_device_bt_const but separate structs

Again remains from the previous implementation cycle where this was necessary. Sorry for not cleaning up properly.

  • make request/setting of GS_CAN_FEATURE_BT_CONST_EXT dependent on GS_CAN_FEATURE_FD

ACK

  • assign missing name to data_bt_const

ACK

  • only assign data_bittiming_const and do_set_data_bittiming if GS_CAN_FEATURE_FD

ACK

Tested and Signed-off-by me :)

Seems like @ericevenchick has the last word then.

I missed that yesterday evening at first and it seems you pulled it faster than I was able to force push my reword (or you changed it back? - I looked it up, because I wasn't sure myself) -> marckleinebudde/linux@f2c7dab "an union" -> "a union"

Yes I "fixed" that, but just found out, that "a union" is correct. Will fix.

Seems like @ericevenchick has the last word then.

Let's give him some time. He has modified the original GPL driver, so the https://github.com/linklayer/gs_usb_fd driver is GPL'ed as well. He even posted a link to the driver on the linux-can ML https://lore.kernel.org/all/ebff6876-57de-4938-8e9c-bf3994ec77f8@www.fastmail.com, but didn't include a S-o-b. 😞

We tried to reproduce your setup, where you are losing frames, but not even CANtact Pro lost frames with these commands in our setup (I was told that still the original CANtact Pro firmware is on our device):

ip link set can0 type can bitrate 1000000 dbitrate 4000000 fd on
cangen can0 -Di -L1 -I2 -p10 -g0

(Receiving system with peak usb can fd adapter):

cansequence -rv can0

What bitrate did you use?

500 kbit/s on a imx6 single core

When omitting -p10 we could exhaust the buffer and cangen aborts (write: No buffer space available), but apart from that no lost frames were reported.

The no buffer space available error is expected to happen....I'll test on other SoCs with other USB host controllers.

Thanks for all of your work on this @pfink-christ and @marckleinebudde

500 kbit/s on a imx6 single core
The no buffer space available error is expected to happen....I'll test on other SoCs with other USB host controllers.

We actually tested on a dual core i.MX6DL where we didn't see any lost packages, but maybe we missed something else in our setup. 🀷

Hi all. Thanks for all the work on this. I've been away from this project for some personal reasons but it's great to see the work done here!

Patch is signed-off-by me too.

Hello @BennyEvans,

Yeah spot on, I believe https://github.com/candle-usb/candleLight_fw is the main repository. I'll try to contact them over the next week or so in an attempt to get them involved in this. I think that as long as they're aware of it, we'll be fine. As far as discussion goes, can we open up a discussion board on git (https://docs.github.com/en/discussions/quickstart) even if that's in your repo? Or would you prefer to keep discussion here?

I've send a PR that adds the CAN-FD related bits to the candlelight fw: candle-usb/candleLight_fw#79

Hi , just dropping in to introduce myself - I'm indeed the de-facto maintainer of candleligh_fw (Hubert "gave me the keys" and has not been involved recently). This seems like a low-impact change for our firmware (the stm32f042 / f072 we target do not even support CAN-FD) and have no objection to cooperating. The less quirks and special cases in the mainline kernel driver, the better. We've had some users doing some isolated work on FD (e.g. candle-usb/candleLight_fw#21 ) but not sure what came of it.

Hey,

I just found an overlap in the USB request ids:

enum gs_usb_breq {
	GS_USB_BREQ_HOST_FORMAT = 0,
	GS_USB_BREQ_BITTIMING,
	GS_USB_BREQ_MODE,
	GS_USB_BREQ_BERR,
	GS_USB_BREQ_BT_CONST,
	GS_USB_BREQ_DEVICE_CONFIG,
	GS_USB_BREQ_TIMESTAMP,
	GS_USB_BREQ_IDENTIFY,
	GS_USB_BREQ_DATA_BITTIMING, // == GS_USB_BREQ_GET_USER_ID
	GS_USB_BREQ_BT_CONST_EXT,   // == GS_USB_BREQ_SET_USER_ID
};

As the GS_USB_BREQ_{GET,SET}_USER_ID are older, we have to move the GS_USB_BREQ_DATA_BITTIMING and GS_USB_BREQ_BT_CONST_EXT to a new value.

As the CANtact Pro is out there I'll add a workaround for the GS_USB_BREQ_DATA_BITTIMING. I'll decide on the manufacturer and product strings.

Do we need workarounds for other devices and/or GS_USB_BREQ_BT_CONST_EXT, too? How should we detect if we need to enable these workarounds?

regards,
Marc

Hello @BennyEvans,

Yeah spot on, I believe https://github.com/candle-usb/candleLight_fw is the main repository. I'll try to contact them over the next week or so in an attempt to get them involved in this. I think that as long as they're aware of it, we'll be fine. As far as discussion goes, can we open up a discussion board on git (https://docs.github.com/en/discussions/quickstart) even if that's in your repo? Or would you prefer to keep discussion here?

I've send a PR that adds the CAN-FD related bits to the candlelight fw: candle-usb/candleLight_fw#79

Apologies @marckleinebudde, I had completely forgotten about doing this. Thanks for taking the lead on it.

As the CANtact Pro is out there I'll add a workaround for the GS_USB_BREQ_DATA_BITTIMING. I'll decide on the manufacturer and product strings.

Do we need workarounds for other devices and/or GS_USB_BREQ_BT_CONST_EXT, too? How should we detect if we need to enable these workarounds?

I believe the CANtact Pro is currently the only production device using this driver for CAN FD and thus, no other production device should be using the GS_USB_BREQ_DATA_BITTIMING feature. I think it would be safe to add a workaround for just that device. The GS_USB_BREQ_BT_CONST_EXT feature is something we've introduced so would only be an issue moving it if @pfink-christ has an objection with his hardware. Might be best to wait for his input.

Cheers,
Ben

The GS_USB_BREQ_BT_CONST_EXT feature is something we've introduced so would only be an issue moving it if @pfink-christ has an objection with his hardware. Might be best to wait for his input.

Makes sense.

First of all - happy new year to all πŸŽ‰πŸ§¨πŸ’₯ sorry for being a bit late....didn't check my business mails any earlier.

As the GS_USB_BREQ_{GET,SET}_USER_ID are older, we have to move the GS_USB_BREQ_DATA_BITTIMING and GS_USB_BREQ_BT_CONST_EXT to a new value.

Thanks @marckleinebudde for your scrutiny! No objections from my side, as we can still move the bits around in our firmware to match the driver. We currently only have a low number of test devices around, where updating the firmware will be no problem at all and become necessary anyway when mainline support is there.

@marckleinebudde your latest version is currently not online somewhere, is it?

I guess you are going to send the patches upstream when you are ready and when you have time for it? Or are there still open issues that I could help with? Or maybe some more testing,...?

Hey @pfink-christ,

I just pushed my latest work-in-progress to: https://github.com/marckleinebudde/linux/tree/gs-usb-fd

EDIT: the driver is now on it's way to upstream.

I had to split the hf_size into hf_size_rx and hf_size_tx, as the TX buffers are used for all interfaces. This makes problems if you start one interface in CAN-FD mode and the other in classical CAN mode. And I had to implement the GS_USB_BREQ_QUIRK_OVERLAP_DATA_BITTIMING for the CANtact Pro: marckleinebudde/linux@2c19410
I still have to clean up the URB error handling.

So testing would be good.

Thanks again for all your work on this @marckleinebudde @pfink-christ. I've been away for the past few weeks but will will jump in over the next few days and perform some further testing.

FYI: I've send a pull request to net-next/master with CAN-FD support https://lore.kernel.org/all/20220310142903.341658-1-mkl@pengutronix.de/

Awesome! Thanks for all of your work on this @marckleinebudde @pfink-christ. Appreciate it.

So... What's the current status of upstreaming this driver?

It's mainline since v5.18.

@marckleinebudde thanks a lot!