luxonis/depthai-bootloader-shared

`dai::Response:*::success` not initialized, some code paths then test garbage mem values

diablodale opened this issue ยท 12 comments

Three classes within dai::Response:* have a success member that is later directly tested in branching like if(resp.success) ...
However, there are multiple codepaths in which success is never initialized and therefore holds a random garbage value.
Which then leads to failures in branching.

Easy fix. The classes should set a default init of 0 aka false.

Setup

  • all
  • depthai-bootloader-shared main and depthai-core main

Repro

Found during code cleanup I'm in process doing. Here is one example...

The branch which tests if(resp.success) https://github.com/luxonis/depthai-core/blob/c49cd08cb1cad7d526ca2c2a2b561f0dcfa7020d/src/device/DeviceBootloader.cpp#L691

    Response::GetBootloaderConfig resp; // doesn't initialize anything to any value
    receiveResponse(resp); // fail on first line of func and returns `false`
    if(resp.success) { // therefore `success` was never initialized and if() tests random garbage value

Expected

Initialize all member fields. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c41-a-constructor-should-create-a-fully-initialized-object

Fix

I have a PR forthcoming, fix for this specific case is easy.

However, there are multiple codepaths in which success is never initialized and therefore holds a random garbage value.
Which then leads to failures in branching.

Do you have an example of this codepath? The case mentioned below does receive the response and fills out this value. Or have you observed otherwise? (eg a response containing a garbage value?)

Yes, the three lines of code above in my OP.

Also, I would like to change the success and errorMsg pair.
success to a bool
and errorMsg to a string.
[EDIT] Ahhh, but maybe this struct is used jointly in your closedsource firmware. If that's true, then lets just leave them as int32/char*

My bad - reread the comment. In this case its not a garbage response (as in correct response that doesn't have the value set).
I'd move the receiveResponse check inside the if instead

if(receiveResponse(resp) && resp.success) {
    ...

And in (afaik 2) other places.

Also could memset/zeroinit the T response in receiveResponse and parseResponse (types are "PODs")

template <typename T>
bool DeviceBootloader::receiveResponse(T& response) {
    response = T{};
    if(stream == nullptr) return false;
    ...
}

and

template <typename T>
bool DeviceBootloader::parseResponse(const std::vector<uint8_t>& data, T& response) {
    response = T{};
    // Checks that 'data' is type T
    Response::Command command;
    ...
}

To not require making a release for bootloader

Unfortunately that won't completely work. I already checked. ๐Ÿ˜‰

Can't use response = {} or T{}. Because those problem classes have user-defined default constructors which do not initialize the member variables. Therefore the compiler constructed constructor that would do a value init is not generated.

If you don't want to make a firmware release "soon", then I still recommend I make a PR that does corrects the issue.
You can apply the fix on your next firmware release whenever that is.

In parallel, I make a PR in depthai-core that manually sets success=0 in all the problematic locations (its not that many).

Idea 2 - use a macro to know when the header is being compiled for host or the device. I feel like I've seen a define somewhere that does that. For example

struct BootApplication : BaseResponse {
        // Common
        BootApplication() : BaseResponse(BOOT_APPLICATION) {}

        // Data
#ifdef(HOST)
        uint32_t success = 0;
        char errorMsg[64] = {0};
#else
        uint32_t success;
        char errorMsg[64];
#endif

        static constexpr const char* VERSION = "0.0.14";
        static constexpr const char* NAME = "BootApplication";
    };

Then the device doesn't need new code, while the host gets the correction.

Sounds good on setting the success then / doing checks - other means would get less nice, quickly.
Feel free to do the PR here as well, we'll pull it in along some release sooner or later (in cases where its high priority, we can also do it quickly).

Regarding the host/device idea, possibility, but we currently require that shared commit matches on both core & BL (or FW), so it'd still need an update on the other end.

Got it. I will

  • make PR here to initialize the members. To be merged in an indefinite future release.
  • make PR at depthai-core to manually set the members in the 6 locations I have so far found

For the second point, I think checking the return value of the actual receiving/parsing might be better, where appropriate

@themarpe, is the underlying bootloader initialization fix released in firmware?
I ask so that I can remove the workaround in places like

https://github.com/luxonis/depthai-core/blob/5ead66c017593563c8f730847c3e7ed2df8e950c/src/device/DeviceBootloader.cpp#L1105-L1109

@diablodale

Yes, can be removed now, if the core is targeting the updated main (which it is), will have that in (& underlying BL was also released for it). Won't modify how previous version of BL's worked, but responses from coming FW will remain structured in same manner as before.

Hmmm. Is it possible for the old bootloader with this bug to be used with new depthai-core?
Since the bootloader is burned into the eeprom...we can not guarantee a customer will update their bootloader.

If that is all true, then I think it better to leave the workaround in place. I can make a PR to update the code comments to not remove it. Do you know which bootloader version includes the fix? I can also list that in the comments.

@diablodale

Sorry for the delay - rechecked - the BL will respond in the same manner as before, so incoming data will remain the same (therefore unaffected). The means of parsing the incoming data will be the same due to shared code change & core change essentially creating the same outcome - "If eg: Response::FlashComplete result; is constructed, in new way it'll have the fields nulled in the similar manner as the old way (just by constructor instead of explicitly)"
https://github.com/luxonis/depthai-core/pull/347/files

So LGTM to continue