Lokathor/utf16_lit

Memory corruption when using the macro with `as_ptr()`

Closed this issue · 6 comments

copying example from rustsec/advisory-db#1147

#[inline(never)]
fn test_fn() -> *const u16 {
    utf16_lit::utf16_null!("SHA256").as_ptr()
}

#[inline(never)]
fn test_fn2() -> *const u16 {
    let a = utf16_lit::utf16_null!("SHA256");
    dbg!(&a);
    a.as_ptr()
}

utf16_null! returns an array, but it doesn't decide where it's allocated, which appears to be intended behavior (perhaps the docs could emphasize this more / give a warning). The following both allocate an array on the stack:

let array: [u16; 7] = utf16_lit::utf16_null!("SHA256");
let array: [u16; 7] = [83, 72, 65, 50, 53, 54, 0];

In test_fn, the array is const promoted, which by chance avoids issues, but in test_fn2, the array is on the stack so a dangling pointer is returned (https://rust.godbolt.org/z/s1ezfbxYe).

To avoid this issue, you can place the array behind a &'static in a const/static first, then use .as_ptr() on that.

static ARRAY: &[u16; 7] /* or &[u16] for slice */ = &utf16_lit::utf16_null!("SHA256");
ARRAY.as_ptr()

perhaps the docs could emphasize this more / give a warning

I think using statics inside the macro would make it more reliable than docs but I'm personally fine with both.

Edit: if using statics is the correct fix, haven't had time to really test it yet

I could imagine the macro automatically returning a reference like how byte string literals do with its arrays.

if using statics is the correct fix

It's not that it needs to be in a static item (that was one example), but the macros are currently documented to return [u16; N] rather than say &'static [u16; N], so the array needs to be alive somewhere while the pointer is in use.

edit: ah rereading, I see you were meaning if statics inside the macros are the right fix or not. I'm not a maintainer so I'll leave that up to lokathor.

Changing the final value from a const to a static will not fix the issue on its own:

  • If there's a static owned array returned from the macro (current), then that alone does not appear to fix things. The same bug happens.
  • If there's a static reference to an array returned from the macro (breaking change) then things end up being compatible with your usage.

Currently, the crate docs tell the user that an array is returned, and that you should apply a & if you want a slice, including a code block example. In other words, while macros don't have type signature this would effectively be a breaking type change to the crate.

Previously the crate returned &[u16], and the crate exists as it is right now because it was requested to be that way. One user wanted to have the arrays returned directly because they wanted to put data into Arc or Rc or something like that. We could have changed it to be &[u16; _] probably, but just didn't think of that at the time.

Thus, while a version 3 of the crate could potentially be released in a way that would let your scenario work, version 2 of the crate can't solve your problem. As dtolnay says in rustsec/advisory-db#1147 (comment), this is unfortunately a general problem with static/const data and stack frames, there's nothing specific to this crate that's gone wrong.

If this were my crate, I think I'd just close. The "signature" of returning [u16; N] is fine in my opinion, and the documentation in https://docs.rs/utf16_lit/2.0.2/utf16_lit/macro.utf16_null.html is clear enough that you get back a [u16; N]. The caller can put that into a static if they want, or an Arc, or const-promote to static, or whatever else they want.

Yeah, I guess there's nothing more we can track here.