paritytech/parity-scale-codec

Stack overflow when decoding large boxed arrays

Closed this issue · 14 comments

Example:

#![feature(new_uninit)]
use parity_scale_codec::{Decode, Encode};

#[derive(Encode, Decode)]
struct S(Box<[u8; 100 * 1024 * 1024]>);

impl Default for S {
    fn default() -> Self {
        Self(unsafe { Box::new_zeroed().assume_init() })
    }
}

fn main() {
    let s = S::default();
    let encoded = s.encode();
    println!("Encoded successfully");
    S::decode(&mut encoded.as_slice()).unwrap();
}

Output:

Encoded successfully

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: Job 1, 'cargo run --example overflow' terminated by signal SIGABRT (Abort)

Likely due to rust-lang/rust#53827

This manual implementation works as a workaround, but it is slow:

#![feature(new_uninit)]

use parity_scale_codec::{Decode, Encode, Input};
use std::mem::ManuallyDrop;

#[derive(Encode)]
struct S(Box<[u8; 100 * 1024 * 1024]>);

impl Decode for S {
    fn decode<I: Input>(input: &mut I) -> Result<Self, parity_scale_codec::Error> {
        let piece = parity_scale_codec::decode_vec_with_len::<u8, _>(input, 100 * 1024 * 1024)
            .map_err(|error| error.chain("Could not decode `S.0`"))?;
        let mut piece = ManuallyDrop::new(piece);
        // SAFETY: Original memory is not dropped and guaranteed to be allocated
        let piece = unsafe { Box::from_raw(piece.as_mut_ptr() as *mut [u8; 100 * 1024 * 1024]) };
        Ok(S(piece))
    }
}

impl Default for S {
    fn default() -> Self {
        Self(unsafe { Box::new_zeroed().assume_init() })
    }
}

fn main() {
    let s = S::default();
    let encoded = s.encode();
    println!("Encoded successfully");
    S::decode(&mut encoded.as_slice()).unwrap();
}

Minimized example on stable Rust (in case anyone suspects the unsafe / nightly code is at fault):

use parity_scale_codec::Decode;

fn main() {
    let data = &[];
    let _ = Box::<[u8; 100 * 1024 * 1024]>::decode(&mut data.as_slice());
}
koute commented

Hmm.... this seems a little tricky to fix as-is; this is the problem code:

impl<T, X> Decode for X
where
    T: Decode + Into<X>,
    X: WrapperTypeDecode<Wrapped = T>,
{
    fn decode<I: Input>(input: &mut I) -> Result<Self, Error> {
        input.descend_ref()?;
        let result = Ok(T::decode(input)?.into());
        input.ascend_ref();
        result
    }
}

impl<T> WrapperTypeDecode for Box<T> {
    type Wrapped = T;
}

So it calls decode() on the array type, which builds the entire array on the stack, and then calls into() to move it into the Box; if the array is too big, that on-stack temporary overflows the stack.

We'll probably have to refactor parity-scale-codec a little to allow decoding into &mut MaybeUninit<T> so that the Box with the underlying memory could be preallocated beforehand, or something along these lines.
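The idea above can be sketched with plain std. The function below is illustrative only (it is not the parity-scale-codec API, and it simply copies raw bytes in place of real SCALE decoding): the heap allocation is created first, and the decoded bytes are written into it directly, so no value of array size ever exists on the stack. Note it relies on `Box::new_uninit`, stable since Rust 1.82 (the nightly `new_uninit` feature before that).

```rust
use std::mem::MaybeUninit;

const N: usize = 100 * 1024 * 1024;

// Hypothetical decoder illustrating "decode into preallocated memory":
// allocate the Box first, then fill it in place.
fn decode_boxed_array(input: &[u8]) -> Option<Box<[u8; N]>> {
    if input.len() < N {
        return None;
    }
    // Heap-allocate uninitialized storage; nothing of size N is ever
    // constructed on the stack.
    let mut boxed: Box<MaybeUninit<[u8; N]>> = Box::new_uninit();
    unsafe {
        // Fill the allocation directly from the input.
        std::ptr::copy_nonoverlapping(input.as_ptr(), boxed.as_mut_ptr() as *mut u8, N);
        // SAFETY: all N bytes were initialized by the copy above.
        Some(boxed.assume_init())
    }
}
```

The contrast with the blanket impl above: there, `T::decode` returns the array *by value*, so a 100 MiB temporary must pass through the stack before `into()` boxes it.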

I don't think the issue is fully fixed. I have upgraded to 3.6.1 and my app still crashes with a stack overflow.

Here is an example that is a tiny bit more advanced than the previous one (this is what I actually use in the app); it uses a newtype:

use parity_scale_codec::Decode;

#[derive(Decode)]
#[repr(transparent)]
struct NewType([u8; 100 * 1024 * 1024]);

#[derive(Decode)]
struct S(Box<NewType>);

fn main() {
    let data = &[];
    S::decode(&mut data.as_slice()).unwrap();
}
ggwpez commented

I don't think this^ can be fixed, or can it, @koute?

The biggest problem is that it compiles but crashes at runtime. If it failed to compile, it would have been easier to work with.

ggwpez commented

Hm, not sure what could be done about this. We could add some MaxStackSize(u32) attribute to SCALE, so that it can estimate its own memory usage and cause a compile error if the limit would be reached.
But that does not give any actual guarantees (unless we pair it with a MaxNested(u32) attribute...).

Or we swap to lazy allocations when the estimated size is above some customizable limit? All just ideas; no idea what is best or achievable.

bkchr commented

I think we can fix it. If we have a newtype-like wrapper (a single named or unnamed field) and it has repr(transparent), we can delegate decode_into to the wrapped type. This should fix the issue here.
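A minimal sketch of that delegation, using std only (the function names and signatures here are hypothetical stand-ins, not the real parity-scale-codec `decode_into` API, and the "decoding" is a plain byte copy). The key point is that `#[repr(transparent)]` guarantees `NewType` has the same layout as its single field, so a pointer to uninitialized `NewType` memory can be soundly reinterpreted as a pointer to uninitialized inner-array memory, and the wrapper's in-place decode can just forward to the wrapped type. `Box::new_uninit` is assumed (stable since Rust 1.82).

```rust
use std::mem::MaybeUninit;

const N: usize = 100 * 1024 * 1024;

#[repr(transparent)]
struct NewType([u8; N]);

// Hypothetical in-place decoder for the inner array: writes straight into
// caller-provided memory, so nothing of size N lives on the stack.
fn decode_array_into(input: &[u8], dst: &mut MaybeUninit<[u8; N]>) -> Result<(), &'static str> {
    if input.len() < N {
        return Err("not enough data");
    }
    unsafe {
        std::ptr::copy_nonoverlapping(input.as_ptr(), dst.as_mut_ptr() as *mut u8, N);
    }
    Ok(())
}

// The passthrough the derive could generate: because of #[repr(transparent)],
// MaybeUninit<NewType> has the same layout as MaybeUninit<[u8; N]>, so
// delegating via a pointer cast is sound.
fn decode_newtype_into(input: &[u8], dst: &mut MaybeUninit<NewType>) -> Result<(), &'static str> {
    let inner = unsafe { &mut *(dst as *mut MaybeUninit<NewType> as *mut MaybeUninit<[u8; N]>) };
    decode_array_into(input, inner)
}

fn decode_boxed_newtype(input: &[u8]) -> Result<Box<NewType>, &'static str> {
    // Preallocate on the heap, decode in place, then assume-init.
    let mut boxed: Box<MaybeUninit<NewType>> = Box::new_uninit();
    decode_newtype_into(input, &mut boxed)?;
    // SAFETY: on success, every byte of the allocation was initialized.
    Ok(unsafe { boxed.assume_init() })
}
```

Without repr(transparent) this cast would not be guaranteed sound, which is why the delegation has to be gated on that attribute.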

bkchr commented

And BTW, it doesn't fail with a stack overflow for me. I get a segmentation fault, because the inputs to decode_into seem to be null/wrong 🤔

It does seem to overflow the stack on Linux:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: Job 1, 'cargo run --bin codec' terminated by signal SIGABRT (Abort)
bkchr commented

I'm also on Linux, but I get:

(signal: 11, SIGSEGV: invalid memory reference)

Looking into it with gdb, I get the following:

[Switching to Thread 0xfffff7dbf0e0 (LWP 540650)]
0x0000aaaaaaae8f98 in parity_scale_codec::codec::Decode::decode_into<mod::NewType, &[u8]> (input=0x0, 
    dst=<error reading variable: Cannot access memory at address 0xffffcc1bd5f0>) at src/codec.rs:308

That input being a null pointer doesn't make any sense...

koute commented

Yeah, this is kind of expected, as the derive macro generates the following code here:

#[repr(transparent)]
struct NewType([u8; 100 * 1024 * 1024]);

struct S(Box<NewType>);

#[allow(deprecated)]
const _: () = {
    #[automatically_derived]
    impl ::parity_scale_codec::Decode for NewType {
        fn decode<__CodecInputEdqy: ::parity_scale_codec::Input>(
            __codec_input_edqy: &mut __CodecInputEdqy,
        ) -> ::core::result::Result<Self, ::parity_scale_codec::Error> {
            ::core::result::Result::Ok(
                NewType({
                    let __codec_res_edqy = <[u8; 100 * 1024
                        * 1024] as ::parity_scale_codec::Decode>::decode(
                        __codec_input_edqy,
                    );
                    match __codec_res_edqy {
                        ::core::result::Result::Err(e) => {
                            return ::core::result::Result::Err(
                                e.chain("Could not decode `NewType.0`"),
                            );
                        }
                        ::core::result::Result::Ok(__codec_res_edqy) => __codec_res_edqy,
                    }
                }),
            )
        }
    }
};

#[allow(deprecated)]
const _: () = {
    #[automatically_derived]
    impl ::parity_scale_codec::Decode for S {
        fn decode<__CodecInputEdqy: ::parity_scale_codec::Input>(
            __codec_input_edqy: &mut __CodecInputEdqy,
        ) -> ::core::result::Result<Self, ::parity_scale_codec::Error> {
            ::core::result::Result::Ok(
                S({
                    let __codec_res_edqy = <Box<
                        NewType,
                    > as ::parity_scale_codec::Decode>::decode(__codec_input_edqy);
                    match __codec_res_edqy {
                        ::core::result::Result::Err(e) => {
                            return ::core::result::Result::Err(
                                e.chain("Could not decode `S.0`"),
                            );
                        }
                        ::core::result::Result::Ok(__codec_res_edqy) => __codec_res_edqy,
                    }
                }),
            )
        }
    }
};

As you can see, the generated decode still decodes the array on the stack here.

For this not to crash, we also need to generate a decode_into passthrough.

I'll fix this.

koute commented

And BTW, it doesn't fail with a stack overflow for me. I get a segmentation fault, because the inputs to decode_into seem to be null/wrong 🤔

That's probably due to a corrupted stack. (See the other argument, where you're getting error reading variable: Cannot access memory at address 0xffffcc1bd5f0.) Remember that GDB reads whatever is on the stack, and if your stack is busted it might just give you total garbage.

koute commented

PR with a fix to also handle newtypes: #462