Panic on some websites with async-h1 backend: "String slice should be valid ASCII"
Shnatsel opened this issue ยท 3 comments
On some websites, e.g. http://futureuae.com, surf panics fails with the following error:
thread 'main' panicked at 'called
Result::unwrap()
on anErr
value: String slice should be valid ASCII
Firefox and curl work fine.
463 websites out of the top million from Feb 3 Tranco list are affected.
Tested using this code. Test tool output from all affected websites: surf-should-be-valid-ascii.tar.gz
The root cause might be in some dependency and not in surf itself. But the backtrace is useless due to async, and the error message doesn't even point to the line where the panic happens, so I'm reporting it here.
I've only tested the async-h1 backend; I don't know if the other bakends are affected.
Here's a backtrace:
RUST_BACKTRACE=1 longboard GET https://futureuae.com
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: String slice should be valid ASCII', /home/jeremiah/.cargo/registry/src/github.com-1ecc6299db9ec823/http-types-2.10.0/src/headers/headers.rs:54:62
stack backtrace:
0: rust_begin_unwind
at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
1: core::panicking::panic_fmt
at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/panicking.rs:92:14
2: core::option::expect_none_failed
at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/option.rs:1268:5
3: http_types::headers::headers::Headers::insert
4: http_types::headers::headers::Headers::append
5: <http_client::h1::H1Client as http_client::HttpClient>::send::__send::{{closure}}
6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
10: std::thread::local::LocalKey<T>::with
11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
12: async_io::driver::block_on
13: tokio::runtime::context::enter
14: tokio::runtime::handle::Handle::enter
15: std::thread::local::LocalKey<T>::with
16: std::thread::local::LocalKey<T>::with
17: async_std::task::builder::Builder::blocking
18: longboard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
As I suspected it seems this website is sending a header with invalid ASCII. http-types does not allow non-ASCII in headers as the HTTP spec also recommends not allowing invalid non-ASCII as of RFC 7230: https://tools.ietf.org/html/rfc7230#section-3.2.4
Quoting stackoverflow
In short: Only ASCII is guaranteed to work. Some non-ASCII bytes are allowed for backwards compatibility, but are not supposed to be displayable.
HTTPbis gave up and specified that in the headers there is no useful encoding besides ASCII:
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.
The website needs to fix this and be spec compliant. We are unlikely to support this.
I suppose this should not panic, but it should be an error.
I ran into this issue myself today.
Patching http-types
to not check for ASCII validity made it work for me. Obviously, this is a bad hack and no good solution. I'll have to do more research how this bad response header value is generated, but I am not sure if I will be able to fix this server-side eventually. In my case I do not care about the header value at all. Could it make sense for surf to work with Result<HeaderValue, _>
or Option<HeaderValue>
, so I can just choose to ignore the invalid header value that I am not interested in but still be able to work with the body of the response?
What do you think?
Edit: After all, I managed to fix it on the server side. Now the hard part, getting it merged upstream: MDL-60697