microsoftarchive/http2-katana

https://http2katanatest.cloudapp.net:8443/ always returns the same response body regardless of the path

akalin-chromium opened this issue · 27 comments

For example:

https://http2katanatest.cloudapp.net:8443/
https://http2katanatest.cloudapp.net:8443/index.html and
https://http2katanatest.cloudapp.net:8443/foo

all return the same text:

10mbTest.txt<br>
Chalkboard HTML5 Benchmark.htm<br>
index.html<br>
Lite Brite Browser Performance Benchmark.htm<br>
LookAround.htm<br>
test.txt<br>

Not sure if this is intentional or not.

This is only inside same session, if you start a new one and ask for https://http2katanatest.cloudapp.net:8443/foo
server return headers with status 404

Not intentional - trying with Chrome, I see that it always sends headers as nonindexed, so they are appended to current headers list. Looking for the :path server takes first one hence same file (first requested) returns..

But non-indexed headers should not add to current headers list. 3.4 of the spec says:

  A literal representation adds a header to the current set of headers
   if the header is added to the header table (either by incremental
   indexing or by substitution indexing).

hm I see it here http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-01
but not here: https://github.com/yoavnir/compression-spec/blob/7f67f0dbecdbe65bc22f3e3b57e2d5adefeb08dd/compression-spec.txt

http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-01 says
For a literal representation, a new entry is added to the working set
representing this header. If the literal representation specifies
that the header is to be indexed, the header is added accordingly to
the header table, and its index is included in the entry in the
working set. Otherwise, the entry in the working set contains an
undefined index.

Hunh, you're right. I vaguely remember an e-mail mentioning we should work off the -01 spec and not the -00 one, but I forgot about it.

I'll ping the list to see who implemented which version.

Ah, but wait, there's this text further down in -01:

  The new reference set of headers is computed by removing from the
   working set all the headers that are not present in the header table.

Doesn't that imply that a literal non-indexed entry (which has undefined index) will then be removed from the reference set?

It does actually.

redeployed, something still wrong though. on replay Chrome gives SPDY_ERROR error_code: 5.

Hmm, I still see the original problem (same response body) when I hit the endpoint. But SPDY_ERROR 5 is SPDY_DECOMPRESS_FAILURE, which means there's most likely an error with the encoded response headers. If you run my modified Chromium from the command line, it will spit out a warning on the command line about why it failed.

Or I can look at it once I can reproduce it, but I can't seem to do that.

Redeployed once again, verified with Chromium that on second request it will reset session. Btw when I navigate to https://http2katanatest.cloudapp.net:8443/foo or any other non existen page - server send headers with 404 status. Chromium will not show any 404 warning and will apear to load page infinitely.

Ok, there seems to be a problem when you start navigating from https://http2katanatest.cloudapp.net:8443.
On subsequent request it wll return same index.html.
If you start with https://http2katanatest.cloudapp.net:8443/index.html or any other page than it will work correctly. Need to look at that.

Looks like there is a reason to that. :path / is in headers table (initially) and will not be removed from reference set since all headers from Chromium are unindexed.

Makes sense. So Chrome has an encoding problem? It should be sending an indexed rep. for 3 (:path / ) on the next request to remove it from the working set?

Note that Chrome still encodes everything with non-indexed literals, so there is effectively no reference set.

This is what I'm thinking:
":path /" is sent as a non-indexed literal. It is added to the working set, but the header table is unchanged. To generate the reference set, everything is removed from the working set that is not in the header table. Since ":path /" is in the header table by default, it is not removed, and added to the reference set.

The spec is ambiguous whether it's allowed to have a non-indexed literal for something that's already in the header table. But if it is allowed, then I think it still gets removed from the working set. The spec says:

 For a literal representation, a new entry is added to the working set
   representing this header.  If the literal representation specifies
   that the header is to be indexed, the header is added accordingly to
   the header table, and its index is included in the entry in the
   working set.  Otherwise, the entry in the working set contains an
   undefined index.

So for a non-indexed literal specifying ":path /", the tuple (undefined, ":path", "/") is added to the working set (the first field being the index), so it'll get removed when the new reference set is computed, even if (1, ":path", "/") exists in the header table.

   The new reference set of headers is computed by removing from the
   working set all the headers that are not present in the header table.

   It should be noted that during the decoding of the header
   representations, the same index may be associated to different
   headers in the working set and in the header table.

I think it is not clear that the index of the header in the working set has an effect on weather or not it is removed from the reference set.

Is 'header' defined? I was interpreting it as a key-value pair.

   First, upon starting the decoding of a new set of headers, the
   reference set of headers is interpreted into the working set of
   headers: for each header in the reference set, an entry is added to
   the working set, containing the header name, its value, and its
   current index in the header table.

This makes me think that it should only be saved if the index matches.

However

   Header names are always represented as lower-case strings.  An input
   header name matches the header name of a (name, value) pair stored in
   the Header Table if they are equal using a character-based, _case
   insensitive_ comparison.  An input header value matches the header
   value of a (name, value) pair stored in the Header Table if they are
   equal using a character-based, _case sensitive_ comparison.  An input
   header (name, value) pair matches a pair in the Header Table if both
   the name and value are matching as per above.

This makes me think that we should only match the name and value on computing the reference set.

Sorry that was 00, here is 01:

   An input header name matches the header name of a (name, value) pair
   stored in the Header Table if they are equal using a character-based,
   _case sensitive_ comparison.  An input header value matches the
   header value of a (name, value) pair stored in the Header Table if
   they are equal using a character-based, _case sensitive_ comparison.
   An input header (name, value) pair matches a pair in the Header Table
   if both the name and value are matching as per above.

Why the description of 'matches' here? Is there anywhere else where you need to determine if something matches or not?

Yeah, I agree, the spec is ambiguous. Perhaps raise an editorial issue on the http2 repo?

But I interpreted it as the working set is a list of (index, name, value) triplets, and not just (name, value) pairs, and when you convert the working set to the reference set, you just delete the ones with undefined index.

Btw it seems similar issue already exists: see http2-spec/issues/189

Yeah, I see. It may already be fixed in the repo: https://github.com/http2/compression-spec

I'm pretty sure the intent of the literal-without-indexing is to avoid modifying any state -- Roberto recommended to me to start with a bare-bones compressor that simply emitted everything as a literal-without-index.

Also, see this text in 3.2:

 Incremental and substitution indexing are optional.  If none of them
   is selected in a header representation, the header table is not
   updated.  In particular, no update happens on the header table when
   processing an indexed representation.

(I might be repeating myself at this point).

it was addressed in recent changes, see 3.2.1:

A literal representation that is not added to the header table entails the following action:
The header is emitted.

A literal representation that is added to the header table entails the following actions
The header is emitted.
The header is added to the header table, at the location defined by the representation.
The new entry is added to the reference set.