Kong/kong-js-pdk

Problem reading gzipped kong.service.response.get_raw_body()


I have been trying to read the service response body with kong.service.response.getRawBody() (the JS counterpart of get_raw_body()) in a Kong JS plugin.
I can read the data fine when it isn't gzipped.
However, when the response is gzipped, I'm unable to gunzip it:

const zlib = require("zlib");
const rawBodyCompressed = await kong.service.response.getRawBody();
const rawBodyUncompressed = zlib.gunzipSync(rawBodyCompressed);

I wrote a version of the plugin in Lua, and it performed the task successfully:

local rawBodyCompressed = kong.service.response.get_raw_body()
local rawBodyUncompressed = zlib.inflate()(rawBodyCompressed)

When I compared the base64 encodings of the body returned by kong.service.response.get_raw_body() in the JS and Lua plugins, I found they did not match. To me, this points to an issue with the PDK, but I'm not 100% sure.
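To rule out the decompression call itself, here is a minimal standalone Node.js sketch (not part of the plugin; the sample JSON is made up) showing that zlib.gunzipSync works when it is handed intact gzip bytes as a Buffer:

```javascript
const zlib = require("zlib");

// Made-up sample payload standing in for the upstream JSON response.
const json = JSON.stringify({ id: 1, name: "Rick Sanchez" });

// gzipSync returns a Buffer whose first two bytes are the gzip magic 1f 8b.
const gzipped = zlib.gzipSync(json);

// Decompressing the untouched Buffer gives back the original text.
const restored = zlib.gunzipSync(gzipped).toString("utf8");

console.log(gzipped.subarray(0, 2).toString("hex")); // 1f8b
console.log(restored === json);                      // true
```

Note that zlib.gunzipSync also accepts a string, but Node converts a string argument to bytes as UTF-8. If the body arrives as a string, it has to be turned back into bytes explicitly (for example with Buffer.from(body, "binary"), as in the base64 snippet later in this thread), and that only helps if the string still holds one byte per character.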

Any assistance in fixing this problem would be greatly appreciated.

@dwoctor, could you share the base64-encoded output of both the Lua and JS plugins?

@fffonion, I have been testing against https://rickandmortyapi.com/api/character/1 with Accept-Encoding: gzip.

In the JavaScript plugin, I used the following to get the base64:

const rawBodyCompressed = await kong.service.response.getRawBody();
const rawBodyCompressedBase64 = Buffer.from(rawBodyCompressed, "binary").toString("base64");

The base64 generated by the javascript plugin was as follows:

H/0IAAAAAAAAA/39T0v9MBgG/VJyUv2+Sf39/WQIXv39J/39Sxr9aP39Zv05/f1N/f03/UD9/dM3LT/9/ShcJUpKRd0V/Xh0/TN5/f00/Uv9YgocdlP99m5v/f1o/f1z/X79cR8r/TD9EzZ9ZX0cP3A7Zwf9asv9eG5+/T40/f12QWp9HQM7/f1qE/04/Vn9/f39V/39wUf9NEP9/XP9Dv39G/39/SkV/T9t/S5wZf1keE/93/39av39Ov39Bf1M3k39Pv09B/0ZLT9GW/1V/aH9PV79bf0T/dRe/f1AVgFZDWQLIP0C/Wsg/QH9N0D9ciT9/RFCR/0dIXj9/RH9R/0fIf39CEpEUEL9HiIoEUH9CEpEUCL9EhH9/f1EBBUi/RBBBX0+EUH9CCpEUCH9ChFU/f1CBDUi/RFB/Qhq/Q/9CGpEUCP9GhEI/UYEC0T9IP12/X79d/00/RtvOTj9FDL9/f1o/f1nlDoe/f0s/Rdx/Qb9/f04/QoAAA==

In the Lua plugin, I used the following to get the base64:

local rawBodyCompressed = kong.service.response.get_raw_body()
local rawBodyCompressedBase64 = base64.encode(rawBodyCompressed)

The base64 generated by the Lua plugin was as follows:

H4sIAAAAAAAAA5XWT0vDMBgG8K9SclLo1r5Jus3eZAhevKgnxcNLGtto/5Fmgzn23U3BoTf3QKHpy5M3LT/a5ihcJUpKRc+dFaV4dOYzeeLeNPZLpGIKHHZTrN+2bm/nwmiNs3PlftdxHyvhMM4T46i2fWV9HD9wO2cH72rXi/J4bn7HPjTJ1XZBan0dAzvfxmoTwjiVWebj0txX3eDDgUe3NEOXxXPWDoaDG/qMxCkV56s/bbcucGXbZHhP5tuf8M5q7uw6ru0Fs0zDnk2wPuM9B/YZLT9GW8dV7eimoYo9Xv9t8hONz5RenJVAVgFZDWQLILsCsmsguwGyN0CWciSMyBFCR4gdIXiE6BHCR4gfIYCECEpEUELvHiIoEUGJCEpEUCKCEhGUiKBEBBUiqBBBBX0+EUGFCCpEUCGCChFUiKBCBDUiqBFBjQhq6A+ICGpEUCOCGhHUiKBGBAtEsCDxdul+63fnNLMbbznYuPsUMqf1gmiR62falDoeq6Us8hdx+gaJzNI4nwoAAA==
>>> a='H4sIAAAAAAAAA5XWT0vDMBgG8K9SclLo1r5Jus3eZAhevKgnxcNLGtto/5Fmgzn23U3BoTf3QKHpy5M3LT/a5ihcJUpKRc+dFaV4dOYzeeLeNPZLpGIKHHZTrN+2bm/nwmiNs3PlftdxHyvhMM4T46i2fWV9HD9wO2cH72rXi/J4bn7HPjTJ1XZBan0dAzvfxmoTwjiVWebj0txX3eDDgUe3NEOXxXPWDoaDG/qMxCkV56s/bbcucGXbZHhP5tuf8M5q7uw6ru0Fs0zDnk2wPuM9B/YZLT9GW8dV7eimoYo9Xv9t8hONz5RenJVAVgFZDWQLILsCsmsguwGyN0CWciSMyBFCR4gdIXiE6BHCR4gfIYCECEpEUELvHiIoEUGJCEpEUCKCEhGUiKBEBBUiqBBBBX0+EUGFCCpEUCGCChFUiKBCBDUiqBFBjQhq6A+ICGpEUCOCGhHUiKBGBAtEsCDxdul+63fnNLMbbznYuPsUMqf1gmiR62falDoeq6Us8hdx+gaJzNI4nwoAAA=='
>>> b='H/0IAAAAAAAAA/39T0v9MBgG/VJyUv2+Sf39/WQIXv39J/39Sxr9aP39Zv05/f1N/f03/UD9/dM3LT/9/ShcJUpKRd0V/Xh0/TN5/f00/Uv9YgocdlP99m5v/f1o/f1z/X79cR8r/TD9EzZ9ZX0cP3A7Zwf9asv9eG5+/T40/f12QWp9HQM7/f1qE/04/Vn9/f39V/39wUf9NEP9/XP9Dv39G/39/SkV/T9t/S5wZf1keE/93/39av39Ov39Bf1M3k39Pv09B/0ZLT9GW/1V/aH9PV79bf0T/dRe/f1AVgFZDWQLIP0C/Wsg/QH9N0D9ciT9/RFCR/0dIXj9/RH9R/0fIf39CEpEUEL9HiIoEUH9CEpEUCL9EhH9/f1EBBUi/RBBBX0+EUH9CCpEUCH9ChFU/f1CBDUi/RFB/Qhq/Q/9CGpEUCP9GhEI/UYEC0T9IP12/X79d/00/RtvOTj9FDL9/f1o/f1nlDoe/f0s/Rdx/Qb9/f04/QoAAA=='
>>> a.decode('base64')
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x95\xd6OK\xc30\x18\x06\xf0\xafRrR\xe8\xd6\xbeI\xba\xcd\xded\x08^\xbc\xa8\'\xc5\xc3K\x1a\xdbh\xff\x91f\x839\xf6\xddM\xc1\xa17\xf7@\xa1\xe9\xcb\x937-?\xda\xe6(\\%JJE\xcf\x9d\x15\xa5xt\xe63y\xe2\xde4\xf6K\xa4b\n\x1cvS\xac\xdf\xb6no\xe7\xc2h\x8d\xb3s\xe5~\xd7q\x1f+\xe10\xce\x13\xe3\xa8\xb6}e}\x1c?p;g\x07\xefj\xd7\x8b\xf2xn~\xc7>4\xc9\xd5vAj}\x1d\x03;\xdf\xc6j\x13\xc28\x95Y\xe6\xe3\xd2\xdcW\xdd\xe0\xc3\x81G\xb74C\x97\xc5s\xd6\x0e\x86\x83\x1b\xfa\x8c\xc4)\x15\xe7\xab?m\xb7.pe\xdbdxO\xe6\xdb\x9f\xf0\xcej\xee\xec:\xae\xed\x05\xb3L\xc3\x9eM\xb0>\xe3=\x07\xf6\x19-?F[\xc7U\xed\xe8\xa6\xa1\x8a=^\xffm\xf2\x13\x8d\xcf\x94^\x9c\x95@V\x01Y\rd\x0b \xbb\x02\xb2k \xbb\x01\xb27@\x96r$\x8c\xc8\x11BG\x88\x1d!x\x84\xe8\x11\xc2G\x88\x1f!\x80\x84\x08JDPB\xef\x1e"(\x11A\x89\x08JDP"\x82\x12\x11\x94\x88\xa0D\x04\x15"\xa8\x10A\x05}>\x11A\x85\x08*DP!\x82\n\x11T\x88\xa0B\x045"\xa8\x11A\x8d\x08j\xe8\x0f\x88\x08jDP#\x82\x1a\x11\xd4\x88\xa0F\x04\x0bD\xb0 \xf1v\xe9~\xebw\xe74\xb3\x1bo9\xd8\xb8\xfb\x142\xa7\xf5\x82h\x91\xebg\xda\x94:\x1e\xab\xa5,\xf2\x17q\xfa\x06\x89\xcc\xd28\x9f\n\x00\x00'
>>> b.decode('base64')
'\x1f\xfd\x08\x00\x00\x00\x00\x00\x00\x03\xfd\xfdOK\xfd0\x18\x06\xfdRrR\xfd\xbeI\xfd\xfd\xfdd\x08^\xfd\xfd\'\xfd\xfdK\x1a\xfdh\xfd\xfdf\xfd9\xfd\xfdM\xfd\xfd7\xfd@\xfd\xfd\xd37-?\xfd\xfd(\\%JJE\xdd\x15\xfdxt\xfd3y\xfd\xfd4\xfdK\xfdb\n\x1cvS\xfd\xf6no\xfd\xfdh\xfd\xfds\xfd~\xfdq\x1f+\xfd0\xfd\x136}e}\x1c?p;g\x07\xfdj\xcb\xfdxn~\xfd>4\xfd\xfdvAj}\x1d\x03;\xfd\xfdj\x13\xfd8\xfdY\xfd\xfd\xfd\xfdW\xfd\xfd\xc1G\xfd4C\xfd\xfds\xfd\x0e\xfd\xfd\x1b\xfd\xfd\xfd)\x15\xfd?m\xfd.pe\xfddxO\xfd\xdf\xfd\xfdj\xfd\xfd:\xfd\xfd\x05\xfdL\xdeM\xfd>\xfd=\x07\xfd\x19-?F[\xfdU\xfd\xa1\xfd=^\xfdm\xfd\x13\xfd\xd4^\xfd\xfd@V\x01Y\rd\x0b \xfd\x02\xfdk \xfd\x01\xfd7@\xfdr$\xfd\xfd\x11BG\xfd\x1d!x\xfd\xfd\x11\xfdG\xfd\x1f!\xfd\xfd\x08JDPB\xfd\x1e"(\x11A\xfd\x08JDP"\xfd\x12\x11\xfd\xfd\xfdD\x04\x15"\xfd\x10A\x05}>\x11A\xfd\x08*DP!\xfd\n\x11T\xfd\xfdB\x045"\xfd\x11A\xfd\x08j\xfd\x0f\xfd\x08jDP#\xfd\x1a\x11\x08\xfdF\x04\x0bD\xfd \xfdv\xfd~\xfdw\xfd4\xfd\x1bo98\xfd\x142\xfd\xfd\xfdh\xfd\xfdg\x94:\x1e\xfd\xfd,\xfd\x17q\xfd\x06\xfd\xfd\xfd8\xfd\n\x00\x00'

I'm seeing lots of bytes turned into \xfd; it looks like something is treating the body as Unicode text. Let me try to find out which part is doing this.
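The \xfd pattern is consistent with a UTF-8 decode followed by the "binary" re-encoding used in the JS snippet: bytes that are not part of a valid UTF-8 sequence become U+FFFD, and Buffer.from(..., "binary") keeps only the low byte of each UTF-16 code unit, so U+FFFD turns into 0xfd, while a valid two-byte sequence such as d6 be collapses to its low byte be. A minimal Node.js sketch, assuming the body reaches the plugin as a string that was decoded from UTF-8 somewhere along the way (the helper name is hypothetical, not a PDK function):

```javascript
const zlib = require("zlib");

// Hypothetical helper simulating a lossy path: the bytes are decoded as
// UTF-8 text (invalid sequences become U+FFFD), then turned back into a
// Buffer with the "binary" (latin1) encoding, which keeps only the low
// byte of each UTF-16 code unit, so U+FFFD becomes 0xfd.
function simulateLossyStringTransport(bytes) {
  const asText = bytes.toString("utf8");
  return Buffer.from(asText, "binary");
}

// A selection of bytes from the start of the Lua plugin's gzip body (see the dump above).
const original = Buffer.from([
  0x1f, 0x8b, 0x08, 0x00, 0x95, 0xd6, 0x4f, 0x4b, 0xc3, 0x30, 0xe8, 0xd6, 0xbe, 0x49,
]);
const mangled = simulateLossyStringTransport(original);
console.log(mangled.toString("hex")); // 1ffd0800fdfd4f4bfd30fdbe49

// Once the gzip magic bytes 1f 8b are damaged, gunzip can only fail.
try {
  zlib.gunzipSync(simulateLossyStringTransport(zlib.gzipSync("hello")));
} catch (err) {
  console.log("gunzip failed:", err.message);
}
```

The mangled bytes line up with the corresponding positions in the JS dump above (\x1f\xfd\x08\x00 ... \xfd\xfdOK\xfd0 ... \xfd\xbeI), which supports the guess that the body is being decoded as UTF-8 text before the plugin sees it.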

@fffonion, is there any news?

I've tested with the Python PDK using:

import kong_pdk.pdk.kong as kong

Schema = (
    { "message": { "type": "string" } },
)
version = '0.1.0'
priority = 0
class Plugin(object):
    def __init__(self, config):
        self.config = config
    def access(self, kong: kong.kong):
        a = kong.request.get_raw_body()
        kong.log(a)

and a random binary string as the request body:

\x00\xff\xfa\xca\xbb\xcf

I got:

2022/03/29 15:44:41 [error] 10209#0: *4512 [kong] mp_rpc.lua:311 [test] no data, client: 127.0.0.1, server: kong, request: "GET / HTTP/1.1", host: "localhost:8000"

It doesn't seem to be related to gzip or JS: any binary request body reproduces this behavior.

The behavior I was looking at was on the service response side, not the request side.
But it's interesting that you found problems on the request side with the Python PDK.
Do you think this is a common issue across the PDKs, i.e. JavaScript, Python, and Go?

I've spent some time going through the code. We use msgpack for RPC, and its default string encoding is "string_compact". A reasonable guess is that the JS and Go sides interpret every string as UTF-8 (or similarly) encoded text, so all RPC calls (implemented with msgpack) are affected by this behavior.

So you can expect this behavior to also appear in headers, get_ctx, ..., basically any binary data returned by the PDK.

I will test the Go PDK later (it uses protobuf, not msgpack).

I've tested the Go PDK, and it also uses msgpack. However, it doesn't have the same problem. The reason seems to be that Go strings can hold arbitrary bytes (just like Lua strings).

The return value type of get_raw_body (or GetRawBody) is str, string, string, and Promise for Python, Lua, Go, and JS, respectively.

Lua and Go strings can hold arbitrary binary data.
The Python documentation states that strings are immutable sequences of Unicode code points, so by definition a str cannot hold arbitrary bytes.
The JS documentation says strings are sequences of 16-bit integers (potentially UTF-16 encoded), so a JS string cannot natively represent arbitrary binary data either.
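On the JS side, a byte sequence only survives a trip through a string when each byte is mapped to exactly one code unit, which is what Node's "latin1" (alias "binary") encoding does; decoding the same bytes as UTF-8 is lossy. A small Node.js sketch over all 256 byte values:

```javascript
// Every possible byte value, 0x00 through 0xff.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// "latin1" maps each byte to exactly one code unit <= 0xff, so it round-trips.
const viaLatin1 = Buffer.from(allBytes.toString("latin1"), "latin1");

// "utf8" replaces every invalid sequence with U+FFFD, so bytes are lost.
const viaUtf8 = Buffer.from(allBytes.toString("utf8"), "utf8");

console.log(viaLatin1.equals(allBytes)); // true
console.log(viaUtf8.equals(allBytes));   // false
```

This is why the choice of type and decoding at the RPC boundary matters more than the language: once bytes have been decoded as UTF-8 text, no later re-encoding can recover them.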

This problem comes from the design of the interface.

We have two solutions:

  1. When decoding a returned value, try to decode it as a textual string and only return a binary string when that fails. This preserves the most compatibility.
  2. Make a breaking change: tag the exact type of each return value. This is more elegant than 1, especially because the string return type was a mistake.
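Solution 2 maps naturally onto MessagePack itself, which has separate str and bin type families. The sketch below is a hypothetical illustration, not the PDK's actual RPC code: packTagged and unpackTagged are made-up names, and only the str8 (0xd9) and bin8 (0xc4) formats for payloads under 256 bytes are handled.

```javascript
// Encode a Buffer as msgpack bin8 and a string as msgpack str8,
// so the receiver knows which kind of value it is getting.
function packTagged(value) {
  if (Buffer.isBuffer(value)) {
    if (value.length > 0xff) throw new RangeError("sketch only handles bin8");
    return Buffer.concat([Buffer.from([0xc4, value.length]), value]); // bin8
  }
  const utf8 = Buffer.from(value, "utf8");
  if (utf8.length > 0xff) throw new RangeError("sketch only handles str8");
  return Buffer.concat([Buffer.from([0xd9, utf8.length]), utf8]); // str8
}

// Decode using the type byte instead of guessing from the content.
function unpackTagged(buf) {
  const [type, len] = buf;
  const payload = buf.subarray(2, 2 + len);
  if (type === 0xc4) return Buffer.from(payload);     // bin: raw bytes, untouched
  if (type === 0xd9) return payload.toString("utf8"); // str: decode as text
  throw new Error(`unsupported type byte 0x${type.toString(16)}`);
}

const gzipHeader = Buffer.from([0x1f, 0x8b, 0x08, 0x00]);
console.log(unpackTagged(packTagged(gzipHeader)).equals(gzipHeader)); // true
console.log(unpackTagged(packTagged("gzip")));                        // gzip
```

The design point is that the decision "text or bytes" is made once, by the sender that knows the value's type, rather than re-derived by every language's decoder.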

We decided to apply solution 2.

However, there are still decisions to make for solution 2:

  1. Technically, header content can be arbitrary octets, which may not be decodable as text. Should we tag it as binary or not? (Most of the time it should be a string.)
  2. How do we tag compound types? E.g. get_headers returns a table.
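For question 1, one option is to apply solution 1 per header value: return text when the bytes are valid UTF-8 and raw bytes otherwise. A minimal Node.js sketch using the standard TextDecoder in fatal mode; decodeTextOrBinary and the header names are hypothetical. Mapping the helper over each value of a header table is one possible answer to question 2.

```javascript
// fatal: true makes decode() throw a TypeError on invalid UTF-8
// instead of substituting U+FFFD.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: text if the bytes are valid UTF-8, else the raw bytes.
function decodeTextOrBinary(bytes) {
  try {
    return strictUtf8.decode(bytes);
  } catch (err) {
    if (err instanceof TypeError) return Buffer.from(bytes);
    throw err;
  }
}

// Applying it per value of a (made-up) header table.
const headers = {
  "content-encoding": Buffer.from("gzip"),
  "x-binary-example": Buffer.from([0xff, 0x00]),
};
const decoded = Object.fromEntries(
  Object.entries(headers).map(([name, value]) => [name, decodeTextOrBinary(value)]),
);
console.log(decoded["content-encoding"]);                  // gzip
console.log(Buffer.isBuffer(decoded["x-binary-example"])); // true
```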

Anyway, we will migrate to protobuf, so this will not matter much in the long run. For now, we can simply leave those behaviors unchanged, except for fixing raw_body.