OISF/libhtp

Memory leak

Closed this issue · 8 comments

In testing libhtp with Suricata on live traffic in valgrind, I am seeing this leak report:

==31733== 68 bytes in 2 blocks are definitely lost in loss record 274 of 556
==31733==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31733==    by 0x4E39F9C: bstr_alloc (bstr.c:44)
==31733==    by 0x4E3A348: bstr_dup_ex (bstr.c:243)
==31733==    by 0x4E3A658: bstr_dup_lower (bstr.c:251)
==31733==    by 0x4E41C8B: htp_connp_RES_BODY_DETERMINE (htp_response.c:537)
==31733==    by 0x4E428BF: htp_connp_res_data (htp_response.c:980)
==31733==    by 0x437042: HTPHandleResponseData (app-layer-htp.c:799)
==31733==    by 0x444BAD: AppLayerParserParse (app-layer-parser.c:780)
==31733==    by 0x4161FA: AppLayerHandleTCPData (app-layer.c:266)
==31733==    by 0x5B601F: StreamTcpReassembleAppLayer (stream-tcp-reassemble.c:3027)
==31733==    by 0x5B68D4: StreamTcpReassembleHandleSegmentUpdateACK (stream-tcp-reassemble.c:3373)
==31733==    by 0x5B6976: StreamTcpReassembleHandleSegment (stream-tcp-reassemble.c:3401)

At htp_response.c:537, we have:

connp->out_tx->response_content_type = bstr_dup_lower(ct->value);

It appears that there is a case where this line is run even though the response_content_type is not NULL. In other words, the pointer is overwritten and the reference to the original memory is lost.

Adding this check:

if (connp->out_tx->response_content_type == NULL) {

made the leak report go away. The entire block then looks like:

            if (connp->out_tx->response_content_type == NULL) {
                connp->out_tx->response_content_type = bstr_dup_lower(ct->value);
                if (connp->out_tx->response_content_type == NULL) return HTP_ERROR;

                // Ignore parameters
                unsigned char *data = bstr_ptr(connp->out_tx->response_content_type);
                size_t len = bstr_len(ct->value);
                size_t newlen = 0;
                while (newlen < len) {
                    // TODO Some platforms may do things differently here.
                    if (htp_is_space(data[newlen]) || (data[newlen] == ';')) {
                        bstr_adjust_len(connp->out_tx->response_content_type, newlen);
                        break;
                    }

                    newlen++;
                }
            }

However, it's not clear to me if this is the proper way of handling this.

I've only been able to 'reproduce' this on live traffic in valgrind. Because valgrind is running, the box is severely overloaded, causing lots of packet loss. This leads to many sessions that are incomplete.

Perhaps a more correct way to fix this leak is to free htp_tx_t::response_content_type if it is not NULL. Otherwise you might continue to use some value that is not valid any more.

But I am having trouble locating the root cause for this. This function should be invoked only once per transaction, when we're determining if there is a response body. If allocated, response_content_type is later freed when the transaction is destroyed. I can't find any code paths that would result with a memory leak.

I think I'm dealing with a case where we mix flows from multiple vlans. So this would be invalid input to libhtp. Not sure though how that would result in this condition. However, the invalid data could be faked on the wire as well, so I'm not convinced it's harmless even if it's triggered by our vlan issues.

Yes, it shouldn't happen even with mixed flows. Invalid input should result with an error. What I'd like to do here, at least for now, is raise an error if the content type is already allocated. That way, we may learn more about the root cause. Additionally, could you run LibHTP in debug mode (heavy logging) to try to catch the problem?

I've only been able to reproduce this on saturated 10gig links, so logging will be... hard. Might be better in my case to just abort() and inspect the core/backtrace.

Please try that. It might not be very useful because of the event-driven nature of the parsing. The problem might be spanning packets. A possible alternative might be to buffer all connection data, and only dump it to the log on error. This will increase the memory requirements, but shouldn't otherwise interfere.

I have a small pcap that can recreate this issue when using Suricata. I don't have a convenient place to upload it though. I'd happily email it to either of you if that would help.

That's great. Please email me at ivanr at webkreator.com. Thanks!

This leak was in Suricata; it's now fixed:
OISF/suricata@0416a84