oremanj/python-netfilterqueue

Packet.drop_refs() and self.payload pointer

DOWRIGHTTV opened this issue · 3 comments

packet.set_nfq_data(nfqueue, nfa)
try:
    user_callback(packet)
finally:
    packet.drop_refs()
return 1

the drop_refs call is setting a pointer to null, but doesnt touch the actual memory allocated by libnetfilter_queue. the pointer will also be garbage collected once it Packet is no longer referenced. I feel this should be removed, but if it needs to be there for something i am not seeing, then it can be moved to a dealloc method and done automatically. (both of these get messy though if python kills it before you get to it)

self.payload_len = nfq_get_payload(nfa, &self.payload)
if self.payload_len < 0:
    # Probably using a mode that doesn't provide the payload
    self.payload = NULL
    self.payload_len = 0

this has a similar issue. i would expect this code would never be reached , but if so it doesnt need to reset the payload pointer. nfq_get_payload returning -1 is due to an error, so it would probably be more fitting to raise that up or exit since it is likely a fatal condition.

I appreciate the attention to detail here, but the API in question doesn't work like you think it does.

the pointer will also be garbage collected once it Packet is no longer referenced

This is not true: garbage collection only deals with Python objects; anything that libnetfilter_queue allocated for us would need to be freed either by directly calling free() or by calling back into some libnetfilter_queue function that would free it. (But since there's no allocation happening here, there's nothing that needs to be freed either.)

if it needs to be there for something i am not seeing, then it can be moved to a dealloc method and done automatically

That would not be appropriate in this case because the whole point of drop_refs() is to provide protection when the Packet object is saved for use from Python after the callback returns. At that point the buffer into which payload points has potentially been reused for another packet; it's better to indicate that the payload is no longer available than to silently return incorrect data. A __dealloc__ would not be called until all references from Python were dropped, which is too late for this purpose.

nfq_get_payload returning -1 is due to an error, so it would probably be more fitting to raise that up or exit since it is likely a fatal condition.

Again consulting the libnetfilter_queue source code that I linked above, nfq_get_payload() will return -1 if the message received from the kernel has no NFQA_PAYLOAD attribute. This can happen in non-erroneous cases such as if only packet metadata is being captured.

you are literally saying what i am saying. the buffer is handled at a lower level when calling into handle_packet(). there is no reason to set a pointer to NULL when you are done with it OR you would put it under dealloc.

you are not doing self.payload[0] so this call is only reassigning the pointer. either way, both of them are unnecessary.

the Packet class is a C extension for Python. self.payload is allocated by Cython via the C/Python API and reference counts are handled accordingly. that is the purpose of a cdef class in Cython.

maybe you should look into the work i have done with this. https://github.com/DOWRIGHTTV/dnxfirewall/tree/work-in-progress/dnx_secmods/cfirewall/fw_main

as an edit: most of these problems are from trying to make single function, single use data, storable and looked into. for that, it would be better to make a copy of the data per instance of Packet, where the data wont just become invalid OR provide the data via returns and let the user/caller store they locally in their code. this is inspecting data in real time to make real-time decisions on said data.

what is the purpose to hold on to the originally referenced data? for anyone wanting to store the data, it should be on them to ensure they properly copy the data. this is just how C works.

there is no reason to set a pointer to NULL when you are done with it

Yes there is -- it acts as a signal that the payload is no longer valid to access, so that other code (such as get_payload()) can throw an exception rather than access invalid memory.

self.payload is allocated by Cython via the C/Python API and reference counts are handled accordingly. that is the purpose of a cdef class in Cython.

Uh... the payload attribute is a raw unsigned char*, as you can see in the .pxd file. It turns into a struct member in the generated C code. It does not point to a Python object so there is no reference counting involved.

what is the purpose to hold on to the originally referenced data? for anyone wanting to store the data, it should be on them to ensure they properly copy the data. this is just how C works.

The goal is to provide a reasonably user-friendly and Pythonic interface without too much performance impact. Copying unconditionally would penalize users that are making short-term decisions directly within their callback. Allowing the pointer to dangle creates a bad UX for users that want to have the callback just stash the packet somewhere that will be looked at later. It should not be possible to cause a segfault even if the library is not used as intended; that's a bad user experience.

This is a Python library; "how C works" is relevant to the implementation, but not to the interface. If you want a minimal wrapper around libnetfilter_queue you can open it up using ctypes.CDLL and call its functions directly. I'm trying to make something that is usable by people who aren't already familiar with the underlying C library.