libprotoident 2.0.10 --------------------------------------------------------------------------- Copyright (c) 2011-2016 The University of Waikato, Hamilton, New Zealand. All rights reserved. This code has been developed by the University of Waikato WAND research group. For further information please see http://www.wand.net.nz/. --------------------------------------------------------------------------- See the file COPYING and COPYING.LESSER for full licensing details for this software. Report and bugs, questions or comments to contact@wand.net.nz NEW: You can now lodge bugs by filing an issue on the libprotoident github: https://github.com/wanduow/libprotoident Authors: Shane Alcock With contributions from: Donald Neal Aaron Murrihy Paweł Foremski <pjf@iitis.pl> Fabian Weisshaar <elnappo@nerdpol.io> Introduction ============ Libprotoident is a library designed to perform application protocol identification using a very limited form of deep packet inspection, i.e. using the first four bytes of application payload sent in each direction. The library provides a simple API that will enable programmers to develop their own tools that utilise application protocol information and we have also included some tools that can be used to perform simple analysis of traffic flows. Required Libraries ================== libtrace * available from http://research.wand.net.nz/software/libtrace.php libflowmanager 2.0.4 or later * optional, but required to build the tools * available from http://research.wand.net.nz/software/libflowmanager.php Installation ============ After having installed the required libraries, running the following series of commands should install libprotoident ./bootstrap.sh (only if you've cloned the source from GitHub) ./configure make make install By default, libprotoident installs to /usr/local - this can be changed by appending the --prefix=<new location> option to ./configure. The libprotoident tools are built by default - this can be changed by using the --with-tools=no option with ./configure. Protocols Supported =================== A full list of supported protocols can be found at http://wand.net.nz/trac/libprotoident/wiki/SupportedProtocols Libprotoident also currently has rules for several "mystery" protocols. These are patterns that commonly occur in our trace sets that we cannot tie to an actual protocol. It would be nice to know what these protocols actually are - if you have any suggestions please feel free to email us at contact@wand.net.nz. In addition, a flow can be assigned into a "category" based on the protocol determined by libprotoident, enabling broader analysis. For example, BitTorrent, Gnutella and eMule all fall into the P2P category, whereas SMTP, POP3 and IMAP are part of the Mail category. Tools ===== There are currently four tools included with libprotoident. * lpi_protoident Description: This tool attempts to identify each individual flow within the provided trace. Identification only occurs when the flow has concluded or expired, so it is not very effective for real-time applications. Usage: lpi_protoident <input trace URI> The input trace must be a valid libtrace URI. Output: For each flow in the input trace, a single line is printed to stdout describing the flow. The line contains the following fields separated by spaces (in order): * Application protocol (as reported by libprotoident) * IP address of the first endpoint * IP address of the second endpoint * Port used by the first endpoint * Port used by the second endpoint * Transport protocol (6 = TCP, 17 = UDP) * Unix timestamp when the flow began * Unix timestamp when the flow ended * Total bytes sent from first endpoint to second endpoint * Total bytes sent from second endpoint to first endpoint * First four bytes of payload sent from first endpoint (in hex) * First four bytes of payload sent from first endpoint (ASCII) * Size of first payload-bearing packet sent from first endpoint * First four bytes of payload sent from second endpoint (in hex) * First four bytes of payload sent from second endpoint (ASCII) * Size of first payload-bearing packet sent from second endpoint * lpi_find_unknown Description: This tool reports all the flows in a trace which libprotoident was unable to identify. Identification only occurs when the flow has concluded or expired, so it is not very effective for real-time applications. This is mainly intended as a tool to aid development of new protocol identifiers. Usage: lpi_find_unknown <input trace URI> The input trace must be a valid libtrace URI. Output: For each unknown flow in the input trace, a single line is printed to stdout describing the flow. The line contains the following fields separated by spaces (in order): * IP address of the first endpoint * IP address of the second endpoint * Port used by the first endpoint * Port used by the second endpoint * Transport protocol (6 = TCP, 17 = UDP) * Unix timestamp when the flow began * Total bytes sent from first endpoint to second endpoint * Total bytes sent from second endpoint to first endpoint * First four bytes of payload sent from first endpoint (in hex) * First four bytes of payload sent from first endpoint (ASCII) * Size of first payload-bearing packet sent from first endpoint * First four bytes of payload sent from second endpoint (in hex) * First four bytes of payload sent from second endpoint (ASCII) * Size of first payload-bearing packet sent from second endpoint * lpi_arff Description: This tool is similar to lpi_protoident except that it writes its output in the ARFF format so that it is compatible with the Weka machine learning software (http://www.cs.waikato.ac.nz/ml/weka/). This tool was contributed by Paweł Foremski <pjf@iitis.pl>. Usage: lpi_arff <input trace URI> The input trace must be a valid libtrace URI. Output: The output begins with a series of lines describing each feature that will be used to describe each flow. Following that, for each flow in the input trace, a single line is printed to stdout describing the flow. The line contains the following fields separated by commas (in order): * Application protocol (as reported by libprotoident) * ID number for the application protocol * Total number of packets sent from first endpoint to second endpoint * Total number of bytes sent from first endpoint to second endpoint * Total number of packets sent from second endpoint to first endpoint * Total number of bytes sent from second endpoint to first endpoint * Minimum payload size sent from first endpoint to second endpoint * Mean payload size sent from first endpoint to second endpoint * Maximum payload size sent from first endpoint to second endpoint * Standard deviation of payload size sent from first endpoint to second endpoint * Minimum payload size sent from second endpoint to first endpoint * Mean payload size sent from second endpoint to first endpoint * Maximum payload size sent from second endpoint to first endpoint * Standard deviation of payload size sent from second endpoint to first endpoint * Minimum packet interarrival time for packets sent from first endpoint to second endpoint * Mean packet interarrival time for packets sent from first endpoint to second endpoint * Maximum packet interarrival time for packets sent from first endpoint to second endpoint * Standard deviation of packet interarrival time for packets sent from first endpoint to second endpoint * Minimum packet interarrival time for packets sent from second endpoint to first endpoint * Mean packet interarrival time for packets sent from second endpoint to first endpoint * Maximum packet interarrival time for packets sent from second endpoint to first endpoint * Standard deviation of packet interarrival time for packets sent from second endpoint to first endpoint * Flow duration (in microseconds) * Flow start time (as a Unix timestamp) * lpi_live (DEPRECATED) Description: This tool reports byte and packet counts (both inbound and outbound) for each identified protocol in real-time. Identification of a flow occurs as soon as possible, so that the statistics reported are as up-to-date as possible. lpi_live has been deprecated and will not be built by default. The code is still available in our git repository, but we will not update or support this tool anymore. Instead, please use the lpicollector (https://github.com/wanduow/lpicollector) for real-time flow analysis with libprotoident. In combination with the included lpi.py example client, lpicollector can produce output similar to that produced by lpi_live. Usage: lpi_live <options> <input URI> The input URI must be a valid libtrace URI. Options: -f <filterstring> : Specifies a BPF filter to be applied to the input. -R : Ignore traffic to or from RFC1918 private addresses. -i <secs>: Specifies the reporting interval (in seconds). -T : Use trace direction tags to determine direction. -l <mac> : Determine direction based on the given mac address. The mac should represent the 'inside' or 'local' side of the network. -m <id> : Use the given id string to identify the monitor rather than $HOSTNAME. Output: IMPORTANT: lpi_live output has changed significantly in version 2.0.5! Reports are written to stdout. Each line of output is a series of comma separated values representing a single measurement. The values are (in order): * The monitor id * The Unix timestamp for the beginning of the measurement period * The length of the measurement period * The statistic being reported - this is one of the following: - in_pkts = inbound packets - out_pkts = outbound packets - in_bytes = inbound bytes (based on wire length) - out_bytes = outbound bytes (based on wire length) - in_new_flows = new inbound flows - out_new_flows = new outbound flows - in_curr_flows = inbound flows active at the period end - out_curr_flows = outbound flows active at the period end * The application protocol being measured * The value for the measured statistic API === If you want to develop your own tools based on libprotoident, you'll need to use the libprotoident API. The API is very simple and the best way to learn it is to examine how the existing tools work. The source for the tools is located in the tools/ directory. The tools use libflowmanager to do the flow tracking, using functions beginning with 'lfm_'. You will probably want to incorporate this into your own tool. Usage of libprotoident itself is through functions beginning with 'lpi_'. The libprotoident API functions themselves are documented in lib/libprotoident.h if you need further guidance. Further documentation of the API can also be found at http://wand.net.nz/trac/libprotoident/wiki/DeveloperDocs If all else fails, drop us a line at contact@wand.net.nz.
woylaski/libprotoident
Network traffic classification library that requires minimal application payload
C++LGPL-3.0