Simple XML-like format parsing library and useful utilities written in awk(1)
=============================================================================

The data format is specific to defining traffic control rules for a network
service provider subscriber (called simply "user" below). The code can
nevertheless serve as an example of how to parse XML or similar data with
awk(1) from shell scripts.

Document format
---------------

A document contains multiple user definitions. Each defines the interface the
user is connected to, the set of IPv4/IPv6 networks/addresses assigned to the
user, and zones with directions and allocated bandwidth. Zones, directions and
bandwidths are grouped together within the pipe data structure.

Zones can be "local" for traffic belonging to local networks (e.g.
country/city IXPs, peering connections, etc.), "world" for the rest of the
networks and "all" to treat everything equally.

Directions can be "in" for incoming traffic to the user (download), "out" for
outgoing traffic from the user (upload) or "all" to apply to both the "in" and
"out" directions.

Bandwidth gives the bandwidth allocated for a direction in a zone. It is
specified in Kbit/s.

Here is a typical example of the document format the library is able to parse:

    <user WN2019011501>
        <pipe 1>
            <zone local>
            <dir all>
            <bw 102400Kb>
        </pipe>
        <pipe 2>
            <zone world>
            <dir in>
            <bw 10240Kb>
        </pipe>
        <pipe 3>
            <zone world>
            <dir out>
            <bw 5120Kb>
        </pipe>
        <if eth0.4094>
        <net 203.0.113.130/30>
            <src 203.0.113.129>
            <mac 02:11:22:33:44:55>
        </net>
        <net 192.0.2.128/30>
            <via 203.0.113.130>
        </net>
    </user>

This document describes user WN2019011501 connected on interface eth0.4094
with two networks assigned. Local traffic gets 102400 Kbit/s in both
directions, while traffic to the rest of the world gets 10240 Kbit/s for
download and 5120 Kbit/s for upload.

Library API
-----------

There are three main API functions provided by the libusrxml.awk module:

    BEGIN{
        # Prepare parser by initializing internal variables. This should be
        # the very first routine called from the library.
        # Usually called from BEGIN{} awk(1) program section.
        h = init_usrxml_parser("program name");
        if (h < 0)
            exit 1;
    }

    {
        # Takes a single line of data ($0 is the current input line) and
        # parses it to internal data structures.
        # Usually called from main {} awk(1) program section.
        if (run_usrxml_parser(h, $0) < 0)
            exit 1;
    }

    END{
        # Destroy parser data structures for given handle h.
        # Usually called from END{} awk(1) program section.
        if (fini_usrxml_parser(h) < 0)
            exit 1;
    }

All these functions return zero, or userid + 1, on success and a value less
than zero on error. One can use usrxml_errno() to get the integer value
representing the error code and the constant-like defines from
init_usrxml_consts() to determine the exact reason.

See the init_usrxml_parser() function in libusrxml.awk for the mapping of the
USRXML schema to associative array elements that is available once a user has
been parsed.

There are two additional functions defined in the library to print a <user>
entry. They can be used only with a successfully parsed document:

    # Use friendly output format with each <tag> placed on its
    # own line and tabs for indentation.
    if (print_usrxml_entry(h, userid) < 0)
        exit 1;

    # Put everything on a single line. This format is useful for machine
    # processing (e.g. search with grep(1)).
    if (print_usrxml_entry_oneline(h, userid) < 0)
        exit 1;

These functions can be used as a run_usrxml_parser() callback to print a user
entry once it is ready:

    if (run_usrxml_parser(h, $0, "print_usrxml_entry") < 0)
        exit 1;

Examples
--------

The users_xml2lst.awk and users_lst2xml.awk scripts parse a document and
output it with each <user> entry on a single line or in the normal format,
respectively.

Assuming you have installed these scripts under /netctl/bin, you can use the
following command sequence to verify document correctness:

    $ cat >/tmp/usr.xml <<EOF
    <user WN2019011502>
        <pipe 1>
            <zone world>
            <dir all>
            <bw 20480Kb>
        </pipe>
        <if eth0.4094>
        <net 203.0.113.97/32>
            <src 203.0.113.1>
            <mac 02:11:22:33:44:55>
        </net>
    </user>
    EOF
    $ /netctl/bin/users_xml2lst.awk /tmp/usr.xml | \
      /netctl/bin/users_lst2xml.awk

These tools can also be used for document validation from shell.
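
The one-entry-per-line idea is easy to try without the library. The sketch
below is not the library's implementation and performs no validation. It only
assumes that every entry ends with a literal </user> tag, and the shell
function name usrxml_oneline is hypothetical. It strips indentation, joins
input lines and prints each complete <user> entry on its own line:

```sh
# Hypothetical helper, not part of libusrxml: reads a document from stdin
# and prints every <user ...> ... </user> entry on a single line.
usrxml_oneline() {
    awk '
    {
        # Trim surrounding whitespace and skip lines that become empty.
        gsub(/^[ \t]+|[ \t]+$/, "")
        if ($0 == "")
            next
        # Append the trimmed line to the buffer of the current entry.
        buf = buf $0
        # Print every complete entry accumulated so far.
        while ((pos = index(buf, "</user>")) > 0) {
            print substr(buf, 1, pos + length("</user>") - 1)
            buf = substr(buf, pos + length("</user>"))
        }
    }
    '
}
```

With the function loaded into the current shell, something like
`usrxml_oneline </tmp/usr.xml | grep -c '<zone world>'` would count the users
that have a "world" zone. Unlike users_xml2lst.awk, this sketch accepts
malformed input silently, so it is no substitute for validation.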