eclecticiq/OpenTAXII

Poll file issues

ohliuw opened this issue · 11 comments

We are testing OpenTAXII and ran into the following issue: it seems to change the < and > inside the STIX package to &lt; and &gt;.

On the same linux box we run cabby and opentaxii.

  1. If I do a poll from cabby that is running on the same box, the output works fine:
> taxii-poll --path http://192.168.0.13:9000/services/poll-a --collection collection-b --username admin --password admin > test.xml
> 2020-05-25 13:22:37,590 INFO: Polling using data binding: ALL
> 2020-05-25 13:22:37,592 INFO: Sending Poll_Request to http://192.168.0.13:9000/services/poll-a
> 2020-05-25 13:22:38,494 INFO: 1 blocks polled
> less test.xml
> 
> 
> <stix:STIX_Package xmlns:XXXX="https://XXXXX" xmlns:DomainNameObj="http://cybox.mitre.org/objects#DomainNameObject-1" xmlns:EmailMessageObj="http://cybox.mitre.org/objects#EmailMessageObject-2" xmlns:FileObj="http://cybox.mitre.org/objects#FileObject-2" xmlns:HTTPSessionObj="http://cybox.mitre.org/objects#HTTPSessionObject-2" xmlns:LinkObj="http://cybox.mitre.org/objects#LinkObject-1" xmlns:URIObj="http://cybox.mitre.org/objects#URIObject-2" xmlns:cybox="http://cybox.mitre.org/cybox-2" xmlns:cyboxCommon="http://cybox.mitre.org/common-2" xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2" xmlns:incident="http://stix.mitre.org/Incident-1" xmlns:indicator="http://stix.mitre.org/Indicator-2" xmlns:marking="http://data-marking.mitre.org/Marking-1" xmlns:stix="http://stix.mitre.org/stix-1" xmlns:stixCommon="http://stix.mitre.org/common-1" xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1" xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" id="XXXXX:package-747b09a8-34cb-433c-98c2-0539017317c9" timestamp="2019-02-04T19:37:33+00:00" version="1.2">
>   <stix:STIX_Header>
>     <stix:Title>"XXXXXXX" block</stix:Title>
>     <stix:Information_Source>
>       <stixCommon:Identity>
> 
> 
  2. If I do a poll from our TAXII client running on another box, we get the following. This is from the packet capture on the Linux box running OpenTAXII:

> POST /services/poll-a HTTP/1.1
> X-TAXII-Content-Type: urn:taxii.mitre.org:message:xml:1.1
> X-TAXII-Protocol: urn:taxii.mitre.org:protocol:http:1.0
> X-TAXII-Services: urn:taxii.mitre.org:services:1.1
> Accept: application/xml
> Content-Type: application/xml
> authorization: Basic YWRtaW46YWRtaW4=
> Cache-Control: no-cache
> Pragma: no-cache
> User-Agent: Java/1.8.0_222
> Host: 10.4.16.160:9000
> Connection: keep-alive
> Content-Length: 294
> 
> 
> <Poll_Request xmlns="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:ns2="http://www.w3.org/2000/09/xmldsig#" collection_name="collection-b" message_id="1">
>     <Exclusive_Begin_Timestamp>2020-05-01T15:12:23.000Z</Exclusive_Begin_Timestamp>
>     <Poll_Parameters/>
> </Poll_Request>
> HTTP/1.1 200 OK
> Server: gunicorn/20.0.4
> Date: Mon, 25 May 2020 16:04:04 GMT
> Connection: close
> Content-Type: application/xml
> Content-Length: 17440794
> X-TAXII-Content-Type: urn:taxii.mitre.org:message:xml:1.1
> X-TAXII-Protocol: urn:taxii.mitre.org:protocol:http:1.0
> X-TAXII-Services: urn:taxii.mitre.org:services:1.1
> 
> <taxii_11:Poll_Response xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" message_id="4384624944912034411" in_response_to="1" collection_name="collection-b" more="false" result_part_number="1">
>   <taxii_11:Exclusive_Begin_Timestamp>2020-05-01T15:12:23+00:00</taxii_11:Exclusive_Begin_Timestamp>
>   <taxii_11:Content_Block>
>     <taxii_11:Content_Binding binding_id="urn:stix.mitre.org:xml:1.1.1"/>
>     <taxii_11:Content>&lt;stix:STIX_Packagexmlns:XXXX="https://XXXXX"  xmlns:DomainNameObj="http://cybox.mitre.org/objects#DomainNameObject-1" xmlns:EmailMessageObj="http://cybox.mitre.org/objects#EmailMessageObject-2" xmlns:FileObj="http://cybox.mitre.org/objects#FileObject-2" xmlns:HTTPSessionObj="http://cybox.mitre.org/objects#HTTPSessionObject-2" xmlns:LinkObj="http://cybox.mitre.org/objects#LinkObject-1" xmlns:URIObj="http://cybox.mitre.org/objects#URIObject-2" xmlns:cybox="http://cybox.mitre.org/cybox-2" xmlns:cyboxCommon="http://cybox.mitre.org/common-2" xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2" xmlns:incident="http://stix.mitre.org/Incident-1" xmlns:indicator="http://stix.mitre.org/Indicator-2" xmlns:marking="http://data-marking.mitre.org/Marking-1" xmlns:stix="http://stix.mitre.org/stix-1" xmlns:stixCommon="http://stix.mitre.org/common-1" xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1" xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" id="XXXXX:package-747b09a8-34cb-433c-98c2-0539017317c9" timestamp="2019-02-04T19:37:33+00:00" version="1.2"&gt;
>   &lt;stix:STIX_Header&gt;
>     &lt;stix:Title&gt;"XXXXXXXX" block&lt;/stix:Title&gt;
>     &lt;stix:Information_Source&gt;
>       &lt;stixCommon:Identity&gt;

traut commented

That seems like a serialisation issue. Could you run the Cabby command with the -x -r flags and share the raw XML printed to the console? This will show what data you're getting from the server.

So my other TAXII client is not well implemented and doesn't conform to the standard? What do I have to ask them to fix?

> taxii-poll --path http://192.168.0.13:9000/services/poll-a --collection collection-a --username test --password test -x -r
> 
> 2020-05-25 15:29:40,326 INFO: Polling using data binding: ALL
> 2020-05-25 15:29:40,329 INFO: Sending Poll_Request to http://192.168.0.13:9000/services/poll-a
> <taxii_11:Content_Block xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1"><taxii_11:Content_Binding binding_id="urn:stix.mitre.org:xml:1.1.1"/><taxii_11:Content>&lt;stix:STIX_Package xmlns:cyboxCommon="http://cybox.mitre.org/common-2" xmlns:cybox="http://cybox.mitre.org/cybox-2" xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2" xmlns:marking="http://data-marking.mitre.org/Marking-1" xmlns:simpleMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Simple-1" xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1" xmlns:TOUMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Terms_Of_Use-1" xmlns:edge="http://soltra.com/" xmlns:indicator="http://stix.mitre.org/Indicator-2" xmlns:ttp="http://stix.mitre.org/TTP-1" xmlns:stixCommon="http://stix.mitre.org/common-1" xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1" xmlns:stix="http://stix.mitre.org/stix-1" xmlns:opensource="http://www.hailataxii.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" id="edge:Package-7abdf984-6f51-44f1-a0db-b102e1bd4c3d" version="1.1.1" timestamp="2020-05-25T18:40:30.023134+00:00"&gt;
>     &lt;stix:STIX_Header&gt;
>         &lt;stix:Handling&gt;
>             &lt;marking:Marking&gt;
>                 &lt;marking:Controlled_Structure&gt;../../../../descendant-or-self::node()&lt;/marking:Controlled_Structure&gt;
>                 &lt;marking:Marking_Structure xsi:type="tlpMarking:TLPMarkingStructureType" color="WHITE"/&gt;
>                 &lt;marking:Marking_Structure xsi:type="TOUMarking:TermsOfUseMarkingStructureType"&gt;
>                     &lt;TOUMarking:Terms_Of_Use&gt;TBD&lt;/TOUMarking:Terms_Of_Use&gt;
>                 &lt;/marking:Marking_Structure&gt;
>                 &lt;marking:Marking_Structure xsi:type="simpleMarking:SimpleMarkingStructureType"&gt;
>                     &lt;simpleMarking:Statement&gt;Unclassified (Public)&lt;/simpleMarking:Statement&gt;
>                 &lt;/marking:Marking_Structure&gt;
>             &lt;/marking:Marking&gt;
>         &lt;/stix:Handling&gt;
>     &lt;/stix:STIX_Header&gt;
>     &lt;stix:Indicators&gt;
> 
traut commented

@ohliuw I haven't seen raw responses, but that would be my guess. It feels like (random guess) that it places STIX content in the TAXII block as text and not as XML tree structure, so all < and > get escaped, as they would be in text.
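
The guess above can be illustrated with a small sketch using Python's standard xml.etree.ElementTree. This is not OpenTAXII or libtaxii code; the `Content` element name, the helper names and the tiny STIX snippet are assumptions made up for the illustration. It only shows how the same payload serialises differently when attached as text versus as a parsed subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened STIX payload used only for this sketch.
STIX = '<stix:STIX_Package xmlns:stix="http://stix.mitre.org/stix-1" version="1.1.1"/>'

# Keep the "stix" prefix when ElementTree serialises the namespace.
ET.register_namespace("stix", "http://stix.mitre.org/stix-1")


def content_as_text(payload: str) -> str:
    """Attach the payload as a text node: the serialiser escapes < and >."""
    content = ET.Element("Content")
    content.text = payload
    return ET.tostring(content, encoding="unicode")


def content_as_tree(payload: str) -> str:
    """Parse the payload first and attach it as a child element: no escaping."""
    content = ET.Element("Content")
    content.append(ET.fromstring(payload))
    return ET.tostring(content, encoding="unicode")


if __name__ == "__main__":
    print(content_as_text(STIX))  # contains &lt;stix:STIX_Package ...
    print(content_as_tree(STIX))  # contains <stix:STIX_Package ...
```

Both results are well-formed XML. A parser that reads the first form gets the original string back as the text of `Content`, so whether a client copes with it depends on whether it expects text or child elements there.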

I heard back from the vendor. They claim that "the XML files are in UTF-16 format instead of UTF-8."

Their product can't handle UTF-16; is there a way to force the output to UTF-8?

Thanks

traut commented

both OpenTAXII and libtaxii (opentaxii's dependency) use utf-8 while decoding / encoding content:
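
One rough way to test the vendor's UTF-16 claim against a captured response body is to look at the raw bytes. The sketch below is a heuristic only, and the function name is hypothetical: it checks for a UTF-16 byte order mark, or for NUL bytes near the start, which UTF-16 produces for ASCII characters such as `<` and which UTF-8 encoded XML text does not contain.

```python
import codecs


def looks_like_utf16(body: bytes) -> bool:
    """Heuristic check, not a full encoding detector."""
    # A UTF-16 byte order mark at the very start is the clearest sign.
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return True
    # Without a BOM, ASCII characters in UTF-16 still carry a NUL byte each.
    return b"\x00" in body[:64]


if __name__ == "__main__":
    sample = '<taxii_11:Poll_Response message_id="1"/>'
    print(looks_like_utf16(sample.encode("utf-8")))
    print(looks_like_utf16(sample.encode("utf-16")))
```

If the bytes from the packet capture pass through this check as not UTF-16, the encoding is unlikely to be what the vendor's product is tripping over.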

could you provide an anonymised stix file I can use for testing?

I am seeing the same issue as reported above.
steps to reproduce:

  • Set up opentaxii completely vanilla as described in the documentation...
  • Then poll PhishTank with cabby (for example, the last 3 IoCs) and put them into an XML file:
    taxii-poll --path http://hailataxii.com/taxii-discovery-service --collection guest.phishtank_com -l 3 > Haila.xml
  • Then push the PhishTank content into the TAXII server:
    taxii-push --path http://localhost:9000/services/inbox-a --dest collection-b --content-file Haila.xml --username admin --password admin
  • Then I pull the TAXII content from a Java application. The packet capture looks the same as above. This is before the application requesting the TAXII content has a chance to modify anything within the response.
    Thus the issue is with the opentaxii server and not the application trying to read from it.

@traut

> @ohliuw I haven't seen raw responses, but that would be my guess. It feels like (random guess) that it places STIX content in the TAXII block as text and not as XML tree structure, so all < and > get escaped, as they would be in text.

So is there a way to force OpenTAXII not to escape the < and > as &lt; and &gt;, and to send the data as XML? In the database they seem to be stored as XML (when I cat the DB, it displays the < and >).

Also, Hailataxii doesn't do this?

  1. Get the Data
docker run \
   -a stdout \
   --rm eclecticiq/cabby taxii-poll \
   --path http://hailataxii.com/taxii-discovery-service --collection guest.phishtank_com -l 3 > Haila.xml
  2. Push the data into the TAXII server
docker run \
    --rm \
    --mount type=bind,source="$(pwd)",target=/tmp/mnt \
    --add-host host.docker.internal:host-gateway \
    eclecticiq/cabby \
    taxii-push \
    --path http://host.docker.internal:9000/services/inbox-a \
    --dest collection-b \
    --content-file /tmp/mnt/Haila.xml \
    --username admin --password admin
  3. Pull the data from the TAXII server with a valid client (cabby)
docker run \
    --rm \
    --mount type=bind,source="$(pwd)",target=/tmp/mnt \
    --add-host host.docker.internal:host-gateway \
    eclecticiq/cabby \
    taxii-poll -x -r --path http://host.docker.internal:9000/services/poll-a \
     --collection collection-b \
     --username admin --password admin > output.txt

The &gt; and &lt; are present.

output.txt

> Thus the issue is with the opentaxii server and not the application trying to read from it.

Yes

It is indeed a bug and we’d love to have it fixed, however it’s not a high priority for our team at the moment, so we can’t promise when it will get fixed. Still, we’re very open to external contributions - if you know how to fix this problem and you can open a PR with a fix, we will be very grateful.

This wasn't meant to be closed by #184

The content in OpenTAXII gets escaped when what it is sent is not valid XML: the server then treats the content as text and escapes it in order to embed it in an XML message. In the reproducing test case by @eric-eclecticiq above, this happens because a single file containing 3 <STIX_Package> nodes is sent. That file has 3 root nodes, which isn't valid XML. This can be fixed by using the --dest-dir argument to taxii-poll and then calling taxii-push in a loop. The result is no escaped < and > in the output.
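
The "3 root nodes" point can be checked with any XML parser. The following is a minimal sketch using Python's standard library, with a made-up one-element stand-in for a real STIX package and a hypothetical helper name. It does not reproduce OpenTAXII's own validation, only the underlying rule that an XML document has exactly one root element.

```python
import xml.etree.ElementTree as ET

# Stand-in for one STIX package; real packages are much larger.
ONE_PACKAGE = "<STIX_Package/>"
# What a file looks like when three packages are written back to back.
THREE_PACKAGES = ONE_PACKAGE * 3


def is_single_xml_document(payload: str) -> bool:
    """Return True if the payload parses as one well-formed XML document."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        # Raised, among other cases, when a second root element follows the first.
        return False
    return True


if __name__ == "__main__":
    print(is_single_xml_document(ONE_PACKAGE))
    print(is_single_xml_document(THREE_PACKAGES))
```

Running a check like this on a file before passing it to taxii-push would show up front whether the content is likely to end up escaped.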

To illustrate, I've created an example script that does this and attached the output as well. I had to rename the example script to hailatest.txt, because GitHub doesn't allow uploading .sh files. Please rename it after downloading.

hailatestoutput.txt
hailatest.txt

This usage pattern isn't clear from the cabby docs, so I'll update those instead.
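
The usage pattern described above (one content block per file, pushed one at a time) could be scripted roughly as follows. This is a sketch and not the attached hailatest script: the function names are hypothetical, it assumes the blocks were already saved into a directory with taxii-poll --dest-dir, and it makes no assumption about how the files in that directory are named. The taxii-push flags are the ones used earlier in this thread.

```python
import subprocess
from pathlib import Path


def build_push_commands(directory, path, dest, username, password):
    """Build one taxii-push command line per regular file in the directory."""
    commands = []
    for content_file in sorted(Path(directory).iterdir()):
        if not content_file.is_file():
            continue
        commands.append([
            "taxii-push",
            "--path", path,
            "--dest", dest,
            "--content-file", str(content_file),
            "--username", username,
            "--password", password,
        ])
    return commands


def push_all(directory, path, dest, username, password):
    """Run the commands in order, stopping at the first failure."""
    for command in build_push_commands(directory, path, dest, username, password):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Example values taken from the reproduction steps in this thread;
    # "polled-blocks" is an assumed directory name.
    push_all(
        "polled-blocks",
        "http://localhost:9000/services/inbox-a",
        "collection-b",
        "admin",
        "admin",
    )
```

Keeping the command construction separate from the execution makes it possible to print and inspect the commands before anything is sent to the server.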

I have created eclecticiq/cabby#83 for the documentation issue.

@ohliuw if you disagree with this assessment and can provide a minimal set of reproduction steps, I'd be happy to help work it out. Feel free to re-open the ticket if that is the case.