Tracking data flows for objects

Question

Tracking data flows for objects

quisacrc opened this issue 3 years ago · 8 comments

Hi,

I have a question about ODGen's data flow ability. If the same JS code is as follows:

function exploit(string, input, val){
  var inner = string + "123";
    var link = inner + "123";
   eval(link);
}
exploit(1,2,3);
module.exports = {
  exploit};

ODgen successfully tracks the data flow as well as the vulnerability of os_command. However, if the JS code is as follows:

function exploit(string, input, val){
  var inner = string + "123";
    var link = inner + "123";
   eval({hello: link});
}
exploit(1,2,3);
module.exports = {
  exploit};

ODGen does not track its data flow. Could you give me a clue on how to track data flows for objects (e.g., {hello: link} )? Thanks.

Answer 1 · 2022-01-12T18:51:07.000Z

Hi,

This case is very interesting. ODGen uses object-level taint marking so in your second case, the "link" object is tainted because a tainted object "inner" contributes to the creation of "link". But the object {hello: link} is not tainted since it is not created based on "link". {hello: link} object is the parent object of "link".

If you try to change the sink function call "eval({hello: link})" to "eval({hello: link}.hello)", it should work. We tried to propagate the tainted marker from the child object to its parent object to solve a similar problem, which causes the over-tainting problem and increases the false positive rate.

I can try to implement a version to support your second case. If you are interested, you can also check out the "src/core/checker.py" file and modify the "traceback" function to try to support it.

Thank you!

Answer 2 · 2022-01-12T19:39:08.000Z

Hi,

Thanks very much for your explanations. I see that adding the taint propagation from a parent to a child could cause over-tainting issues. But currently, this propagation feature is really needed for me to see all possible data flows.. So I would really appreciate it if you could kindly make a version that does this (or even just telling how to change the code). Thanks a lot.

Answer 3 · 2022-01-17T10:15:55.000Z

Regarding this:

ODGen uses object-level taint marking so in your second case, the "link" object is tainted because a tainted object "inner" contributes to the creation of "link". But the object {hello: link} is not tainted since it is not created based on "link". {hello: link} object is the parent object of "link".

I realized that ODGen does not track data flow from val to eval() in the following case:

  var val = 1;
  var a = {hello: val};
  eval(a);

However, ODGen does follow the data flow from val to eval() in the following case:

  var val = 1;
  var a = {hello: val + 2};
  eval(a);

So, for a consistency reason, I think it's more natural for ODGen to detect the data flow from val to eval() in the first case as well.

Answer 4 · 2022-01-17T18:40:50.000Z

Hi.
By default, ODGen uses AST-based traceback for taint-based vulnerabilities. For example, in your case, if you use var a = {hello: val}, we do not have a dataflow edge to this line since no object is created based on the existing tainted objects. At the same time, if you use var a = {hello:val + 2};, a new object val + 2 is created based on a tainted object val thus ODGen will generate a dataflow to line 2.

With that being said, the best way to solve this problem is to build dataflow edges for all of the child objects of the used objects. For example, we have to go through all the child objects of the {hello: val} object and build the related dataflow edges accordingly.

To do so, you can go to src/plugins/internal/utils.py and take a look at the build_df_by_use function. At line 468 of the file, you can see there is a for loop. You can use G.add_edge to add OBJ_REACHES edges for all the child objects of the used object.

Please feel free to modify the code and push it to the repo. Thank you!

Answer 5 · 2022-01-17T19:24:45.000Z

Thanks for your great feedback.

I tried that, but it seems that the build_df_by_def_use function is not called for handling the {hello:val} array code. So I instead changed 3 lines in src/plugins/internal/handlers/array.py (see ** marks):

class HandleArrayElem(Handler):
    """
    the array element handler
    """
    def process(self):
        if not (self.extra and self.extra.parent_obj is not None):     
            loggers.main_logger.error("AST_ARRAY_ELEM occurs outside AST_ARRAY")
            return None 
        else: 
            # should only have two childern
            try:
                value_node, key_node = self.G.get_ordered_ast_child_nodes(self.node_id)
            except: 
                # TODO: Check what happend here for colorider
                return NodeHandleResult()
                
            key = self.G.get_name_from_child(key_node)
            if key is not None:
                key = key.strip("'\"")
            else:     
                # shouldn't convert it to int
                key = self.G.get_node_attr(self.node_id).get('childnum:int')
            if key is None:
                key = wildcard
            handled_value = self.internal_manager.dispatch_node(value_node, self.extra)
            value_objs = to_obj_nodes(self.G, handled_value, self.node_id)
            **used_objs = list(set(handled_value.used_objs))**
            **used_objs.extend(handled_value.obj_nodes)**
            for obj in value_objs:
                self.G.add_obj_as_prop(key, self.node_id,
                    parent_obj=self.extra.parent_obj, tobe_added_obj=obj)
        return NodeHandleResult(obj_nodes=value_objs, **used_objs=used_objs**,
            callback=get_df_callback(self.G))

I hope this is a correct solution.

Answer 6 · 2022-01-18T23:59:36.000Z

Hi, you can test to see if it works or not. As far as I know, it looks good and can solve the problem -- temperately. I do not know what will happen if we use something like var a = {hello: {hello: 123}};. I am also a little bit worried about the overhead of this modification.

You can submit a pull request and I will merge it if you want. Thank you!

Answer 7 · 2022-01-20T18:28:15.000Z

Thanks, I created a pull request.

Answer 8 · 2022-01-20T18:36:10.000Z

I have one additional question (which is not strictly object-related). ODGen does not detect any os_command vulnerability for the following example code:

function empty() { }
  
empty(function() {
   function b(filePath) {
      eval(filePath);
   };

   exports.a = function a(path) {
      b(path);
   };
});

However, ODGen detects an os_command vulnerability in the following example:

function empty() { }
  
hello(function() {
   function b(filePath) {
      eval(filePath);
   };

   exports.a = function a(path) {
      b(path);
   };
});

|Checker| success: [['59', '50']] color: green
Attack Path: 
==========================
$FilePath$/home/skyer/Desktop/ODGen-master/test.js
Line 8		exports.a = function a(path) {
		b(path);
	};
$FilePath$/home/skyer/Desktop/ODGen-master/test.js
Line 5			eval(filePath);
 
os_command detected at ../ODGen-master/test.js

Is this because if the hello function is not defined, ODGen assumes its argument (i.e., anonymous function) to be executed and thus exports.a = ... is assumed to be executed?