facebook/mariana-trench

Query on propagating inputstream to buffer

Opened this issue · 2 comments

Hello, I'm trying to write a rule to detect a flow from an external directory source to an OutputStream.

// userInputFile, size, and os (the OutputStream) are defined elsewhere.
String filePath = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS).toString() + "/" + userInputFile;
File f = new File(filePath);
InputStream is = new FileInputStream(f);
byte[] buffer = new byte[size];
int n;
while ((n = is.read(buffer)) > 0) {
    os.write(buffer, 0, n);
}

How do I propagate the taint from the InputStream to the buffer, so that when the OutputStream writes the buffer, the OutputStream itself becomes tainted? I tried a propagation model and it didn't work. Is this kind of propagation possible?
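The flow in question can be exercised in isolation. The sketch below is only an illustration: it swaps the file and the real output stream for in-memory streams (an assumption, as are the class name CopyLoopDemo and the method name copy), and shows that read() fills its buffer argument, which is the step a propagation model has to describe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CopyLoopDemo {
    // Copies everything from the input stream through a fixed-size buffer.
    // Data only reaches the output stream via the buffer that read() fills.
    static byte[] copy(InputStream is, int size) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        byte[] buffer = new byte[size];
        int n;
        try {
            while ((n = is.read(buffer)) > 0) {
                os.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "tainted-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] output = copy(new ByteArrayInputStream(input), 4);
        System.out.println(new String(output, StandardCharsets.UTF_8));
    }
}
```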

Propagation:

{
            "find": "methods",
            "where": [
                {
                    "constraint": "parent",
                    "inner": {
                        "constraint": "extends",
                        "inner": {
                            "constraint": "name",
                            "pattern": "Ljava/io/InputStream"
                        }
                    }
                },
                {
                    "constraint": "name",
                    "pattern": "read.*"
                }
            ],
            "model": {
                "propagation": [
                    {
                        "input": "Argument(0)",
                        "output": "Argument(1)"
                    }
                ]
            }
}

Source:

{
  "model_generators": [
    {
      "find": "methods",
      "where": [
        {
          "constraint": "name",
          "pattern": "getExternal.*"
        }
      ],
      "model": {
        "generations": [{
          "kind": "ExternalSource",
          "port": "Return"
        }]
      }
    }
  ]
}

Sink:

{
    "model_generators": [
        {
            "find": "methods",
            "where": [
                {
                    "constraint": "parent",
                    "inner": {
                        "constraint": "extends",
                        "inner": {
                            "constraint": "name",
                            "pattern": "Ljava/io/OutputStream"
                        }
                    }
                },
                {
                    "constraint": "name",
                    "pattern": "write"
                }
            ],
            "model": {
                "for_all_parameters": [
                    {
                        "variable": "x",
                        "where": [
                            {
                                "constraint": "name",
                                "pattern": "\\[B"
                            }
                        ],
                        "sinks": [
                            {
                                "kind": "OutputWriteSink",
                                "port": "Argument(x)"
                            }
                        ]
                    }
                ]
            }
        }
    ]
}

Edit: included the model for my source and sink

I also have another question about multi-source and partial sink: some of the default sources and sinks use them, but they are not documented. What is the use case?

Hey, not one of the devs, but I've been working on understanding Mariana Trench as well, and did some poking at your issue to improve my knowledge.

The key thing I noticed was that the class names in your propagation and sink models are missing trailing semicolons. "Ljava/io/InputStream" and "Ljava/io/OutputStream" should be "Ljava/io/InputStream;" and "Ljava/io/OutputStream;". I spotted this by adding "verbosity": 1 to the models so Mariana Trench would log what methods were found for each model, and noticed that nothing was turning up for these model generators. (Feature request for MT devs - could you automatically log a warning to the console for any model generator that doesn't find any matches? That'd make it a lot easier to spot these problems.)

This requirement isn't well documented (there is a note in the documentation for name: "Expects an extra property pattern which is a regex to fully match the name of the item", so the trailing semicolon is needed for the full match - but the same documentation also gives a conflicting example of a non-full match:

"constraint": "parent",
"inner": {
  "constraint": "extends",
  "inner": {
    "constraint": "name", "pattern": "SandcastleCommand"
  }
}

which just didn't seem to work in my testing. MT devs - is this example wrong in the documentation, or is it intended to work and there's just a bug?)
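To see why the missing semicolon matters under full-match semantics, here is a small illustration using java.util.regex. This is only a sketch: Mariana Trench does not use Java's regex engine, and the class and method names below are made up for the example. The point is the difference between matching the whole string and matching somewhere inside it.

```java
import java.util.regex.Pattern;

class FullMatchDemo {
    // True only when the pattern covers the entire descriptor,
    // mirroring the "fully match" wording quoted from the documentation.
    static boolean fullyMatches(String pattern, String descriptor) {
        return Pattern.compile(pattern).matcher(descriptor).matches();
    }

    // True when the pattern matches anywhere inside the descriptor.
    static boolean partiallyMatches(String pattern, String descriptor) {
        return Pattern.compile(pattern).matcher(descriptor).find();
    }

    public static void main(String[] args) {
        String descriptor = "Ljava/io/InputStream;";
        System.out.println(fullyMatches("Ljava/io/InputStream", descriptor));     // false
        System.out.println(fullyMatches("Ljava/io/InputStream;", descriptor));    // true
        System.out.println(partiallyMatches("Ljava/io/InputStream", descriptor)); // true
    }
}
```

Under a full match, a pattern without the trailing semicolon can never equal the descriptor, which would explain a model generator silently finding nothing.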

Once I fixed the semicolons, and added a simple rule

{
  "name": "Issue78",
  "code": 78,
  "description": "test",
  "sources": [
    "ExternalSource"
  ],
  "sinks": [
    "OutputWriteSink"
  ]
}

the taint propagated correctly and the issue showed up. Apart from the missing semicolon, your propagation rule seems to work fine (with the slight issue that the "read.*" method constraint picks up some extra read methods that don't use a buffer at argument 1, like readChar() or readLong() - but this doesn't affect the issue you're trying to pick up).
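The over-matching of "read.*" can be checked the same way. The sketch below again uses java.util.regex as a stand-in for the real matcher (an assumption), with a hand-picked list of method names and a hypothetical helper fullMatches; it compares "read.*" with the stricter pattern "read".

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ReadPatternDemo {
    // Returns the names that the pattern matches in full.
    static List<String> fullMatches(String pattern, List<String> names) {
        Pattern compiled = Pattern.compile(pattern);
        return names.stream()
                .filter(name -> compiled.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("read", "readChar", "readLong", "skip");
        System.out.println(fullMatches("read.*", names)); // [read, readChar, readLong]
        System.out.println(fullMatches("read", names));   // [read]
    }
}
```

The stricter pattern still selects the no-argument read(), so a name constraint alone does not single out the overloads that take a buffer.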

I still don't understand the multi-source/partial sink stuff either, so hoping someone from the MT team can provide more.

@justfoxing ahh I see. Looks like I messed it up. Thank you for identifying and notifying me.

I guess I will try using for_all_parameters with where clauses to match any argument of type byte[] and narrow the matches. Hopefully it will work for the propagation model too.
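As a quick sanity check on which parameters such a filter would select, the following reflection sketch lists the byte[] parameters of the read overloads declared on java.io.InputStream. It assumes, as the propagation model in this thread implies, that Argument(0) is the receiver of an instance method, so declared parameters start at Argument(1). The class and method names are hypothetical.

```java
import java.io.InputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ByteArrayParameterDemo {
    // For every overload of methodName declared on type, records each byte[]
    // parameter as "name/parameterCount:Argument(i)".
    static List<String> byteArrayArguments(Class<?> type, String methodName) {
        List<String> result = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            // Assumption: instance methods reserve Argument(0) for 'this'.
            int offset = Modifier.isStatic(method.getModifiers()) ? 0 : 1;
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                // "[B" is the JVM name of byte[], the same string used in the sink's where clause.
                if (params[i].getName().equals("[B")) {
                    result.add(methodName + "/" + params.length + ":Argument(" + (i + offset) + ")");
                }
            }
        }
        Collections.sort(result); // getDeclaredMethods() has no guaranteed order
        return result;
    }

    public static void main(String[] args) {
        System.out.println(byteArrayArguments(InputStream.class, "read"));
    }
}
```

On InputStream this reports the byte[] parameter at Argument(1) for read(byte[]) and read(byte[], int, int), and nothing for the no-argument read().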