secure-software-engineering/FlowDroid

An issue that arises when analyzing using custom sourceAndSink.txt

KyleLeith-007 opened this issue · 4 comments

I encountered a minor issue while using FlowDroid to analyze an app. I utilized a custom sourceAndSink.txt file to conduct the analysis, and FlowDroid correctly loaded this file, which contains 37 sources and 39 sinks. However, in some apps, the analysis results included certain sources and sinks that are not present in my custom list. A portion of the logs is as follows:

[Thread-23] INFO soot.jimple.infoflow.cmd.MainClass - Analyzing app D:\fmqPrograms\appDownload\pythonProject\xitonggongju\VMOS.apk (1 of 1)...
...
[Thread-23] INFO soot.jimple.infoflow.android.source.AccessPathBasedSourceSinkManager - Created a SourceSinkManager with 37 sources, 39 sinks, and 414 callback methods.
...
[Thread-23] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - The sink virtualinvoke $r2.<java.io.ByteArrayOutputStream: void write(byte[])>($r4) in method <com.ta.utdid2.device.c: byte[] c()> was called with values from the following sources:
[Thread-23] INFO soot.jimple.infoflow.android.SetupApplication$InPlaceInfoflow - - $r5 = virtualinvoke $r4.<android.telephony.TelephonyManager: java.lang.String getDeviceId()>() in method <com.ta.utdid2.a.a.e: java.lang.String a(android.content.Context)>
...

In the example above, java.io.ByteArrayOutputStream: void write(byte[]) is not included in my custom list, and I do not understand why this result appears.
It would be an immense honor if you could kindly address my question.

The source sink manager also includes all implementors and subclasses of a source/sink signature. I would assume your list contains the <java.io.OutputStream: void write(byte[])> signature.

https://github.com/secure-software-engineering/FlowDroid/blob/develop/soot-infoflow/src/soot/jimple/infoflow/sourcesSinks/manager/DefaultSourceSinkManager.java#L154-L180
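
To illustrate the behaviour described above, here is a minimal sketch of hierarchy-aware sink matching. It is not FlowDroid's code: it uses plain Java reflection instead of Soot, and the names `SinkHierarchySketch`, `SinkSignature`, and `matchesSink` are hypothetical. It only shows why a call on `java.io.ByteArrayOutputStream` can be reported when the list contains `<java.io.OutputStream: void write(byte[])>`.

```java
import java.util.List;

public class SinkHierarchySketch {

    /** Hypothetical model of one list entry: declaring class plus method sub-signature. */
    static final class SinkSignature {
        final Class<?> declaringClass;
        final String subSignature;

        SinkSignature(Class<?> declaringClass, String subSignature) {
            this.declaringClass = declaringClass;
            this.subSignature = subSignature;
        }
    }

    /**
     * Returns true if a call with the given sub-signature on receiverClass matches
     * a listed sink, either on the listed class itself or on any subclass/implementor.
     */
    static boolean matchesSink(List<SinkSignature> sinks, Class<?> receiverClass, String subSignature) {
        for (SinkSignature sink : sinks) {
            boolean sameMethod = sink.subSignature.equals(subSignature);
            boolean sameOrSubtype = sink.declaringClass.isAssignableFrom(receiverClass);
            if (sameMethod && sameOrSubtype) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only OutputStream.write(byte[]) is listed, as in the thread.
        List<SinkSignature> sinks = List.of(
                new SinkSignature(java.io.OutputStream.class, "void write(byte[])"));

        // Subclass with the same sub-signature: treated as a sink.
        System.out.println(matchesSink(sinks, java.io.ByteArrayOutputStream.class, "void write(byte[])"));
        // Same subclass, different method: not a sink.
        System.out.println(matchesSink(sinks, java.io.ByteArrayOutputStream.class, "void flush()"));
        // Unrelated class: not a sink.
        System.out.println(matchesSink(sinks, StringBuilder.class, "void write(byte[])"));
    }
}
```

If subclass matches are not wanted in the results, one option under this model is to filter the reported sinks afterwards by the class named in the call statement.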

Yes, I indeed marked <java.io.OutputStream: void write(byte[])> as a sink. I sincerely appreciate you taking the time to respond to my question. If possible, I would like to ask another small question:

Regarding the efficiency of FlowDroid’s taint analysis, I have optimized the function call graph (CG) within FlowDroid using an algorithm. In brief, this involves removing certain nodes and edges based on specific logic. This optimization has generally resulted in improved analysis speed across most apps during testing. However, for a few apps, the analysis is actually slower when using the optimized CG. Upon analyzing the following two lines from the logs, I observed:

IFDS problem with XXX forward and YYY backward edges solved in TT1 seconds
Path reconstruction took TT2 seconds

When using the reduced CG, some apps show more XXX and YYY, leading to a longer TT1, while others have a longer TT2.

Theoretically, using a CG with fewer nodes and edges should result in faster analysis. Why, then, is this not always the case? Could it be that some important nodes and edges were inadvertently removed during the CG optimization, leading to this outcome? It would be an immense honor if you could spare some time to look into this issue.

> Theoretically, using a CG with fewer nodes and edges should result in faster analysis.

Note that IFDS, as implemented in FlowDroid, runs fully on-demand. If the removed call graph edges do not affect the taint paths (or affect only a few of them), your improvements might have no effect, or the effect might be hidden by variance from GC pauses, locking in data structures, or process scheduling.

If you are talking about the IFDS edges, then yes, the total number of edges should strongly correlate with the runtime. But again, a single edge takes so little time to process that a small difference might not be measurable.

> Why, then, is this not always the case? Could it be that some important nodes and edges were inadvertently removed during the CG optimization, leading to this outcome? It would be an immense honor if you could spare some time to look into this issue.

You might have run into a bug. During the last year I fixed multiple non-determinisms, so make sure you are using the latest commit. Then rerun the few cases and check whether the number of edges is consistent.

Otherwise, iirc, there is a fallback when no callees are known. In that case, the taints are kept alive.
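
The suggested consistency check can be scripted. The following is a minimal sketch, assuming the log line has the shape quoted earlier in this thread with XXX and YYY being plain integers; the class name `IfdsLogStats`, its methods, and the sample numbers in `main` are made up for illustration and are not FlowDroid output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IfdsLogStats {

    // Assumed line shape, taken from the placeholders quoted in the thread.
    private static final Pattern IFDS_LINE = Pattern.compile(
            "IFDS problem with (\\d+) forward and (\\d+) backward edges solved in");

    /** Returns {forwardEdges, backwardEdges} from the first matching line, or null if none matches. */
    static long[] parseEdgeCounts(String log) {
        Matcher m = IFDS_LINE.matcher(log);
        if (!m.find()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    /** True if both logs contain an IFDS line and report identical edge counts. */
    static boolean sameEdgeCounts(String logA, String logB) {
        long[] a = parseEdgeCounts(logA);
        long[] b = parseEdgeCounts(logB);
        return a != null && b != null && a[0] == b[0] && a[1] == b[1];
    }

    public static void main(String[] args) {
        // Invented sample lines for two runs of the same app and configuration.
        String run1 = "INFO - IFDS problem with 1200 forward and 300 backward edges solved in 4 seconds";
        String run2 = "INFO - IFDS problem with 1250 forward and 300 backward edges solved in 5 seconds";

        // Differing counts between identical runs would point to non-determinism
        // rather than to an effect of the call graph reduction.
        System.out.println("consistent: " + sameEdgeCounts(run1, run2));
    }
}
```

Running the same app several times with the same call graph and comparing the counts first helps separate run-to-run variation from the effect of the reduced CG.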

Thank you very much for your response! It is an immense honor to receive your guidance, which has given me a deeper understanding of the principles of FlowDroid. And I will continue to study this issue further.