secure-software-engineering/phasar

How to deal with the results related to function call in data flow analysis?

yuffon opened this issue · 10 comments

yuffon commented

Hi,
I am using phasar to conduct type state analysis.
I want to push a finite state automata when meeting special function calls.
In my IFDS problem, I push the automata states in calltoret flow function.
But when I dump the analysis results, I find that the data flow facts are postponed by one instruction after function call.

For example, when I need to push state 0 to state 1 after function "fopen".
But I find that the result on the call instruction to "fopen" are killed except for the zero fact.
The pushed new state emerges after the instruction which follows the call instruction to "fopen".

Here is one piece of the dumped results on three sequential instructions. The middle one is the call instruction to "fopen". The state before the call is state 0. The data flow fact after the call instruction is zero fact. The new state 1 emerges after the store instruction.

N: %1 = getelementptr [2 x i8], [2 x i8]* @.str.1, i64 0, i64 0, !psr.id !7 | ID: 5
----------------------------------------------------------------------------
D: Fact :{state:0 } | V: TOP
D: Λ | V: BOTTOM

%call = call %struct._IO_FILE* @fopen(i8* %0, i8* %1), !psr.id !8 | ID: 6
----------------------------------------------------------------------------
D: Λ | V: BOTTOM

store %struct._IO_FILE* %call, %struct._IO_FILE** @f, align 8, !psr.id !9 | ID: 7
------------------------------------------------------------------------------------
D: Λ | V: BOTTOM
D: Fact :{state:1 } | V: TOP

I don't know whether it is a feature of phasar or it is a bug in my code.
Anyone has meet this problem?

MMory commented

Hi @yuffon,

I understand you are writing your own analysis, is that correct? In that case you are defining your own flow functions and it seems that you are killing the old fact in the normal flow function of the first statement. I am a bit confused about your dataflow domain.

In the solver dump, the set of facts you see per statement denotes the facts that hold before executing the statement.

Are you aware of the already existing typestate analysis in phasar? Maybe it is more efficient to start from there or at least get an inspiration about how it could be done.

IFDS is not the best choice for typestate analysis, as the dataflow domain will be large. You will always need pairs of variable and their associated typestate. Your facts look like they don't track the variable, which is incorrect as these functions (mainly) change state of the variables/pointed-to data passed to it.

Hope to help.
Cheers
Martin

yuffon commented

Hi Martin,
Thanks for your reply.
I eliminate the object information in my question for simplicity.
I did points to analysis before the dataflow analysis.

Hi @yuffon,

I understand you are writing your own analysis, is that correct? In that case you are defining your own flow functions and it seems that you are killing the old fact in the normal flow function of the first statement. I am a bit confused about your dataflow domain.

The dataflow domain is a set of pairs {object, state}.

In the solver dump, the set of facts you see per statement denotes the facts that hold before executing the statement.

I did not kill the fact before the call instruction. I will check my code carefully.
By the way, in the analysis, each call instruction is split into a callsite and a ret site. But in the dumped result, there is only one program point before the call instruction and another program point after the call instruction, right?

Are you aware of the already existing typestate analysis in phasar? Maybe it is more efficient to start from there or at least get an inspiration about how it could be done.

OK, I will check it.

IFDS is not the best choice for typestate analysis, as the dataflow domain will be large. You will always need pairs of variable and their associated typestate. Your facts look like they don't track the variable, which is incorrect as these functions (mainly) change state of the variables/pointed-to data passed to it.

Thanks for your remind.

Hope to help. Cheers Martin

yuffon commented

Martin

Hi Martin,
I need to support LLVM 12 in my project.
So I use an old version of phasar, with branch f-PhasarVaRA12Support.
I don't know whether it is a bug in the old version or not.
What version should I use for LLVM 12?
Thanks.

MMory commented

I believe this was an internally used old branch, I cannot recommend using that one. Instead, please start from development or master and change the LLVM version in the top level CMakeLists.txt. I think it should be possible to fix compile errors then with little effort. Maybe you need to undo a few changes that were required in the transition 12->14.

yuffon commented

Thanks. I will try it.

MMory commented

Hi @yuffon , did you succeed?

yuffon commented

Hi @yuffon , did you succeed?

Hi Mory,
Thanks.
I just tried the latest version of Phasar.
The results are also not correct.
There are several problems as follows.

  1. The backward analysis cannot be constructed due to some methods not implemented in LLVMBasedBackwardCFG/ICFG. I have opened an issue here #657. Basically, I think the old version is OK. I think we need an offical patch.

  2. The data flow facts before the call instructions are correct, but facts before some load/alloca/trunc and getelementptr instructions vanishes. I use the same IFDS problem code as before. In the old version, the data flow facts before these instructions are correct. Currently, I don't know why.
    For example, here are some pieces of my results.

N: %0 = getelementptr [47 x i8], [47 x i8]* @.str, i64 0, i64 0 | ID: -1
------------------------------------------------------------------------
	D: Λ | V: TOP
	D: Fact :{ Obj:40,state:0 } | V: TOP
	D: Fact :{ Obj:0,state:0 } | V: TOP


N: %j = alloca i32, align 4 | ID: -1
------------------------------------
	D: Λ | V: TOP

and

N: %conv = trunc i64 %call1 to i32 | ID: -1
-------------------------------------------
	D: Λ | V: TOP


N: store i32 2, i32* %n, align 4 | ID: -1
-----------------------------------------
	D: Λ | V: TOP
	D: Fact :{ Obj:40,state:0 } | V: TOP
	D: Fact :{ Obj:0,state:0 } | V: TOP

and

N: store i32 0, i32* %retval, align 4 | ID: -1
----------------------------------------------
	D: Λ | V: TOP
	D: Fact :{ Obj:40,state:0 } | V: TOP
	D: Fact :{ Obj:0,state:0 } | V: TOP


N: %5 = load %struct._IO_FILE*, %struct._IO_FILE** %f2, align 8 | ID: -1
------------------------------------------------------------------------
	D: Λ | V: TOP

and

N: %arraydecay = getelementptr inbounds [128 x i8], [128 x i8]* %buf, i64 0, i64 0 | ID: -1
-------------------------------------------------------------------------------------------
	D: Λ | V: TOP


N: %5 = load %struct._IO_FILE*, %struct._IO_FILE** %f2, align 8 | ID: -1
------------------------------------------------------------------------
	D: Fact :{ Obj:40,state:2 } | V: TOP

In my problem, the only flow function related to trunc instruction is getNormalFlowFunction(). This flow function works well for other types of instructions.

  1. One instruction may be printed twice when dumping results. In these two times, data flow facts are different.
    For example:
    The first time:
============ Results for function 'main' ============
N: %call4 = call i64 @fread(i8* %arraydecay3, i64 1, i64 17, %struct._IO_FILE* %5) | ID: -1
-------------------------------------------------------------------------------------------
	D: Fact :{ Obj:0,state:0 } | V: BOTTOM

The second time:

N: %call4 = call i64 @fread(i8* %arraydecay3, i64 1, i64 17, %struct._IO_FILE* %5) | ID: -1
-------------------------------------------------------------------------------------------
	D: Λ | V: BOTTOM
	D: Fact :{ Obj:40,state:2 } | V: BOTTOM
  1. When dumping results, instructions are not ordered as that in the IR file. But old version did. This is a small problem I can handle myself. It seems that the old fashion is better.
MMory commented

Hi @yuffon,

could you please provide your analysis target so that we can try to reproduce your problem? Also, without your flow functions it is hard to judge where things are going wrong.

Issues 4 and 5 look like you are not loading the LLVM module through phasar's IRDB. The sorting of results etc. depends on a value annotation pass of phasar, where instructions are assigned an ID. As you can see above, these IDs are -1 everywhere.

Cheers
Martin

yuffon commented

Hi @yuffon,

could you please provide your analysis target so that we can try to reproduce your problem? Also, without your flow functions it is hard to judge where things are going wrong.

Issues 4 and 5 look like you are not loading the LLVM module through phasar's IRDB. The sorting of results etc. depends on a value annotation pass of phasar, where instructions are assigned an ID. As you can see above, these IDs are -1 everywhere.

Cheers Martin

Thanks @martin.
I have checked every aspects relating to the construction of IFDS problems.
I use the official HelperAnalyses class to construct IRDB and ICFG. I also use createAnalysisProblem to construct a problem. But the results are the same. I also check the construction of IRDB and see that each isntruction is assigned an ID. But when dumping results, the ID is pinrted as -1 everywhere. I don't know whether the reason is I am using LLVM 12.
I will switch to LLVM 14 and try again.

yuffon commented

Hi @MMory, I have tested the new version of phasar with LLVM 14. I also fix one bug in my project. Forward IFDS is OK to me. Thanks.