secure-software-engineering/phasar

Problem in initialization Seeds

Luweicai opened this issue · 1 comments

  • [√] I have searched open and closed issues for duplicates
  • [√] I made sure that I am not using an old project version (DO: pull Phasar, update git submodules, rebuild the project and check if the bug is still there)

; %0 is the taint seed.
define  void @foo(i32 %0){
 call void @llvm.dbg.value(metadata i32 %0, metadata !21, metadata !DIExpression()), !dbg !22;
%1 = add nsw i32 %0, 1;
%2 = add nsw i32 %1, 1;
call void @tt(i32 %3);
%5 = add nsw i32 %4, 1;
}

The values computed for the taint facts are wrong at the non-call instructions:

N: %1 = add nsw i32 %0, 1;
----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: BOTTOM

N: %2 = add nsw i32 %1, 1;
-----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: TOP
D: %1 = add nsw i32 %0, 1; | V: TOP (should be BOTTOM)

N: call void @tt(i32 %3);
-----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: BOTTOM
D: %1 = add nsw i32 %0, 1; | V: BOTTOM

N: %5 = add nsw i32 %4, 1;
-----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: TOP
D: %1 = add nsw i32 %0, 1; | V: TOP (should be BOTTOM)
D: %2 = add nsw i32 %1, 1; | V: TOP (should be BOTTOM)

If a taint seed is an argument of a function, it is attached to the entry instruction of that function. The following code initializes the taint seeds:

std::map<const llvm::Instruction *, std::set<const llvm::Value *>>
LLVMTaintConfig::makeInitialSeedsImpl() const {
  std::map<const llvm::Instruction *, std::set<const llvm::Value *>>
      InitialSeeds;
  for (const auto *SourceValue : SourceValues) {
    if (const auto *Inst = llvm::dyn_cast<llvm::Instruction>(SourceValue)) {
      InitialSeeds[Inst].insert(Inst);
    } else if (const auto *Arg = llvm::dyn_cast<llvm::Argument>(SourceValue);
               Arg && !Arg->getParent()->isDeclaration()) {
      const auto *FunFirstInst = &Arg->getParent()->getEntryBlock().front();
      InitialSeeds[FunFirstInst].insert(Arg);
    }
  }
  return InitialSeeds;
}

However, once the exploded supergraph has been constructed and the solver reaches Phase II of the data-flow analysis, valueComputationTask does the following:

        void valueComputationTask(const std::vector<n_t> &Values) {
            PAMM_GET_INSTANCE;
            for (n_t n : Values) {
                for (n_t SP : ICF->getStartPointsOf(ICF->getFunctionOf(n))) {
                    using TableCell = typename Table<d_t, d_t, EdgeFunctionPtrType>::Cell;
                    Table<d_t, d_t, EdgeFunctionPtrType> &LookupByTarget =
                            JumpFn->lookupByTarget(n);
                    for (const TableCell &SourceValTargetValAndFunction :
                            LookupByTarget.cellSet()) {
                        d_t dPrime = SourceValTargetValAndFunction.getRowKey();
                        d_t d = SourceValTargetValAndFunction.getColumnKey();
                        EdgeFunctionPtrType fPrime = SourceValTargetValAndFunction.getValue();
                        l_t TargetVal = val(SP, dPrime);
                        PHASAR_LOG_LEVEL(DEBUG,"SP " << IDEProblem.NtoString(SP)<<" dprime: " <<IDEProblem.DtoString(dPrime) <<"  n: " << IDEProblem.NtoString(n) << fPrime->str() <<"  Target val: " << IDEProblem.LtoString(TargetVal));
                        setVal(n, d,
                               IDEProblem.join(val(n, d),
                                               fPrime->computeTarget(std::move(TargetVal))));
                        INC_COUNTER("Value Computation", 1, PAMM_SEVERITY_LEVEL::Full);
                    }
                }
            }
        }

The implementation of getStartPointsOf, as used in for (n_t SP : ICF->getStartPointsOf(ICF->getFunctionOf(n))), is:

std::set<const llvm::Instruction *>
LLVMBasedCFG::getStartPointsOf(const llvm::Function *Fun) const {
  if (!Fun) {
    return {};
  }
  if (!Fun->isDeclaration()) {
    const auto *EntryInst = &Fun->front().front();
    if (IgnoreDbgInstructions && llvm::isa<llvm::DbgInfoIntrinsic>(EntryInst)) {
      return {EntryInst->getNextNonDebugInstruction(
          false /*Only debug instructions*/)};
    }
    return {EntryInst};
  }
  PHASAR_LOG_LEVEL(DEBUG, "Could not get starting points of '"
                              << Fun->getName()
                              << "' because it is a declaration");
  return {};
}

When IgnoreDbgInstructions is set, this function returns the first non-debug instruction of the entry block.

This means that when a taint seed is a parameter of a function and the entry instruction of that function is a debug instruction, the seed is recorded as BOTTOM at that debug instruction, while valueComputationTask starts its computation from the first non-debug instruction. This mismatch causes the wrong values shown at the beginning.
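The mismatch can be reproduced without LLVM in a small toy model. The sketch below is only an illustration: Inst, Val, SeedMap, firstInst, firstNonDebugInst and valueAt are hypothetical stand-ins, not PhASAR or LLVM types. It assumes that looking up a fact with no recorded value yields TOP, and it is not meant to describe what the actual fix does.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an instruction (hypothetical, not llvm::Instruction).
struct Inst {
  std::string Text;
  bool IsDebug;
};

// Toy lattice values, named after the values in the dump above.
enum class Val { Top, Bottom };

// Seeds: instruction -> facts that hold there (hypothetical simplification).
using SeedMap = std::map<const Inst *, std::set<std::string>>;

// Like the seeding code: take the very first instruction of the function,
// even if it is a debug intrinsic.
const Inst *firstInst(const std::vector<Inst> &Fn) { return &Fn.front(); }

// Like getStartPointsOf with IgnoreDbgInstructions set: skip leading
// debug instructions and return the first real one.
const Inst *firstNonDebugInst(const std::vector<Inst> &Fn) {
  for (const Inst &I : Fn) {
    if (!I.IsDebug) {
      return &I;
    }
  }
  return nullptr;
}

// Toy version of val(SP, d): BOTTOM if the fact was seeded at SP,
// otherwise TOP (assumed default for a missing entry).
Val valueAt(const SeedMap &Seeds, const Inst *SP, const std::string &Fact) {
  auto It = Seeds.find(SP);
  if (It != Seeds.end() && It->second.count(Fact) != 0) {
    return Val::Bottom;
  }
  return Val::Top;
}
```

With a function whose first instruction is a debug intrinsic, seeding %0 at firstInst and then querying at firstNonDebugInst gives TOP, which matches the wrong output above. Seeding at firstNonDebugInst instead makes both sides agree and gives BOTTOM; this is one possible remedy under the assumptions of this toy model.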

Hi @Luweicai, thank you for reporting this issue in such detail. You are right: this is indeed a bug.
#635 should fix this