MagpieBridge/IRConverter

Better Wala error message

martinschaef opened this issue · 25 comments

I'm running some tests from command line. When I create a WalaToSootIRConverter for a source path that only contains simple files (i.e., they don't import non-jdk stuff) everything works fine. But when I add something more complicated and provide the classpath, I always end up with this exception:

[main] ERROR dragonglass.lspintegration.Main - Soot threw an unexpected exception.
java.lang.RuntimeException: java.lang.RuntimeException: com.ibm.wala.ipa.cha.ClassHierarchyException: factory.getLoader failed
        at magpiebridge.converter.WalaToSootIRConverter.iterateWalaJavaSourceClasses(WalaToSootIRConverter.java:183)
        at magpiebridge.converter.WalaToSootIRConverter.convert(WalaToSootIRConverter.java:191)
        at 
...
Caused by: java.lang.RuntimeException: com.ibm.wala.ipa.cha.ClassHierarchyException: factory.getLoader failed
        at magpiebridge.converter.WalaToSootIRConverter.buildClassHierachy(WalaToSootIRConverter.java:174)
        at magpiebridge.converter.WalaToSootIRConverter.iterateWalaJavaSourceClasses(WalaToSootIRConverter.java:181)
        ... 4 more
Caused by: com.ibm.wala.ipa.cha.ClassHierarchyException: factory.getLoader failed
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:291)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:203)
        at com.ibm.wala.ipa.cha.ClassHierarchyFactory.make(ClassHierarchyFactory.java:81)
        at com.ibm.wala.ipa.cha.ClassHierarchyFactory.make(ClassHierarchyFactory.java:67)
        at magpiebridge.converter.WalaToSootIRConverter.buildClassHierachy(WalaToSootIRConverter.java:171)
        ... 5 more
Caused by: java.lang.NullPointerException
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.createClassDeclaration(JDTJava2CAstTranslator.java:509)
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.visitTypeDecl(JDTJava2CAstTranslator.java:434)
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.visit(JDTJava2CAstTranslator.java:3731)
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.translateToCAst(JDTJava2CAstTranslator.java:273)
        at com.ibm.wala.cast.java.translator.jdt.ecj.ECJSourceModuleTranslator$ECJAstToIR.acceptAST(ECJSourceModuleTranslator.java:95)
        at org.eclipse.jdt.core.dom.CompilationUnitResolver.resolve(CompilationUnitResolver.java:1068)
        at org.eclipse.jdt.core.dom.CompilationUnitResolver.resolve(CompilationUnitResolver.java:662)
        at org.eclipse.jdt.core.dom.ASTParser.createASTs(ASTParser.java:1013)
        at com.ibm.wala.cast.java.translator.jdt.ecj.ECJSourceModuleTranslator.loadAllSources(ECJSourceModuleTranslator.java:195)
        at com.ibm.wala.cast.java.loader.JavaSourceLoaderImpl.loadAllSources(JavaSourceLoaderImpl.java:599)
        at com.ibm.wala.classLoader.ClassLoaderImpl.init(ClassLoaderImpl.java:506)
        at com.ibm.wala.cast.java.loader.JavaSourceLoaderImpl.init(JavaSourceLoaderImpl.java:608)
        at com.ibm.wala.cast.java.translator.jdt.ecj.ECJClassLoaderFactory.makeNewClassLoader(ECJClassLoaderFactory.java:31)
        at com.ibm.wala.classLoader.ClassLoaderFactoryImpl.getLoader(ClassLoaderFactoryImpl.java:61)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:270)
        ... 9 more

Is there a way to get a better error message that would tell me what it was looking for?

@martinschaef Hi Martin, thanks for trying it out. Can you provide me the program which caused the problem? I can try to fix it if I can reproduce the error. The WalaToSootIRConverter.java class is where WALA is used, which contructor did you use to initialize it? for your case the class path to libraries should be passed to libPath in this constructor. public WalaToSootIRConverter(@Nonnull Set<String> sourcePath, @Nonnull Set<String> libPath) I am also ready to chat with you about this issue in Skype. You can find me in Skype with my e-mail address which you have.

Thanks for the quick reply :) I can't give you the exact code, but I'll try to produce a smaller example that I can share here. It looks like this is actually a Wala problem (see here).

Btw., is it, at least in theory, possible to tell Wala to create phantom classes if it cannot resolve something? Basically, an equivalent to –allow-phantom-refs in Soot.
I have one use case where it would be very expensive for me to get the full classpath and it would be great if I could just tell Wala to use phantoms to make it work.

@martinschaef It will be great if you can produce a smaller example.
There is a way tell to wala to create phantom classes:
When you look at the following method in WalaToSootIRConverter.java

 /** Use WALA's JAVA source code front-end to build class hierarchy. */
  private void buildClassHierachy() {
    try {
      this.classHierarchy = ClassHierarchyFactory.make(scope, factory);
      Warnings.clear();
    } catch (ClassHierarchyException e) {
      throw new RuntimeException(e);
    }
  }

instead of using the make method, try

ClassHierarchyFactory.makeWithPhantom(scope, factory);

should do the work I believe. I can try to add this support tomorrow. Now it is too late in Germany.
Also if you wanna exclude classes from your analysis you could just define them in an exclusion file
and pass it to this constructor
public WalaToSootIRConverter( @Nonnull Set<String> sourcePath, @Nonnull Set<String> libPath, String exclusionFilePath)

I would recommend ClassHierarchy.makeWithRoot() over makeWithPhantom(). There are known bugs involving makeWithPhantom(), see wala/WALA#377 and wala/WALA#335. I will update the Javadoc to reflect this.

Excellent, thx. I tried to subtype WalaToSootIRConverter, but magpiebridge.converter.ClassConverter#convertClass is protected, so I can't call it from my subclass. I'll play with it tomorrow.

@martinschaef Hi Martin, after playing with an example I constructed, I believe the error message you got means that you didn't give the IRConverter library classes which are directly referenced by the source code.
Consider the following source code:

class ApplicationClass
{
 void foo(){
    LibraryClass  c = new LibraryClass();
 }
}

This LibraryClass must be in the library Jar file in the library path you gave to the converter.
The WALA source front end uses the front end of Eclipse JDT, when it sees LibraryClass in the source code, it will try to resolve the type LibraryClass from the source and library path. If it can not find it, then it will bring up such error message.

So to fix this issue, you should make sure all library classes which are directly referenced by the source code can be found in the library path.

There is currently no configuration option in WALA to ignore such missing types in the source code front end.

The exclusionFilePath option is not relevant here, since WALA will only ignore these classes when analysis in WALA is used.

After talking with Julian, currently there is no better warning message to tell which class is missed.
To figure out which library classes must be included, you can open the source code project in Eclipse, Eclipse will give you warnings about what classes it can not resolve. These classes are then required to be on the library classes.

I just released a new version of the IRConverter which allows you have access to ClassConverter https://github.com/MagpieBridge/IRConverter/packages/96202

Excellent, thx. I'll try to work with that. Like I said, I have one use case where it's hard to get any library class because we can't really run the build script to get the compile/runtime classpath.

Is it theoretically possible to extend Java to create a phantom class in these cases, or is there a conceptual problem that makes this hard/impossible? I'm aware that we'd loose soundness because of missing exceptional flow, but that's the same if we use phantoms in bytecode.

From my understanding the issue is that the Java source front end is now based on the Eclipse Java Compiler (ECJ), and ECJ APIs start doing unexpected things when referenced classes are missing (like returning null unexpectedly). WALA has not been engineered to try to be robust to all of these behaviors. I think it's theoretically possible to be more robust here, but it would be difficult (many of the unexpected behaviors are undocumented). For Java bytecode things are easier since WALA controls all the code for reading bytecode, resolving types and supertypes, etc.

Manu is right. The key is that, unlike for bytecode, unknown code can influence the IR that the source front needs to generate.

For instance, imagine seeing a call like foo() in a Java source file, and not having any obvious target. Is it, perhaps, in a unknown super class, or interface with a default implementation? Or perhaps it would be found in a static import of an unknown class? Each of these cases would result in different invoke IR.

This could, potentially, be handled in some cases. The front end could look at the file and see what possibilities there are. Maybe the type hierarchy is complete, which would rule out super calls, and there is exactly one static import; in that case, we could generate a static invoke to foo in whatever class is imported.

But that seems like a rat hole of special cases that anyway would not be able to work in general.

(Note that, in bytecode analysis, the correct bytecode must have been generated by someone already, and they must have been able to compile the code.)

Yeah, I was more thinking along the lines of snippet parsing. I remember that ASTParser in JDT has an option to set statement recovery and bindings recovery where it just puts some stuff in the AST to make it well-formed again.
It's definitely not in scope for this issue, but I was wondering if the Wala folks ever thought about that. I remember this Darpa Muse project where folks did a lot of snippet parsing.

Did you ever look into snippet parsing?

There is a long thread on this kind of issue here:

wala/WALA#99

I think someone has tried something like snippet parsing before and it didn't quite work:

wala/WALA#99 (comment)

That said, with newer versions of Eclipe JDT / ECJ maybe the snippet parsing would work better.

I got a little bit further with a custom WalaToSootIRConverter (see https://gist.github.com/martinschaef/56e070cd7a9c5ecd4a2957d72fd2b92f), but I still get exceptions for anything that's a little more complex. E.g.,

[main] INFO dragonglass.lspintegration.CliCommand - Found 1 classpath entries
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at magpiebridge.converter.InstructionConverter.convertInvokeInstruction(InstructionConverter.java:752)
        at magpiebridge.converter.InstructionConverter.convertInstruction(InstructionConverter.java:157)
        at magpiebridge.converter.ClassConverter.createBody(ClassConverter.java:261)
        at magpiebridge.converter.ClassConverter.convertMethod(ClassConverter.java:194)
        at magpiebridge.converter.ClassConverter.convertClass(ClassConverter.java:129)
        at dragonglass.lspintegration.analysis.CustomWalaConverter.convert(CustomWalaConverter.java:50)
        at dragonglass.lspintegration.analysis.DragonglassRunner.analyze(DragonglassRunner.java:35)
        at dragonglass.lspintegration.CliCommand.run(CliCommand.java:72)
        at dragonglass.lspintegration.Main.main(Main.java:57)
[main] ERROR dragonglass.lspintegration.analysis.CustomWalaConverter - Index: 1, Size: 1
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at magpiebridge.converter.InstructionConverter.convertInvokeInstruction(InstructionConverter.java:752)
        at magpiebridge.converter.InstructionConverter.convertInstruction(InstructionConverter.java:157)
        at magpiebridge.converter.ClassConverter.createBody(ClassConverter.java:261)
        at magpiebridge.converter.ClassConverter.convertMethod(ClassConverter.java:194)
        at magpiebridge.converter.ClassConverter.convertClass(ClassConverter.java:129)
        at dragonglass.lspintegration.analysis.CustomWalaConverter.convert(CustomWalaConverter.java:50)
        at dragonglass.lspintegration.analysis.DragonglassRunner.analyze(DragonglassRunner.java:35)
        at dragonglass.lspintegration.CliCommand.run(CliCommand.java:72)
        at dragonglass.lspintegration.Main.main(Main.java:57)

And, if I remove a little bit of code, I get an exception from Soot:

[main] ERROR dragonglass.lspintegration.Main - Soot threw an unexpected exception.
java.lang.RuntimeException
        at soot.jimple.toolkits.callgraph.CallGraph$TargetsOfUnitIterator.<init>(CallGraph.java:213)
        at soot.jimple.toolkits.callgraph.CallGraph.edgesOutOf(CallGraph.java:203)
        at soot.jimple.toolkits.callgraph.ClinitElimAnalysis.flowThrough(ClinitElimAnalysis.java:72)
        at soot.toolkits.scalar.FlowAnalysis.flowThrough(FlowAnalysis.java:592)
        at soot.toolkits.scalar.FlowAnalysis.doAnalysis(FlowAnalysis.java:556)
        at soot.toolkits.scalar.ForwardFlowAnalysis.doAnalysis(ForwardFlowAnalysis.java:46)
        at soot.jimple.toolkits.callgraph.ClinitElimAnalysis.<init>(ClinitElimAnalysis.java:42)
        at soot.jimple.toolkits.callgraph.ClinitElimTransformer.internalTransform(ClinitElimTransformer.java:38)
        at soot.BodyTransformer.transform(BodyTransformer.java:55)
        at soot.BodyTransformer.transform(BodyTransformer.java:59)
        at soot.jimple.toolkits.callgraph.CallGraphPack.internalApply(CallGraphPack.java:65)
        at soot.Pack.apply(Pack.java:117)
        at dragonglass.lspintegration.util.SootSetup.runAnalysis(SootSetup.java:76)
        at dragonglass.lspintegration.analysis.DragonglassRunner.analyze(DragonglassRunner.java:49)
        at dragonglass.lspintegration.CliCommand.run(CliCommand.java:73)
        at dragonglass.lspintegration.Main.main(Main.java:57)

Does either of these look familiar?

@martinschaef The first one exception could be an error in the InstructionConverter caused by an invoke statement. Can you paste the code you removed, especially the invoke statements?
The second exception is thrown when a unit in the call edge is null.

Unfortunately, I can't really past any of that code. I'll find some time later in the week to work on examples that can be shared.

In the meantime, did you try it on projects that use fancy things, like Lombok, Guava, or AspectJ? Are there any known limitations that these things don't work?

@martinschaef For the exception you pasted I do need code that I can use to debug the issue. I didn't use the IRConverter on incomplete code.
I tried it on some web applications using Spring framework. As far as I know, WALA's source code front end supports only up to Java 7 features. So if you have source code written with some Java 8 features, it may crash.

Ugh ... that is a big limitation. Our coding guideline says to use Streams everywhere. @msridhar @juliandolby is there a roadmap for java 8+? If not, we might have to change plans a little

I personally do not have any plans to work on the Java source front end at the moment. I know @mattkindy-praetorian may have poked around at support for lambdas, but not sure if he can share more

@martinschaef To solve this problem, have you thought of using desugaring of Java 8 language features? I know Google does this for Android (D8). Currently, our group is building a new soot and we also uses WALA's source code front end, so supporting Java 8+ features is planned.

I know Google does this for Android (D8).

D8 desugars bytecodes, not source code. If analysis of bytecodes is in scope, then WALA's bytecode frontend should work fine with Java 8 features.

@msridhar True, but I meant the idea. I can image an automated tool which rewrites the source code before passing it to WALA's source code front end.

Yes, I think Java 8 is actually the bigger problem than snippets. If we have full Java 8 support, we can still cover a lot of use cases (including using MagpieBridge). Snippet parsing is a nice-to-have, but we can work around it.
Should we have a phone call to discuss the options?

Unsure about whether I can contribute code, but I was able to get Java 8 support in place by introducing a LambdaEntity(extends JavaProcedureEntity) as part of the translation process (similar for MethodReferenceEntity). These entities handled a lot of the desugaring themslves. For example:

    /**
     * Desugar: create a method on the class with name `lambda${enclosingTypeName}${lambdaIndex}`
     *  Should have same parameters as declared for lambda unless we are encapsulating values from
     *  outside lambda scope, in which case those will be prepended
     */
    public MethodEntity getDesugaredMethod() {
      final MethodDeclaration methodDeclaration = ast.newMethodDeclaration();
      final ASTNode body = fDecl.getBody();

      if (body instanceof Block) {
        methodDeclaration.setBody((Block) ASTNode.copySubtree(ast, body));

      } else if (body instanceof Expression) {
        final Block block = ast.newBlock();

        final ReturnStatement returnStatement = ast.newReturnStatement();
        final Expression clonedBody = (Expression) ASTNode.copySubtree(ast, body);

        returnStatement.setExpression(clonedBody);

        ((List<Statement>) block.statements()).add(returnStatement);

        methodDeclaration.setBody(block);
      }

      if (!isNull(fReturnType)) {
        methodDeclaration.setReturnType2(fReturnType.isNullType()
          ? parseTypeFromString("Object", ast, null) // todo we should do better about figuring out the return type
          : createType(fReturnType, ast, null));
      }

      final List<CAstNode> syntheticOuterLocalsNodes = getSyntheticOuterLocalsNodes();
      final List<SingleVariableDeclaration> capturedVariables = syntheticOuterLocalsNodes
        .stream()
        .map(node -> {
          final SingleVariableDeclaration paramDeclaration = ast.newSingleVariableDeclaration();

          final CAstType cAstType = (CAstType) node.getChild(1).getValue();
          final String paramTypeName = constructTypeName(cAstType);

          final Type type = parseTypeFromString(paramTypeName, ast, null);

          paramDeclaration.setType(type);

          final String name = ((String) node.getChild(0).getValue())
            .replace("val$", "");

          paramDeclaration.setName(ast.newSimpleName(name));

          return paramDeclaration;
        })
        .collect(toList());

      final List<SingleVariableDeclaration> formalLambdaParameters = ((List<VariableDeclaration>) fDecl.parameters())
        .stream()
        .map(param -> {
          if (param instanceof SingleVariableDeclaration) {
            return (SingleVariableDeclaration) ASTNode.copySubtree(ast, param);
          }

          final SingleVariableDeclaration paramDeclaration = ast.newSingleVariableDeclaration();

          final ITypeBinding resolvedType = resolveVariableBinding(param).getType();
          final String paramTypeName = resolvedType.isCapture()
            ? resolvedType.getWildcard().getName()
            : resolvedType.getName();

          final Type type = parseTypeFromString(paramTypeName, ast, null);

          paramDeclaration.setType(type);
          paramDeclaration.setName((SimpleName) ASTNode.copySubtree(ast, param.getName()));

          return paramDeclaration;
        })
        .collect(toList());

      ((List<SingleVariableDeclaration>) methodDeclaration.parameters())
        .addAll(capturedVariables);

      ((List<SingleVariableDeclaration>) methodDeclaration.parameters())
        .addAll(formalLambdaParameters);

      if (!isBound()) {
        ((List<IExtendedModifier>) methodDeclaration.modifiers())
          .add(ast.newModifier(STATIC_KEYWORD));

      } else {
        methodDeclaration.setReceiverType(ast.newSimpleType(ast.newName(getEnclosingClassName())));
      }

      methodDeclaration.setName(ast.newSimpleName(getDesugaredMethodName()));
      final Map<CAstNode, CAstEntity> unravelledEntities = fEntities
        .entrySet()
        .stream()
        .collect(toMap(Map.Entry::getKey, entry -> entry.getValue().stream().collect(onlyElement())));

      final ArrayList<CAstType> parameterTypes = new ArrayList<>();

      parameterTypes
        .addAll(syntheticOuterLocalsNodes
          .stream()
          .map(node -> (CAstType) node.getChild(1).getValue())
          .collect(toImmutableList()));

      parameterTypes.addAll(fParameterTypes);

      return new MethodEntity(
        fAst,
        methodDeclaration,
        getEnclosingTypeBinding(),
        unravelledEntities,
        fNewContext,
        parameterTypes,
        fReturnType,
        annotations);
    }

Then I had to modify some bootstrap functionality to support AST-based lambdas, so that involved also changing LambdaSummaryClass See wala/WALA#549 for some discussion

It can perhaps be useful to use something like the below to get a partial desugaring as well? I used this for testing at various points (assuming the code compiles)

http://keweishang.blogspot.com/2016/04/how-compiler-understands-your-java-class.html

@mattkindy-praetorian Your code looks very promising. You can certainly contribute code; a pull request complete with a few tests would be great. If you wrote your code yourself, then contributing should be quite easy and would be very much appreciated.