pascal-lab/Tai-e

Replacing / manipulating IR of methods in "world"

staslath opened this issue · 6 comments

Does tai-e support manipulating/replacing the IR of a method that has already been included in the World?

Motivating Use Case

Consider Java's ServiceLoader API. Suppose we have a service called SearchEngine with two implementations:

  • YahooSearchEngine
  • BingSearchEngine

Client code could say:

var results =
  ServiceLoader.load(SearchEngine.class).stream()
    .map(ServiceLoader.Provider::get)
    .map(searchEngine -> searchEngine.search("cat"))
    .toList();

and Java's ServiceLoader infrastructure would then go and search the META-INF/services/ directories in the classpath jars to find implementors.

For example, one jar could have a file META-INF/services/mypkg.SearchEngine with the following text line:

mypkg.YahooSearchEngine

and another:

mypkg.BingSearchEngine

Since the implementing classes are read from files, Tai-e won't know about them, which compromises call graph construction, for example.
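
To make the file format concrete, here is a minimal sketch of how provider class names could be recovered from the text of a META-INF/services/ file before handing them to an analysis. It follows the documented provider-configuration format (one fully qualified class name per line, '#' starts a comment, blank lines ignored). The class ProviderConfigSketch and the method parseProviderNames are hypothetical names for illustration, not part of Tai-e or the JDK.

```java
import java.util.ArrayList;
import java.util.List;

final class ProviderConfigSketch {

    /**
     * Extracts provider class names from the content of a
     * provider-configuration file (hypothetical helper).
     */
    static List<String> parseProviderNames(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            // Everything after '#' is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            // Skip blank lines and duplicate entries.
            if (!line.isEmpty() && !names.contains(line)) {
                names.add(line);
            }
        }
        return names;
    }
}
```

The resulting names (here mypkg.YahooSearchEngine and mypkg.BingSearchEngine) are what any modeling of ServiceLoader.load(SearchEngine.class) would need as input.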

I'd love to have your perspective on what would be the tai-e idiomatic way to address this issue.

A naive approach would seek to "rewrite":

ServiceLoader.load(SearchEngine.class).stream()

to something like the following (since we know mypkg.YahooSearchEngine/BingSearchEngine are the service providers):

Stream.of(
   mypkg.YahooSearchEngine.class.newInstance(), 
   mypkg.BingSearchEngine.class.newInstance()
)

This approach is admittedly very naive (e.g., it ignores the fact that ServiceLoader#stream returns Stream<Provider<S>> rather than Stream<S>, where S is the service type), but hopefully it gets the idea across.
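
For illustration, here is a minimal, self-contained sketch of what a slightly less naive rewrite could look like at the Java source level, keeping the Stream<Provider<S>> shape. The class ProviderStreamSketch, the helper providerOf, and the toy SearchEngine implementations are hypothetical; this is plain Java, not Tai-e IR, and it does not reflect how ServiceLoader works internally.

```java
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ProviderStreamSketch {

    interface SearchEngine {
        String search(String query);
    }

    // Toy implementations standing in for the real providers.
    static class YahooSearchEngine implements SearchEngine {
        public String search(String query) { return "yahoo:" + query; }
    }

    static class BingSearchEngine implements SearchEngine {
        public String search(String query) { return "bing:" + query; }
    }

    /** Wraps a known implementation in a hand-written Provider (hypothetical helper). */
    static <S> ServiceLoader.Provider<S> providerOf(Class<? extends S> type,
                                                    Supplier<S> factory) {
        return new ServiceLoader.Provider<S>() {
            @Override public Class<? extends S> type() { return type; }
            @Override public S get() { return factory.get(); }
        };
    }

    /** Mirrors the client code, with the ServiceLoader lookup replaced by explicit providers. */
    static List<String> searchAll(String query) {
        ServiceLoader.Provider<SearchEngine> yahoo =
                providerOf(YahooSearchEngine.class, YahooSearchEngine::new);
        ServiceLoader.Provider<SearchEngine> bing =
                providerOf(BingSearchEngine.class, BingSearchEngine::new);
        return Stream.of(yahoo, bing)
                .map(ServiceLoader.Provider::get)
                .map(engine -> engine.search(query))
                .collect(Collectors.toList());
    }
}
```

Because the allocations are explicit here, both search implementations are reachable without any reflection.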

My initial experiments to have a plugin that does that have failed :)

  1. When I use solver.addStmts(...) to add a set of generated statements mimicking the above approach, they do not result in YahooSearchEngine#search or BingSearchEngine#search being added to the call graph.
  2. As a sanity check, if I write the equivalent Java code directly (as opposed to IR manipulations) and have it analysed by Tai-e, it's capable of "following through" the Stream.of call, and the call graph includes both YahooSearchEngine#search and BingSearchEngine#search.

Looking forward to your reply!

Firstly, Tai-e can indeed simulate the behavior of certain APIs by synthesizing IR in pointer analysis. An example is ArrayModel, which we use to model System.arraycopy().

However, for your particular case, a more idiomatic and straightforward approach should be to handle reflection properly. I'm not familiar with ServiceLoader, but it seems to create instances (such as your various *SearchEngine instances) through reflection. So, the simplest way would be to make Tai-e capable of analyzing the reflection calls within ServiceLoader.

There are two ways to address this issue:

  1. Use Tai-e's solar (option: pta=...;reflection-inference:solar;...), which is a more powerful reflection analysis compared to the default option and may directly resolve instances created by ServiceLoader.
  2. Explicitly inform Tai-e through a reflection log. To do this, you need to locate the actual reflection calls in the JDK for ServiceLoader (likely newInstance() calls), and then use a log to inform Tai-e (option: pta=...;reflection-log:refl.log;...). The format of the reflection log is the same as TamiFlex's. For reference, you can take a look at how refl.log is written in java-benchmarks.

  1. When I use solver.addStmts(...) to add a set of generated statements mimicking the above approach, they do not result in YahooSearchEngine#search or BingSearchEngine#search being added to the call graph.

How did you do that (i.e., use solver.addStmts(...) to add a set of generated statements mimicking the above approach)?

Compared with the generic solution mentioned by @silverbullettt, this requires very careful modeling; you might also consider whether the modeling is wrong.

@silverbullettt , @zhangt2333 thanks for taking the time to respond!

After further debugging I came to learn that my using solver.addIgnoredMethod(...) along with solver.addStmts(method, ...) was likely the root cause for things not working as expected. In particular, I believed that addIgnoredMethod() would ignore the original IR of that method, and addStmts() would replace that IR with my own. However, the use of addIgnoredMethod(method) resulted in other plugins (e.g., lambda analysis) not properly handling the method at hand.

While debugging and trying out different things, I was wondering if there was a good way to access an enclosing class' members inside an InvokeHandler of its non-static inner class?

if there was a good way to access an enclosing class' members inside an InvokeHandler of its non-static inner class?

I'm afraid I don't quite understand. Could you provide an example?

@zhangt2333 , consider the following example:

public class Main {

  private String fileName ...; // populated from args
   
  private Stream<String> readFile() { ... } // uses the fileName member of the enclosing class

  private class AnimalFinder {
    public List<Animal> findAnimals() {
      return readFile().map(Class::forName).toList();
    }
  }

  public Main(String[] args) { ... }

  public static void main(String[] args) {
    Main main = new Main(args); // populates fileName, among other things
    List<Animal> animals = main.new AnimalFinder().findAnimals();
  }
}

I'd like to replace the IR of readFile(), such that I provide the file content myself, based on the paths I observe in fileName's points-to set.

However, if I create a model which derives from AbstractModel and has an InvokeHandler for readFile(), I won't have the points-to set for fileName because it's not part of readFile()'s signature, and not even part of the class containing readFile(), only its enclosing class - Main.

Is there a good way to access fileName's points-to set in such a case?

In the method readFile, you can obtain the InstanceField (a Pointer) for fileName from the thisObj (recvObj) of readFile and the JField fileName. Then you can retrieve the points-to set of fileName.
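
The lookup described above can be sketched with a toy model in plain Java. This is not Tai-e's API: the class InstanceFieldLookupSketch, its maps, and the string-based object names are all hypothetical stand-ins. It only illustrates the idea that an instance-field pointer is identified by a (base object, field) pair, so the field's points-to set is reached by first resolving the receiver objects of this.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InstanceFieldLookupSketch {

    // Toy points-to sets of variables, e.g. "readFile/this" -> {"Main@1"}.
    final Map<String, Set<String>> varPts = new HashMap<>();

    // Toy points-to sets of instance fields, keyed by (base object, field name).
    final Map<List<String>, Set<String>> fieldPts = new HashMap<>();

    /** Unions the field's points-to sets over every receiver object of thisVar. */
    Set<String> fieldPointsTo(String thisVar, String fieldName) {
        Set<String> result = new TreeSet<>();
        for (String recvObj : varPts.getOrDefault(thisVar, Set.of())) {
            result.addAll(fieldPts.getOrDefault(List.of(recvObj, fieldName), Set.of()));
        }
        return result;
    }

    /** Small demo: one Main object whose fileName field points to one string object. */
    static Set<String> demo() {
        InstanceFieldLookupSketch sketch = new InstanceFieldLookupSketch();
        sketch.varPts.put("readFile/this", Set.of("Main@1"));
        sketch.fieldPts.put(List.of("Main@1", "fileName"), Set.of("animals.txt"));
        return sketch.fieldPointsTo("readFile/this", "fileName");
    }
}
```

In a real model, the receiver objects and the field pointer would come from the pointer analysis itself rather than from hand-filled maps.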