Language-agnostic path finding between source and target function calls.
- In legacy codebases, it is often unclear how public-facing API methods are used internally. Traversing multiple methods at once (vs. one at a time in an IDE) gives a faster understanding of their usage:
  - Which API methods are deprecated (no calls target them);
  - Which controllers handle which API methods;
  - How some classes (e.g. utils) cover more API method implementations than others, which may involve higher refactoring costs.
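As a rough illustration of the first point above, API methods that no call targets can be found with a set difference over a list of call edges. This is a minimal sketch, not part of the project's scripts: the `(caller, callee)` tuple representation, the function name `uncalled_methods`, and the sample method names are all hypothetical.

```python
def uncalled_methods(api_methods, calls):
    """Return the API methods that never appear as a call target."""
    targets = {callee for _caller, callee in calls}
    return sorted(m for m in api_methods if m not in targets)


# Hypothetical data, for illustration only.
calls = [
    ("Controller:handle()", "Api:getForecast()"),
    ("Api:getForecast()", "Utils:format()"),
]
api_methods = ["Api:getForecast()", "Api:getLegacyForecast()"]

print(uncalled_methods(api_methods, calls))
```

Here `Api:getLegacyForecast()` is reported because nothing in the edge list calls it.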
The following steps use the "Weather" sample from the Java Spring framework.
Retrieve the sample:
git submodule update --init --recursive
Consider the following method implementation:
public class WeatherClient extends WebServiceGatewaySupport {
    public GetCityForecastByZIPResponse getCityForecastByZip(String zipCode) {
        GetCityForecastByZIP request = new GetCityForecastByZIP();
        request.setZIP(zipCode);
        System.out.println();
        System.out.println("Requesting forecast for " + zipCode);
        GetCityForecastByZIPResponse response = (GetCityForecastByZIPResponse) getWebServiceTemplate()
                .marshalSendAndReceive(request, new SoapActionCallback("http://ws.cdyne.com/WeatherWS/GetCityForecastByZIP"));
        return response;
    }
    // ...
}
We see a call to `setZIP()`. Let's find all calls to it under the class `WeatherClient`:
./parsers/sum.py examples/data/calls.json <(echo 'weather') <(echo 'setZIP')
Output:
# Calls
### 1
```
WeatherClient:main(String[])
WeatherClient:getCityForecastByZip(String)
    setZIP
```
# Total
```
setZIP:1
```
All patterns are case-insensitive. A pattern matches any class or method whose name contains the given substring.
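The matching rule can be sketched in a few lines. This is an illustration of the described behaviour, not the code used by `sum.py`; the function name `matches` is hypothetical, and treating an empty pattern as "match everything" is an assumption based on the queries that pass `''` as a pattern.

```python
def matches(pattern, name):
    """Case-insensitive substring match.

    Assumption: an empty pattern matches every name, since the empty
    string is a substring of any string.
    """
    return pattern.lower() in name.lower()


print(matches("weather", "WeatherClient:getCityForecastByZip(String)"))
print(matches("setzip", "setZIP"))
print(matches("marshal", "setZIP"))
```

The first two calls print `True` and the last prints `False`.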
Our script found the longest call path that reached the source method `getCityForecastByZip()`, starting at `main()`. All indented lines are target methods reached from `getCityForecastByZip()`. In this case, only one method contains the substring `setZIP`.
We can also confirm that the call to `marshalSendAndReceive()` is made on an instance returned by a getter called in `getCityForecastByZip()`, so the path starts at that instance:
./parsers/sum.py examples/data/calls.json <(echo '') <(echo 'marshal')
Output:
# Calls
### 1
```
WeatherConfiguration:weatherClient(Jaxb2Marshaller)
    setMarshaller
    setUnmarshaller
```
### 2
```
WeatherConfiguration:marshaller()
    setContextPath
```
### 3
```
WebServiceTemplate:marshalSendAndReceive(Object)
    marshalSendAndReceive
```
### 4
```
WebServiceTemplate:marshalSendAndReceive(String, Object)
    marshalSendAndReceive
```
# Total
```
setContextPath:1
setMarshaller:1
setUnmarshaller:1
marshalSendAndReceive:2
```
By design, the graph traversal only expands a path until the source class/method pattern matches. This is why we don't see calls starting from `marshalSendAndReceive()`; additional queries are needed to find them:
./parsers/sum.py examples/data/calls.json <(echo '') <(echo 'send')
Output:
### 241
```
WebServiceTemplate:marshalSendAndReceive(String, Object)
WebServiceTemplate:marshalSendAndReceive(String, Object, WebServiceMessageCallback)
WebServiceTemplate:sendAndReceive(String, WebServiceMessageCallback, WebServiceMessageExtractor)
WebServiceTemplate:doSendAndReceive(MessageContext, WebServiceConnection, WebServiceMessageCallback, WebServiceMessageExtractor)
WebServiceTemplate:sendRequest(WebServiceConnection, WebServiceMessage)
    send
```
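To make the shape of these results easier to follow, here is a simplified sketch of one way such a query could be answered. It is not the algorithm implemented in `sum.py`, whose exact expansion rule is not reproduced here. The sketch assumes call edges are available as `(caller, callee)` tuples, walks backwards from each direct caller of a matching target, and keeps the longest caller chain that contains a method matching the source pattern. The names `find_paths` and `longest_chain` are hypothetical, and the sample edges are modelled on the first query in this document.

```python
from collections import defaultdict


def matches(pattern, name):
    """Case-insensitive substring match (an empty pattern matches all)."""
    return pattern.lower() in name.lower()


def longest_chain(callers, method, seen=()):
    """Longest acyclic chain of callers that ends at `method`."""
    best = [method]
    for caller in callers.get(method, ()):
        if caller == method or caller in seen:
            continue  # skip recursion and cycles
        chain = longest_chain(callers, caller, seen + (method,)) + [method]
        if len(chain) > len(best):
            best = chain
    return best


def find_paths(calls, source, target):
    """Caller chains ending in a call whose callee matches `target`."""
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    paths = []
    for caller, callee in calls:
        if not matches(target, callee):
            continue
        chain = longest_chain(callers, caller)
        # Keep the chain only if some method on it matches the source.
        if any(matches(source, m) for m in chain):
            paths.append(chain + [callee])
    return paths


# Edges assumed for illustration, modelled on the WeatherClient example.
calls = [
    ("WeatherClient:main(String[])", "WeatherClient:getCityForecastByZip(String)"),
    ("WeatherClient:getCityForecastByZip(String)", "setZIP"),
    ("WeatherClient:getCityForecastByZip(String)", "println"),
]
for path in find_paths(calls, "weather", "setZIP"):
    print(" -> ".join(path))
```

The recursive search is exponential in the worst case, so a real implementation over a large call graph would likely need memoization or a depth limit.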
If the input calls carry no specific semantics (e.g. classes vs. methods), paths can be computed with the `2columns` parser:
printf '%s\n' '1 2' '2 3' '2 4' '4 5' | ./2columns/parser.py | ./sum.py '' '5'
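For intuition, the sketch below reads the same four lines and enumerates the paths that end at node `5`. It is an illustration only and does not reproduce the parser's output format. It assumes that the first column is the caller and the second the callee, and it matches the target exactly rather than by substring; the function names are hypothetical.

```python
def parse_two_columns(lines):
    """Turn 'caller callee' lines into (caller, callee) edges."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            edges.append((parts[0], parts[1]))
    return edges


def paths_to(edges, target):
    """All acyclic paths from a node without callers down to `target`."""
    callers = {}
    for caller, callee in edges:
        callers.setdefault(callee, []).append(caller)

    def walk(node, suffix):
        preds = [p for p in callers.get(node, [])
                 if p != node and p not in suffix]
        if not preds:
            return [[node] + suffix]
        found = []
        for p in preds:
            found.extend(walk(p, [node] + suffix))
        return found

    return walk(target, [])


edges = parse_two_columns(["1 2", "2 3", "2 4", "4 5"])
print(paths_to(edges, "5"))  # [['1', '2', '4', '5']]
```

Node `3` does not appear in the result because no path through it reaches `5`.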
Supported languages:
- Java (via java-callgraph)
- C/C++ (via LLVM optimizer output in dot format)
Requirements:
- Java 1.8 (Java parser)
- Python 2.7
Add environment variables:
export JAVA_HOME=/c/Program\ Files/Java/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin
Install Python dependencies:
pip2 install -r requirements.txt
# or
python2 -m pip install -r requirements.txt
Some languages have dedicated helper scripts available:
If you have all your dependencies in a container file format (e.g. `ear` or `war`), copy them under one of the following filesystem paths:
./$target_path/$name/$name.ear
./$target_path/$name/$name.war
And parse them:
# Generates calls.txt using `java-callgraph`
./extract-calls.sh
# [OPTIONAL] Generates source methods from WSDL files
./extract-methods.sh
Otherwise, if you don't want to generate a "fat jar / uber jar / shaded jar / maven assembly's jar-with-dependencies", the simplest way is to copy all dependencies to a directory:
output_dir=
project_dir=
cd "$project_dir"
mvn dependency:copy-dependencies -DoutputDirectory="$(realpath "$output_dir")"
mvn package
cp target/*.jar "$output_dir"
Then parse all dependencies under that directory:
# Generates calls.txt using `java-callgraph`
./extract-calls-in-dir.sh "$output_dir"
./parsers/java-callgraph/parser.py <(echo examples/data/calls.txt) > examples/data/calls.json
./parsers/sum.py examples/data/calls.json <(echo 'weather') <(echo 'setZIP')
Alternatively, to find paths across multiple projects:
# From source module `app` against multiple target modules, defined in script
./extract-summaries.sh "$app"
Create the control-flow graph (CFG) in dot format:
clang -emit-llvm -c foo.c
opt -dot-cfg foo.bc
TODO