BigSpa

A framework for large-scale static program analysis. The master branch is at version 1.0-SNAPSHOT.

Prerequisites

BigSpa is built on the big data processing platform Spark, the distributed file system HDFS, and the distributed in-memory database Redis, so all three must be installed first.
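A quick way to confirm the prerequisites are reachable is to look for their client commands on PATH. This is only a sketch: it assumes the usual command names `spark-submit` (Spark), `hdfs` (Hadoop/HDFS) and `redis-cli` (Redis), and the helper `missing_tools` is a hypothetical name, not part of BigSpa.

```shell
# Print each named command that cannot be found on PATH (one per line).
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed client command names; adjust them if your installation differs.
missing="$(missing_tools spark-submit hdfs redis-cli)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on one line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found on PATH."
fi
```

Finding the commands only shows that the clients are installed on this machine; it does not verify that the cluster services themselves are running.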

Building BigSpa from Source

We use Maven to build our project. To build BigSpa you need:

  • Unix-like environment
  • Git
  • Maven
  • Java 8 or 11

Then clone the repository and build it:

git clone https://github.com/PasaLab/BigSpa.git
cd BigSpa
mvn clean install -DskipTests
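The Java requirement listed above (8 or 11) can be checked with a few lines of shell. The sketch below is an illustration, not part of the BigSpa build; the function names `java_major` and `is_supported_java` are hypothetical.

```shell
# Extract the major version from a Java version string.
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;              # legacy scheme: 1.8.0_292 -> 8
    *)   printf '%s\n' "$1" | cut -d. -f1 | cut -d- -f1 ;; # modern scheme: 11.0.2 -> 11
  esac
}

# Succeed only for the versions the README lists (8 or 11).
is_supported_java() {
  case "$(java_major "$1")" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, e.g.: openjdk version "11.0.2" 2019-01-15
  version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if is_supported_java "$version"; then
    echo "Java $version is supported."
  else
    echo "Java $version found, but BigSpa asks for Java 8 or 11."
  fi
else
  echo "java not found on PATH."
fi
```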

Run BigSpa

To run offline batch or online incremental static program analysis with BigSpa, write a spark-submit script in the following format.

# A comment cannot follow a line-continuation backslash, so the notes are collected here:
#   --class             main function
#   *.jar               the JAR package to run
#   input_graph         the input graph data
#   input_grammar       the grammar
#   file_index_f/_b     for Linux data input
#   Split_Threshold     threshold of node split
spark-submit \
--master yarn \
--deploy-mode client \
--name TASK_NAME \
--class Redis_pt \
--num-executors 16 \
--executor-cores 24 \
--executor-memory 16G \
--conf spark.storage.unrollMemoryThreshold=10000000 \
--conf spark.locality.wait.time=1ms \
--conf spark.locality.wait.node=1ms \
--files data/pasa.conf.prop \
--driver-memory 16G \
--driver-class-path /home/user/class/path \
 \
BigSpa-1.0-SNAPSHOT-jar-with-dependencies.jar \
 \
islocal,false \
master,hdfs://master:9001/ \
input_graph,/path/to/graph/data/InputGraph/hdfs_pointsto \
input_grammar,/path/to/grammar/data/alias-complete.grammar \
output,/BigSpa_Output/result/hdfs_pt_Redis \
checkpoint_output,hdfs://master:9001/BigSpa/checkpoint1 \
updateRedis_interval,500000 \
queryRedis_interval,50000 \
 \
defaultpar,1152 \
clusterpar,384 \
newnum_interval,100000000 \
checkpoint_interval,20 \
 \
file_index_f,0 \
file_index_b,12 \
 \
check_edge,false \
outputdetails,false \
output_Par_INFO,false \
 \
Split_Threshold,1000000

Run the script from the directory that contains the JAR package. The important parameters are described in the table below.
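Because a shell comment cannot follow a line-continuation backslash, one option is to keep the annotated parameters in a separate file and convert them to the `key,value` form that BigSpa takes on the command line. The sketch below assumes a hypothetical `key = value` file format with `#` comments; neither the format nor the helper `to_bigspa_args` is part of BigSpa.

```shell
# Convert "key = value" lines (with optional # comments) read from stdin
# into "key,value" arguments, one per line.
to_bigspa_args() {
  sed -e 's/[[:space:]]*#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e '/^$/d' \
      -e 's/[[:space:]]*=[[:space:]]*/,/'
}

# Demonstration with a few of the parameters from the script above.
to_bigspa_args <<'EOF'
# general parameters
islocal = false
input_graph = /path/to/graph/data/InputGraph/hdfs_pointsto   # the input graph data
Split_Threshold = 1000000   # threshold of node split
EOF
```

With the parameters stored in a file (say, a hypothetical `bigspa.params`), the converted arguments could be appended after the JAR name as `$(to_bigspa_args < bigspa.params)`. This relies on word splitting, so it only works while no value contains whitespace or a `#` character.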

| Group | Parameter | Description | Value for Reference |
| --- | --- | --- | --- |
| Spark script params | master | yarn | yarn |
| | deploy-mode | running mode | client |
| | name | Spark app name | BigSpa.offline.psql.df |
| | class | main function | OFFLINE.Redis_pt |
| | num-executors | number of executors | 16 |
| | executor-cores | number of cores of each executor | 24 |
| | executor-memory | memory of each executor | 16G |
| | conf | params of Spark or Java | |
| | files | file path for Redis | data/pasa.conf.prop |
| | driver-memory | | 16G |
| | driver-class-path | | |
| General parameters | islocal | whether to perform local debugging | FALSE |
| | master | HDFS address | hdfs://master:9001/ |
| | input_graph | file path of the input graph | /data/linux.pt |
| | input_grammar | file path of the grammar | /data/grammar |
| | output | output path | /BigSpa/result/hdfs_pt_Redis |
| | checkpoint_output | save path for checkpoint files | hdfs://master:9001/BigSpa/checkpoint |
| | updateRedis_interval | batch size for updating Redis | 500000 |
| | queryRedis_interval | batch size for querying Redis | 50000 |
| | defaultpar | default number of partitions | 384/768/1152 |
| | clusterpar | number of partitions in the cluster (num-executors * executor-cores) | |
| | newnum_interval | threshold for automatically adding partitions | |
| | checkpoint_interval | number of iterations after which the lineage is cut off | |
| | file_index_f | for the Linux database, used to decide which files to merge and execute | 0 |
| | file_index_b | for the Linux database, used to decide which files to merge and execute | 12 |
| | check_edge | whether to output edge information | FALSE |
| | output_Par_INFO | whether to perform automatic partition adjustment | TRUE |
| Param for node split | Split_Threshold | when the number of predicted collars exceeds $*16, split the node | |
| Params for computation closure | input_e | as E described in the paper | |
| | input_n | as N described in the paper | |
| | is_complete_loop | whether to perform local closure operations | true |
| | original_loop_turn | number of small local closure execution rounds | 5 |
| | max_loop_turn | number of large local closure execution rounds | 100 |
| | convergence_threshold | condition for executing large local closures: when the number of new edges generated per round is less than $, large local closures are performed | 10000 |
| Params for incremental updates | changemode_interval | calculation mode update threshold: when the number of new edges per round reaches $, switch from stand-alone to distributed | 50000 |
| | add | path of input batches | data/httpd.pt.batch/batches |

In the descriptions, $ stands for the value of the parameter itself.
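The table ties clusterpar to num-executors * executor-cores, and for the sample settings (16 executors with 24 cores each) the reference values for defaultpar, 384/768/1152, are 1, 2 and 3 times that product. A minimal sketch that derives both values; the `multiplier` variable is an assumption made for illustration, not a documented BigSpa rule.

```shell
# Sample Spark settings taken from the script above.
num_executors=16
executor_cores=24

# clusterpar follows the relation given in the table.
clusterpar=$((num_executors * executor_cores))

# Assumption: defaultpar is chosen as a small multiple of clusterpar.
multiplier=3
defaultpar=$((clusterpar * multiplier))

# Print the values in the key,value form used on the command line.
echo "clusterpar,$clusterpar"
echo "defaultpar,$defaultpar"
```

With these inputs the two values match the `clusterpar,384` and `defaultpar,1152` arguments in the sample script.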