PyEGo: environment dependency inference for Python programs

Primary language: Python. License: MIT.

PyEGo: Inferring Environment Dependencies for Python Programs

Introduction

PyEGo is a tool that automatically infers environment dependencies for Python programs.
A Python program's environment dependencies consist mainly of three parts:

  • a compatible Python interpreter version;
  • the third-party Python packages it depends on;
  • the system libraries it depends on.

For example, the following snippet prints an emoji on the terminal:
import emoji
print emoji.emojize('Python is :thumbs_up:')

This snippet is only compatible with Python 2, because print is used as a statement, without parentheses. If we run the snippet with Python 3:

$ python example/example.py
File "example/example.py", line 2
  print emoji.emojize('Python is :thumbs_up:')
          ^
SyntaxError: invalid syntax
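
One way to spot this kind of incompatibility without running the program is to ask the current interpreter whether the source parses at all. The following is a minimal sketch of that idea and not PyEGo's actual inference; the helper name `parses_on_this_interpreter` is hypothetical, and the check only says something about the interpreter that executes it.

```python
import ast


def parses_on_this_interpreter(source: str) -> bool:
    """Return True if the running Python interpreter can parse `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The README snippet (Python 2 print statement) and a Python 3 variant.
py2_snippet = "import emoji\nprint emoji.emojize('Python is :thumbs_up:')\n"
py3_snippet = "import emoji\nprint(emoji.emojize('Python is :thumbs_up:'))\n"

# Under Python 3 the first snippet fails to parse and the second one parses.
print(parses_on_this_interpreter(py2_snippet))
print(parses_on_this_interpreter(py3_snippet))
```

A parse check like this only catches syntax-level differences; it cannot tell which interpreter versions a program is actually compatible with.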

The snippet also depends on the third-party Python package emoji. If we run the snippet without installing emoji:

$ python example/example.py
Traceback (most recent call last):
  File "example/example.py", line 1, in <module>
    import emoji
ImportError: No module named emoji
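
To make the notion of a package dependency concrete, the sketch below lists the modules a source file imports by walking its syntax tree. It is an illustration under stated assumptions, not PyEGo's method: the helper name `imported_modules` is hypothetical, it requires source that the running interpreter can parse, and an import name does not always match the package name on PyPI, so a further mapping step would still be needed.

```python
import ast


def imported_modules(source: str) -> list:
    """Return the first name component of every module imported in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as `from . import x`.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


example = "import emoji\nimport os.path\nfrom collections import OrderedDict\n"
print(imported_modules(example))
```

Standard-library modules such as os and collections appear in the result too, so a real tool has to separate them from third-party packages.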

PyEGo can build a runtime environment for the snippet:

$ python PyEGo.py -r example/example.py

It then outputs a Dockerfile:

FROM python:2.7
RUN apt-get clean
RUN apt-get update
RUN pip install --upgrade pip
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install emoji==0.6.0
ADD example.py example.py
# add CMD command to run your programs here
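
The Dockerfile above follows a simple shape: a base image for the interpreter version, one install line per pinned package, and a line that adds the program. The sketch below renders a file of that shape from a small dependency description. The function `render_dockerfile` and its parameters are hypothetical and are not PyEGo's generator; the pip index configuration line is left out on purpose.

```python
def render_dockerfile(python_version, pip_packages, program_file):
    """Render a Dockerfile similar in shape to the one shown above.

    `pip_packages` maps package names to pinned versions (an assumed format).
    """
    lines = [
        f"FROM python:{python_version}",
        "RUN apt-get clean",
        "RUN apt-get update",
        "RUN pip install --upgrade pip",
    ]
    # One install line per package, in a stable order.
    for name, version in sorted(pip_packages.items()):
        lines.append(f"RUN pip install {name}=={version}")
    lines.append(f"ADD {program_file} {program_file}")
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile("2.7", {"emoji": "0.6.0"}, "example.py")
print(dockerfile)
```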

Add a CMD instruction to run the snippet, then build the Docker image:

$ echo "CMD python example.py" >> example/Dockerfile
$ cd example
$ docker build -t ego .

Now, run it!

$ docker run ego
Python is 👍

Installation

Install locally

  • Install Python>=3.6
  • Install the required Python packages:
$ pip install -r requirements.txt
  • Install Neo4j >=3.5.13, <4
  • Merge PyKG: our knowledge graph, PyKG, is split into 2 files because of the file size limit. Merge them before loading:
$ cat PyKG/PyKG.dump.a* > PyKG.dump
  • Load the database (PyKG):
$ cp PyKG.dump /PATH/TO/NEO4J/data/databases/
$ cd /PATH/TO/NEO4J
$ bin/neo4j stop
$ bin/neo4j-admin load --from=data/databases/PyKG.dump
  • Configure PyEGo
    Edit config.py and set the Neo4j connection:
NEO4J_URI = "YOUR NEO4J URI"
NEO4J_PWD = "YOUR NEO4J PASSWORD"
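
The merge step in the list above relies on `cat`, which may not be available on every platform. As a hedged alternative, the following sketch concatenates the split files with the Python standard library; the helper name `merge_parts` is hypothetical, and it assumes the parts sort into the right order by file name, as the `a*` suffixes suggest.

```python
import glob
import shutil


def merge_parts(pattern: str, output_path: str) -> list:
    """Concatenate the files matching `pattern`, in sorted order, into `output_path`."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # "wb" truncates an existing output file, so re-running does not append twice.
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


# Usage from the repository root:
# merge_parts("PyKG/PyKG.dump.a*", "PyKG.dump")
```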

Docker

We also provide a Docker image of PyEGo. Build the Docker image with:

$ docker build -t ego -f Docker/Dockerfile .

Instructions

Local

Start neo4j before running PyEGo:

$ cd /PATH/TO/NEO4J
$ bin/neo4j start

If you installed PyEGo locally, run it with:

$ cd /PATH/TO/PyEGo
$ python PyEGo.py [-h] [-t OUTPUT_TYPE] [-p OUTPUT_PATH] -r PROGRAM_ROOT
             
  • PROGRAM_ROOT can be either a single .py file or a Python project folder.
  • PyEGo provides two types of output: a Dockerfile or dependency.json. The default output type is Dockerfile.
    For Dockerfile output, set --output_type=Dockerfile (-t Dockerfile); for JSON output, set --output_type=json.
  • --output_path (-p) sets the output path of the Dockerfile or dependency.json. By default, PyEGo generates the file in the parent folder of PROGRAM_ROOT.

For more help, see:
$ python PyEGo.py -h
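
When PyEGo is driven from another script, for example to process many programs in a loop, it can help to assemble the command line programmatically. The sketch below builds the argument list from the flags documented above (-t, -p, -r); the helper name `build_pyego_command` is hypothetical, and the commented-out call assumes the working directory placeholder is replaced with a real path.

```python
import subprocess


def build_pyego_command(program_root, output_type=None, output_path=None):
    """Build the argument list for the PyEGo.py invocation documented above."""
    cmd = ["python", "PyEGo.py"]
    if output_type is not None:
        cmd += ["-t", output_type]
    if output_path is not None:
        cmd += ["-p", output_path]
    cmd += ["-r", program_root]
    return cmd


cmd = build_pyego_command("example/example.py", output_type="json")
print(" ".join(cmd))

# To run it from the PyEGo directory (the path is a placeholder):
# subprocess.run(cmd, cwd="/PATH/TO/PyEGo", check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.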

Docker

If you built the PyEGo Docker image, run it with:

$ docker run -v /PATH/TO/PROGRAM/ROOT:/INPUT/IN/CONTAINER \
             -v /PATH/TO/OUTPUT:/OUTPUT/IN/CONTAINER \
             ego /INPUT/IN/CONTAINER /OUTPUT/IN/CONTAINER
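
A common pitfall with the command above is passing a relative host path to -v: Docker expects an absolute path for a bind mount and otherwise treats the value as a volume name. The sketch below resolves the host paths before building the command. The helper name `build_docker_run` and the container paths /input and /output are assumptions chosen for illustration; only the image name ego and the argument order come from the command above.

```python
from pathlib import Path


def build_docker_run(program_root, output_dir,
                     container_input="/input", container_output="/output",
                     image="ego"):
    """Build a `docker run` argument list with absolute host paths for -v."""
    host_in = Path(program_root).resolve()
    host_out = Path(output_dir).resolve()
    return [
        "docker", "run",
        "-v", f"{host_in}:{container_input}",
        "-v", f"{host_out}:{container_output}",
        image, container_input, container_output,
    ]


docker_cmd = build_docker_run("example", "output")
print(" ".join(docker_cmd))
```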

Reproduce our experiments

Experiment on Hard-gists

Experimental results are available in another repository, exp-gist.

Run PyEGo on Hard-gists

  • Edit experiment/exp_config.py and set the Hard-gists root
EGO_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PYEGO"
  • Run PyEGo
$ cd /PATH/TO/PYEGO
$ python experiment/tests_gist.py --run

Compare PyEGo results with DockerizeMe and Pipreqs

  • Run DockerizeMe and Pipreqs
    We provide the bash scripts we used to run DockerizeMe and Pipreqs in our experiments.
    script/dockerizeme_gen_df.sh uses DockerizeMe to generate Dockerfiles for gists. Run this script inside the DockerizeMe Vagrant environment (provided by DockerizeMe).
# Run the script in DockerizeMe vagrant
# Edit line2: cd /YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME
$ cd /PATH/TO/PyEGo/script
$ bash dockerizeme_gen_df.sh

script/pipreqs_gen_df.sh uses Pipreqs to generate requirements.txt and Dockerfiles for gists. Run this script after installing pipreqs (pip install pipreqs) in Python 2.7.

# Edit line2 and line3: /YOUR/HARD/GISTS/ROOT/OF/PIPREQS
$ cd /PATH/TO/PyEGo/script
$ bash pipreqs_gen_df.sh

script/dockerize_all.sh builds Docker images from the DockerizeMe-generated or Pipreqs-generated Dockerfiles, runs the Docker containers, checks the results, and records them in log.txt.

# Edit line2: cd /YOUR/HARD/GISTS/ROOT
$ cd /PATH/TO/PyEGo/script
$ bash dockerize_all.sh
  • Edit experiment/exp_config.py and set the Hard-gists roots and log paths
EGO_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PYEGO"
ME_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME"
REQS_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PIPREQS"

EGO_GISTS_LOG = "/YOUR/HARD/GISTS/LOG/PATH/OF/PYEGO"
ME_GISTS_LOG = "/YOUR/HARD/GISTS/LOG/PATH/OF/DOCKERIZEME"
REQS_GISTS_LOG = "/YOUR/HARD/GISTS/LOG/PATH/OF/PIPREQS"
  • Compare results
$ cd /PATH/TO/PYEGO
$ python experiment/tests_gist.py --compare
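
Because the comparison reads several roots and log paths from exp_config.py, a placeholder that was never replaced is easy to miss. The following sketch reports which configured paths do not exist on disk. The helper name `missing_paths` is hypothetical and is not part of PyEGo; the dictionary simply mirrors the setting names listed above.

```python
import os


def missing_paths(settings: dict) -> list:
    """Return the names of settings whose path does not exist on disk."""
    return sorted(name for name, path in settings.items()
                  if not os.path.exists(path))


# Mirror of the exp_config.py values; replace the placeholders with real paths.
settings = {
    "EGO_GISTS_ROOT": "/YOUR/HARD/GISTS/ROOT/OF/PYEGO",
    "ME_GISTS_ROOT": "/YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME",
    "REQS_GISTS_ROOT": "/YOUR/HARD/GISTS/ROOT/OF/PIPREQS",
}
print(missing_paths(settings))
```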

Experiment on GitHub dataset

Results of experiments are available in another repository, exp-github.

Download dataset

Our dataset is available at https://drive.google.com/file/d/1oHr6mbm0d5jIlVxeDkY6iyvow_Q63L_w/view.

  • Unzip dataset:
$ tar -xvf GithubProjects.tar.gz
  • Make copies for experiments:
$ cp -r GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/EGO
$ cp -r GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS

The experiments use separate copies of the dataset. Using a single copy also works, but the results would be overwritten.
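
The copy step can also be scripted so that an existing copy, and any results already stored in it, is left untouched. This is a minimal sketch using the standard library; the helper name `make_copies` is hypothetical, and it assumes the extracted dataset is a directory.

```python
import shutil
from pathlib import Path


def make_copies(source, destinations):
    """Copy the directory `source` to each destination that does not exist yet."""
    created = []
    for dest in destinations:
        dest = Path(dest)
        if dest.exists():
            # Leave existing copies alone so earlier results are not overwritten.
            continue
        shutil.copytree(source, dest)
        created.append(dest)
    return created


# Usage (the destination paths are placeholders):
# make_copies("GithubProjects", ["/YOUR/GITHUB/DATASET/ROOT/OF/EGO",
#                                "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS"])
```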

Run PyEGo on GitHub dataset

  • Edit experiment/exp_config.py and set the GitHub dataset root
EGO_GITHUB_ROOT = "/YOUR/GITHUB/DATASET/ROOT/OF/EGO"
  • Run PyEGo
$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --run --tool=PyEGo

Compare PyEGo results with DockerizeMe and Pipreqs

  • Run pipreqs
    Install pipreqs in Python 3.6+. Edit experiment/exp_config.py and set the GitHub dataset root and the pipreqs path:
REQS_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS"
PIPREQS_PATH = "/YOUR/PIPREQS/PATH"

You can find the pipreqs path with:

$ which pipreqs
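
The same lookup can be done from Python, which is convenient on systems without the `which` command. This short sketch uses `shutil.which` from the standard library and prints a line in the form expected for PIPREQS_PATH; the variable name `pipreqs_path` is only illustrative.

```python
import shutil

# shutil.which returns the full path of the executable, or None if it is not on PATH.
pipreqs_path = shutil.which("pipreqs")

if pipreqs_path is None:
    print("pipreqs is not on PATH; install it first")
else:
    print(f'PIPREQS_PATH = "{pipreqs_path}"')
```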

Run pipreqs

$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --run --tool=Pipreqs

We provide results of DockerizeMe in exp-github.

  • Edit experiment/exp_config.py and set the GitHub dataset roots and log paths
EGO_GITHUB_ROOT = "/YOUR/GITHUB/DATASET/ROOT/OF/EGO"
REQS_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS"
ME_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/DOCKERIZEME"

EGO_GITHUB_LOG = "/YOUR/GITHUB/DATASET/LOG/PATH/OF/EGO"
REQS_GITHUB_LOG_39 = "/YOUR/GITHUB/DATASET/LOG/PATH/OF/PIPREQS"
ME_GITHUB_LOG_39 = "/YOUR/GITHUB/DATASET/LOG/PATH/OF/DOCKERIZEME"
  • Compare results
$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --compare

Experiment running PyEGo with different strategies

Results of experiments are available in exp-gist.

  • Here are our 2 strategies:
id           select strategy
1 (default)  select-one
2            select-all
  • Edit experiment/exp_config.py and set the Hard-gists roots
EGO_GISTS_ROOT = "/YOUR/HARD/GIST/DATASET/ROOT/OF/PYEGO/STRATEGY1"
EGO_GISTS_ROOT_2 = "/YOUR/HARD/GIST/DATASET/ROOT/OF/PYEGO/STRATEGY2"
  • Run a strategy on Hard-gists, replacing X with the strategy id (1 or 2):
$ cd /PATH/TO/PYEGO
$ python experiment/tests_strategies.py --strategy=X