Repository-Level Compositional Code Translation and Validation

Primary language: Java. License: University of Illinois/NCSA Open Source License (NCSA).

AlphaTrans

This repository contains artifacts of AlphaTrans from the paper "Repository-Level Compositional Code Translation and Validation".

Getting Started

We provide a Dockerfile that installs all dependencies needed to reproduce the results of AlphaTrans. Execute the following to build the Docker image and start a container in interactive mode:

docker build --no-cache -t alphatrans .
docker run -it alphatrans bash

To reproduce the results of AlphaTrans, see Reproduce AlphaTrans Results. To translate additional projects, see Translate New Java Projects.

Reproduce AlphaTrans Results

AlphaTrans currently supports prompting OpenAI models (e.g., GPT-4o-2024-11-20) and open-source models (e.g., deepseek-ai/deepseek-coder-33b-instruct) served by Ollama (see the Ollama project for how to start an engine). The .env file stores API keys and model endpoints. To prompt through Ollama, set OLLAMA_HOST (e.g., http://0.0.0.0:5000 when the engine IP is 0.0.0.0 and the port is 5000). To prompt OpenAI models, you only need to set OPENAI_API_KEY.

vim .env
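As a quick sanity check before starting a translation run, the sketch below parses .env-style text and reports which backends have a value filled in. It assumes the file uses plain KEY=VALUE lines. The helper names parse_env and configured_backends are hypothetical and are not part of AlphaTrans, whose own loading logic may differ.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_backends(env):
    """Hypothetical helper: list the backends whose variable is non-empty."""
    backends = []
    if env.get("OPENAI_API_KEY"):
        backends.append("openai")
    if env.get("OLLAMA_HOST"):
        backends.append("ollama")
    return backends


# Sample content using the example endpoint from this README.
sample = "# model endpoints\nOLLAMA_HOST=http://0.0.0.0:5000\nOPENAI_API_KEY=\n"
env = parse_env(sample)
print(configured_backends(env), env["OLLAMA_HOST"])
```

An empty list from configured_backends would mean neither variable has been set yet.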

For all ten projects, we provide the project skeletons and partial translations. Execute the following to start translating a project (e.g., commons-cli with the deepseek-coder-33b-instruct model at temperature=0.0):

bash scripts/translate_fragment.sh commons-cli 0.0 deepseek-coder-33b-instruct

This script translates the project fragment by fragment in reverse call-graph order and stores the translations in JSON files along with validation results (e.g., syntactic correctness, GraalVM correctness, and test execution correctness). To create standalone Python projects, recompose all translations with the following script:

bash scripts/recompose.sh commons-cli 0.0 deepseek-coder-33b-instruct
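To illustrate what reverse call-graph order means for the translation step, here is a minimal sketch that orders fragments so that every callee comes before its callers. The method names and the translation_order helper are hypothetical, and the standard-library graphlib module stands in for whatever ordering AlphaTrans actually implements. In particular, this sketch does not handle recursive cycles (graphlib raises CycleError for them).

```python
from graphlib import TopologicalSorter


def translation_order(calls):
    """Return fragments so that each callee precedes every caller.

    `calls` maps a caller to the set of fragments it calls.
    TopologicalSorter treats the mapped values as predecessors,
    so callees are emitted first.
    """
    return list(TopologicalSorter(calls).static_order())


# Hypothetical call graph: Main.run calls two methods, one of which calls a third.
calls = {
    "Main.run": {"Parser.parse", "Printer.render"},
    "Parser.parse": {"Lexer.tokenize"},
    "Printer.render": set(),
    "Lexer.tokenize": set(),
}

order = translation_order(calls)
print(order)
```

Translating in this order means that when a caller is translated, the translations of everything it depends on are already available for validation.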

Translate New Java Projects

If you are interested in building on top of AlphaTrans and adding more projects, follow the steps below:

1. CodeQL Database Creation & Static Analysis

AlphaTrans requires the CodeQL CLI for database creation and static analysis. The Docker image already installs CodeQL and clones the vscode-codeql-starter repository required for executing CodeQL queries. Follow the steps below to create the project database and execute the queries:

  1. Place your Java project in <project_directory>. The <project_directory> can be java_projects/original_projects in the AlphaTrans root.
  2. Create the project database with CodeQL. See the create_database_java function in setup.sh for reference.
  3. All CodeQL files from the queries directory have already been copied into the vscode-codeql-starter/codeql-custom-queries-java directory. cd into this directory and execute bash execute_codeql_queries.sh <project_name> <database_name> <output_path>. See run.sh for reference.
  4. Once all queries have been executed, the query outputs are stored under data/<output_path>.
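The steps above can be sketched as two command lines assembled in Python. This is only an illustration: the database location and output path are hypothetical placeholders, and the flags that create_database_java in setup.sh actually passes may differ. The sketch uses the standard codeql database create options --language and --source-root, and it only prints the commands rather than running them.

```python
import shlex


def codeql_create_command(database_dir, source_root):
    """Build a `codeql database create` command for a Java project."""
    return [
        "codeql", "database", "create", database_dir,
        "--language=java",
        f"--source-root={source_root}",
    ]


def query_script_command(project_name, database_name, output_path):
    """Build the call to execute_codeql_queries.sh described in step 3."""
    return ["bash", "execute_codeql_queries.sh",
            project_name, database_name, output_path]


# Hypothetical values for illustration only.
project = "commons-cli"
create_cmd = codeql_create_command(
    f"databases/{project}",
    f"java_projects/original_projects/{project}",
)
query_cmd = query_script_command(project, project, f"query_outputs/{project}")

for cmd in (create_cmd, query_cmd):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to compare them against setup.sh and run.sh before executing anything.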

2. Program Transformation

Execute the following from the root directory of the repository to perform program transformation on the projects.

bash scripts/program_transformation.sh <project_dir> <project_name>

3. Program Decomposition

Source Decomposition

Execute the following for source decomposition from the root directory of the repository.

bash scripts/create_schema.sh
bash scripts/extract_call_graph.sh

Test Decomposition

Execute the following for test decomposition from the root directory of the repository.

bash scripts/decompose_test.sh

4. Type Translation

Execute the following from the root directory of the repository to perform type translation on the projects.

bash scripts/extract_types.sh
bash scripts/crawl_type_desc.sh
bash scripts/translate_types.sh <type>

The <type> can be either simple or source_description. The former prompts the model with a vanilla prompt, while the latter also includes a description of the type from the source programming language.
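The difference between the two modes can be pictured with the following sketch. The prompt wording, the build_type_prompt helper, and the sample description are all hypothetical. The real prompts used by translate_types.sh are not reproduced here.

```python
def build_type_prompt(java_type, mode, description=None):
    """Hypothetical prompt builder contrasting the two <type> modes."""
    question = f"Translate the Java type `{java_type}` to an equivalent Python type."
    if mode == "simple":
        # Vanilla prompt: only the type name is given.
        return question
    if mode == "source_description":
        # Description-augmented prompt: prepend the source-language description.
        if not description:
            raise ValueError("source_description mode needs a type description")
        return f"Description of `{java_type}`:\n{description}\n\n{question}"
    raise ValueError(f"unknown mode: {mode}")


simple_prompt = build_type_prompt("java.util.List", "simple")
described_prompt = build_type_prompt(
    "java.util.List",
    "source_description",
    description="An ordered collection that allows duplicate elements.",
)
print(simple_prompt)
print(described_prompt)
```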

5. Skeleton Construction

Execute the following from the root directory of the repository to generate the project skeletons and check their syntactic correctness:

bash scripts/get_dependencies.sh
bash scripts/create_skeleton.sh

These commands should create proper skeletons in the target language under data/skeletons/<project_name>.
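A syntactic check of a generated Python skeleton can be approximated with the standard ast module, as in the sketch below. This is an assumption about what such a check involves, not the implementation used by create_skeleton.sh. The syntax_error helper and the sample skeleton are hypothetical.

```python
import ast


def syntax_error(source):
    """Return None if `source` parses as Python, else a short error message."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# A skeleton keeps class and method signatures but leaves bodies empty.
good_skeleton = (
    "class Option:\n"
    "    def get_name(self) -> str:\n"
    "        pass\n"
)
# Same skeleton with the colon after the signature missing.
bad_skeleton = (
    "class Option:\n"
    "    def get_name(self) -> str\n"
    "        pass\n"
)

print(syntax_error(good_skeleton))
print(syntax_error(bad_skeleton))
```

Parsing without executing is enough here, because an empty skeleton has no behavior to test yet.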

Execute the following from the root directory of the project to run the Graal-based semantic check of generated skeletons.

python src/compositional_glue_tests/semantic_check.py --project <project_name> [--class=<class_name>] [--method=<method_name>]

If the project does not already have a pom.xml, the script copies the original pom.xml into the project directory and raises an exception. You then need to check manually that the Java version in the pom.xml is set to at least 8 and that GraalVM is included in the dependencies. Once this is done, run the script again.
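The manual pom.xml check can be partly automated with a sketch like the one below. It assumes the Java version is declared through the standard maven.compiler.source or maven.compiler.release property and treats any groupId starting with org.graalvm as evidence of a GraalVM dependency. Both are assumptions, and the specific GraalVM artifact in the sample is only illustrative. The check_pom and java_major helpers are hypothetical.

```python
import xml.etree.ElementTree as ET


def java_major(version):
    """Map '1.8' to 8 and '11' to 11; return None if unparsable."""
    parts = version.strip().split(".")
    try:
        return int(parts[1]) if parts[0] == "1" and len(parts) > 1 else int(parts[0])
    except ValueError:
        return None  # e.g. a ${property} placeholder


def check_pom(pom_text):
    """Return (java_at_least_8, mentions_graalvm) for a pom.xml string."""
    root = ET.fromstring(pom_text.strip())
    version, has_graalvm = None, False
    for elem in root.iter():
        name = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        text = (elem.text or "").strip()
        if name in ("maven.compiler.source", "maven.compiler.release") and text:
            version = java_major(text)
        elif name == "groupId" and text.startswith("org.graalvm"):
            has_graalvm = True
    return version is not None and version >= 8, has_graalvm


# Illustrative pom.xml; the GraalVM artifact shown is an assumption.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.graalvm.sdk</groupId>
      <artifactId>graal-sdk</artifactId>
    </dependency>
  </dependencies>
</project>
"""

print(check_pom(SAMPLE_POM))
```

Because the scan also matches groupId entries outside the dependencies section, a positive result should still be confirmed by reading the file.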

6. Compositional Translation and Validation

Execute the following from the root directory of the repository to perform compositional translation and validation on the projects.

bash scripts/extract_coverage.sh <project_name>
bash scripts/translate_fragment.sh <project_name> <temperature> <model>

Contact

We look forward to hearing your feedback. Please open an issue for any questions or comments 🙏.