PyReprism is a suite of essential methods designed for common preprocessing tasks in code clone detection research.

License: MIT

PyReprism

PyReprism is a Python framework that helps researchers and developers with source code preprocessing. With PyReprism, you can match, extract, count, and remove comments, whitespace, operators, numbers, and other language-specific constructs across more than 150 programming languages and file extensions.

Install

pip install PyReprism

Quick Usage

Use case 1: Removing comments

from PyReprism.languages import Python
# from PyReprism.languages import Java

source = """
# single line comment
x = 5 + 6
'''
multiline
comment
'''
print(x)
"""

source = Python.remove_comments(source)

# expected output

x = 5 + 6


print(x)
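
To make the idea behind this use case concrete, here is a minimal standard-library sketch of matching, extracting, counting, and removing comments with a single regular expression. It is not PyReprism's implementation: the pattern `COMMENT_PATTERN` and the helpers `extract_comments`, `count_comments`, and `strip_comments` are hypothetical names introduced only for this illustration, and the pattern is deliberately naive (a `#` inside a string literal would also be treated as a comment).

```python
import re
from typing import List

# Hypothetical pattern, for illustration only (not PyReprism's own regex).
# It matches triple-quoted blocks and '#' line comments, and it does not
# understand ordinary string literals.
COMMENT_PATTERN = re.compile(r"'''.*?'''|\"\"\".*?\"\"\"|#[^\n]*", re.DOTALL)


def extract_comments(source: str) -> List[str]:
    """Return every comment-like span found in the source text."""
    return COMMENT_PATTERN.findall(source)


def count_comments(source: str) -> int:
    """Count the comment-like spans found in the source text."""
    return len(extract_comments(source))


def strip_comments(source: str) -> str:
    """Delete every comment-like span, leaving the line breaks in place."""
    return COMMENT_PATTERN.sub("", source)


sample = (
    "# single line comment\n"
    "x = 5 + 6\n"
    "'''\n"
    "multiline\n"
    "comment\n"
    "'''\n"
    "print(x)\n"
)

print(count_comments(sample))
print(strip_comments(sample))
```

The sketch leaves a blank line where each comment used to be, which mirrors the blank lines in the expected output above. For real projects, prefer the library call shown in the example, since it covers many languages rather than one hand-written pattern.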

Use case 2: Removing whitespaces

from PyReprism.utils.normalizer import Normalizer
source = """

x = 5 + 6


print(x)

"""


source = Normalizer.remove_whitespaces(source)

# expected output
x=5+6
print(x)
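
The transformation shown in the expected output can be sketched with the standard library alone. This is an assumption about the behavior, inferred only from the example above, and not PyReprism's actual code; `strip_whitespace` is a hypothetical helper name.

```python
import re


def strip_whitespace(source: str) -> str:
    """Remove whitespace inside each line, then drop lines left empty."""
    # Remove every run of whitespace within each individual line.
    lines = [re.sub(r"\s+", "", line) for line in source.splitlines()]
    # Keep only the lines that still contain text.
    return "\n".join(line for line in lines if line)


sample = "\n\nx = 5 + 6\n\n\nprint(x)\n\n"
print(strip_whitespace(sample))
```

Note that removing all whitespace this way also joins keywords and identifiers (for example, `def f` would become `deff`), so the result is meant for comparison and normalization, not for execution.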

Read the docs for more usage examples.

NB: The beta versions of PyReprism are still unstable, but we are working continuously to make the tool dependable.

How to Contribute

We invite you to help us build this tool and make it more extensive. Contributions are open to the whole OSS community. To get started, clone the repository:

$ git clone https://github.com/unlv-evol/PyReprism.git
$ cd PyReprism

(Optional) We suggest working inside a virtual environment. To set one up, run the following before installing the requirements:

$ python3 -m venv venv
$ source venv/bin/activate

Then, install the requirements:

$ pip install -r requirements.txt

For more information on how to contribute, read our contributing guidelines.

Issues

If you run into any issues, feel free to report them.