jlachowski/clonedigger

Support python 3

Opened this issue · 11 comments

Support python 3

Work in progress

Any update on this?

I think this is really relevant. Has the original author been contacted? Maybe he could help with that.

+1

@marciomazza

Has the original author been contacted? Maybe he could help with that.

The original author abruptly lost interest in the project as soon as it went through Google Summer of Code many years ago. Maybe he just didn't feel motivated to continue developing the project without financial stimulus. In any case, I got no answer from him to this particular question about an eventual Python 3 port, either years ago or this autumn.

Unfortunately for us end users, there are a lot of prerequisites to be met for a Python 3 port of CloneDigger to happen. Since I have spent a few weekends investigating this (without much success), I will try to outline a few of them:

  • anyone willing to try porting (or even reimplementing CloneDigger from scratch; more below on why that is a tempting option as well) needs a solid (I suppose high school) math background to be able to read and understand the whitepaper

  • the codebase is in a horrible state:

    • indentation is broken in some places, it's a mash of tabs and spaces (and it was even worse, see for example revision 187 in the sourceforge svn repo)
    • function/method names are CamelCase
    • heavy use of global variables and inner functions that will require a lot of refactoring
    • high cyclomatic complexity, a lot of nested conditions and loops, this will require refactoring as well
    • lack of unit tests, which will make the refactoring even harder; the only (very unreliable) test would be to run CloneDigger on itself after every change, to check that nothing broke fatally
    • optparse module is already deprecated (not a big deal really, converting to argparse is doable)
    • compiler module is already deprecated
    • it uses a bundled ancient version of logilab-astng that has no 'version' string anywhere; it's also apparently patched manually in many places so it can fit into the clumsy import approach (as I understand it, it is injected into the main namespace manually). It seems that logilab-astng has gone through some API changes since then, because just installing a newer version and switching to it produces errors, namely ones related to the as_string() methods of AST node objects. The whole logilab-astng package is deprecated; it's called astroid nowadays
    • there are differences in the Python 3 grammar, so anyone who attempts the port has to have a strong background in working with ASTs; maybe https://greentreesnakes.readthedocs.io/en/latest/ could be of some help
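Two of the deprecated pieces listed above have standard-library replacements: optparse maps onto argparse, and the removed compiler module maps onto ast. What follows is only a minimal sketch of what that might look like, not CloneDigger's actual code: the option name, `build_parser`, and `statement_shapes` are all hypothetical and chosen for illustration.

```python
import argparse
import ast


def build_parser():
    # argparse equivalent of an optparse.OptionParser setup.
    # The option names are illustrative, not CloneDigger's actual CLI.
    parser = argparse.ArgumentParser(description="toy clone-detection front end")
    parser.add_argument("--distance-threshold", type=int, default=5)
    parser.add_argument("paths", nargs="*")
    return parser


def statement_shapes(source):
    """Return (function name, [top-level statement type names]) per function.

    Uses the stdlib ast module in place of the removed compiler module.
    """
    tree = ast.parse(source)
    shapes = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            shapes.append((node.name, [type(stmt).__name__ for stmt in node.body]))
    return shapes


SAMPLE = '''
def f(x):
    y = x + 1
    return y

def g(a):
    b = a + 1
    return b
'''

args = build_parser().parse_args(["--distance-threshold", "3", "a.py"])
shapes = statement_shapes(SAMPLE)
```

Here `f` and `g` differ only in variable names, so they produce the same list of statement types. A real port would need a far richer comparison than this, but it shows that the parsing side no longer depends on compiler or logilab-astng.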

I've not used Clone Digger in a while, but I'd suggest using another tool.

How about pylint?

pylint --disable=all --enable=duplicate-code src/

https://julien.duponchelle.info/python/detect-python-code-duplicate

At least it's popular (4.3m downloads last month) and well maintained (last released on Thursday).

ygorg commented

To support clone digging of Python 3 code (as opposed to running clonedigger under Python 3),
IMO an easier way than rewriting python_compiler.py using the parser module
is to use ANTLR for Python, the same way Java, JS and Lua are processed.
There is a grammar here and many more here.

But I cannot figure out how to use it the way the Java grammar is used,
which is by writing a TreeProducer.java file whose output can be consumed by ExpatHandler.
This requires a deep dive into ANTLR.
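For anyone picking this up, the Python side of that pipeline can be sketched with the stdlib expat bindings. This is only a rough illustration, assuming the tree producer emits nested XML elements; the element and attribute names below are made up, and the real format written by TreeProducer.java and read by ExpatHandler may well differ.

```python
import xml.parsers.expat


class Node:
    """Minimal tree node; a stand-in for whatever the real handler builds."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []


def parse_tree(xml_text):
    stack = []   # currently open elements, innermost last
    roots = []   # top-level elements (well-formed XML has exactly one)

    def start(name, attrs):
        node = Node(name, attrs)
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return roots[0]


# Hypothetical producer output: one function with two statements.
XML = '<node name="funcdef"><node name="assign"/><node name="return"/></node>'
root = parse_tree(XML)
```

With something like this in place, a Python 3 grammar for ANTLR would only need a tree producer that writes the same XML shape the existing Java, JS and Lua producers write.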

ygorg commented

I made a fork (here) in which I started some documentation (here).
Also I made a class diagram (all the types are not set) which gives a pretty good sense of what is happening here.

ygorg commented

@somospocos Sorry, I didn't see your reply. I updated it.

hi guys
i made clonedigger for py3
feel free to pull it from my git
github.com/slavanorm/clone_digger_py3