Source code analysis with PyCG
jonathanooikwanw opened this issue · 5 comments
An update: Is it possible to use PyCG to analyze the source code of large repositories such as TensorFlow, PyTorch and NumPy?
Please see: #8
@ashwinprasadme Sorry, I don't understand. Is it then not possible to analyze large source code repositories?
@jonathanooikwanw PyCG currently lacks the instrumentation to analyze external libraries. The limitation is not about the size of the repository but about how PyCG handles external imports, as detailed in the issue linked above. Work to handle these is in progress.
@jonathanooikwanw This may help you.
I forked PyCG and added a branch called "output-line-number". It generates line numbers for function calls, adds a --dir argument that scans a whole directory, and skips broken Python files during analysis.
You can try it by running:
$ pycg --dir [the analyzable directory absolute path] -o [output path]
The result is:
- a newly generated output path ending in "_pycg"
- generated JSON files, each named after its source file with the ".json" extension appended.
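To inspect that output afterwards, something like the following may be enough. It is only a sketch: it assumes the "_pycg" path is a directory containing the ".json" files (possibly in subdirectories), and it makes no assumption about what is inside each file, since the JSON layout of the branch is not described here. The function name `load_call_graphs` is made up for illustration.

```python
import json
from pathlib import Path


def load_call_graphs(output_dir):
    """Load every *.json file under output_dir, keyed by its relative path.

    Assumption: output_dir is the generated "_pycg" directory. The content
    of each file is returned as parsed JSON without further interpretation.
    """
    root = Path(output_dir)
    graphs = {}
    for json_path in sorted(root.rglob("*.json")):
        try:
            with json_path.open(encoding="utf-8") as handle:
                data = json.load(handle)
        except (OSError, ValueError) as exc:
            # Unreadable or malformed files are reported and skipped.
            print(f"skipping {json_path}: {exc}")
            continue
        graphs[json_path.relative_to(root).as_posix()] = data
    return graphs
```

Calling `load_call_graphs("/path/to/project_pycg")` (a placeholder path) returns a dict that can be checked for missing or empty results before any further processing.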
Please note: we have tested this branch only on our local files and have not added tests yet. I'd be happy if you try it and tell me if any errors occur.
The only problem I ran into:
- On a medium-performance server, PyCG may overload the CPU or memory, so the OS may kill the script, or the script may freeze after analyzing part of the files. Even so, I managed to analyze about 40000 files (400 MB in total).
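One possible workaround for the freezing, sketched below, is to run PyCG in separate processes over small batches of files with a timeout, so a stuck batch can be skipped instead of blocking the whole run. This is not what the branch does: it uses the upstream command form `pycg [entry_point ...] -o OUTPUT` rather than --dir, and the batch size, timeout, output file names and function names are hypothetical values chosen for illustration. Note that calls between files in different batches would not be resolved this way.

```python
import subprocess
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_command(files, output_path):
    """Build a PyCG command line (upstream form: pycg FILES... -o OUTPUT)."""
    return ["pycg", *[str(f) for f in files], "-o", str(output_path)]


def analyze_in_batches(source_dir, output_dir, batch_size=200,
                       timeout_s=600, run=subprocess.run):
    """Run PyCG once per batch and return the batches that failed.

    batch_size and timeout_s are illustrative defaults, not tuned values.
    `run` is injectable so the function can be exercised without PyCG.
    """
    files = sorted(Path(source_dir).rglob("*.py"))
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for index, batch in enumerate(chunk(files, batch_size)):
        command = build_command(batch, out_dir / f"batch_{index}.json")
        try:
            # A timeout or non-zero exit only fails this batch.
            run(command, check=True, timeout=timeout_s)
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired, OSError) as exc:
            failed.append((index, repr(exc)))
    return failed
```

The returned list of failed batch indexes can be retried later with a smaller batch size or a longer timeout.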