feddischson/include_gardener

Python language support


At the moment, searching for C/C++ files and the preprocessor include statements is hard-coded via two regular expressions.

Nevertheless, it would be nice to support other languages.
For that, the first step can be supporting Python.

The handling of the regexes for selecting the files and the include statements needs to be done in a modular and re-usable way.

Do you mean, in the case of Python, that you would analyse the "import" statements, or am I misinterpreting you?

Yes, analysing the "import" statements is the idea.

I think it is a good idea to make a design which allows an extension for almost any language. Python and C/C++ would be one specialization.

But it is not only the "import" statement that needs to be considered; the file pattern used for the search also needs to fit (see also #3).

This is not impossible to do. Some extra command-line arguments might work: use a vector<string> to store the file types that are to be read, and perhaps use an Abstract Factory to supply the language-specific functionality.
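The factory idea above could look roughly like the following. This is only a sketch, written in Python for brevity rather than in the tool's own code base; every name in it (`IncludeParser`, `CParser`, `PythonParser`, `make_parser`) is hypothetical, and the regular expressions are illustrative, not the ones the tool uses.

```python
import re
from abc import ABC, abstractmethod


class IncludeParser(ABC):
    """Hypothetical per-language interface: a file pattern plus an include finder."""

    file_pattern = None  # compiled regex selecting the files of this language

    @abstractmethod
    def find_includes(self, text):
        """Return the names referenced by include/import statements in text."""


class CParser(IncludeParser):
    file_pattern = re.compile(r".*\.(c|cc|cpp|h|hpp)$")
    _include = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

    def find_includes(self, text):
        return self._include.findall(text)


class PythonParser(IncludeParser):
    file_pattern = re.compile(r".*\.py$")
    # Simplified: only the first module of "import a, b" is captured.
    _import = re.compile(
        r"^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))", re.MULTILINE
    )

    def find_includes(self, text):
        return [a or b for a, b in self._import.findall(text)]


# Registry keyed by a language name, e.g. taken from a command-line argument.
PARSERS = {"c": CParser, "python": PythonParser}


def make_parser(language):
    """Factory: return a parser instance for the requested language."""
    return PARSERS[language]()
```

Adding a language then means adding one class and one registry entry, while the file search and graph code stay unchanged.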

Would you be interested in implementing it?

@feddischson Please elaborate on the concepts that are to be implemented using Python. Maybe I can do that.

@abbyck Thanks for your interest.
My first thought was to handle this in a generic way via configuration: just provide different regular expressions to detect the include statements of the different languages.

However, the import concept of Python differs strongly from that of C/C++:

  • The import statements can have different structures, e.g. from x import y or just import z
  • A dot is used as separator, e.g. import a.b.c
  • __init__.py files need to be considered when dealing with from x.y import * (see __all__ statement)
  • PYTHONPATH needs to be considered somehow (but could be done via -I option)
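To make the points above concrete, here is a small sketch that uses Python's standard `ast` module to collect both import forms and then maps a dotted module name to candidate files. The function names and the `search_dirs` parameter (standing in for PYTHONPATH or `-I` directories) are assumptions for illustration; the tool itself would need its own parser, and `__all__` handling is not covered here.

```python
import ast
from pathlib import Path


def extract_imports(source):
    """Return (module, level) pairs for every import statement in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c, d" yields one entry per imported name.
            found.extend((alias.name, 0) for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from x.y import z" gives module "x.y"; level counts leading dots.
            found.append((node.module or "", node.level))
    return found


def candidate_paths(module, search_dirs):
    """Yield the files a non-empty dotted module name could refer to."""
    rel = Path(*module.split("."))
    for base in search_dirs:
        yield Path(base) / rel.with_suffix(".py")  # module file, e.g. a/b/c.py
        yield Path(base) / rel / "__init__.py"     # package, e.g. a/b/c/__init__.py
```

For example, `extract_imports("import a.b.c")` reports the module `a.b.c`, and `candidate_paths("a.b.c", ["src"])` yields `src/a/b/c.py` and `src/a/b/c/__init__.py`, which shows why the dot separator and `__init__.py` files both matter when resolving an import to a file.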

Because of this, it could make sense to restructure the code so that each language gets its own parsing functionality with its own behaviour. And for the future, it would be nice to support even more languages.
To define an improved architecture, it would be good to consider multiple languages now, to avoid major restructuring later.

If we look at http://githut.info/ and https://madnight.github.io/githut/, it would be nice to support at least:

  • JavaScript
  • Java
  • Python
  • CSS
  • PHP
  • Ruby
  • C#
  • Go
  • Scala
  • Objective-C

I see the following steps to reach this goal:

  • Creation of example files, which cover all include/import possibilities for each language in test/test_files.
  • Analysis and categorization of the languages. For example, Objective-C behaves very similarly to C/C++ (due to its roots).
  • Definition of an improved architecture
  • Implementation of individual languages

If you would like to support this tool, just start with step 1 and pick one (or multiple) language(s) for which you could create a set of example files. I've created a new branch for these activities: https://github.com/feddischson/include_gardener/tree/multi_language
Please let me know in which languages you are interested.

@feddischson The task of abstracting the architecture to handle multiple languages is something I am interested in. I would undertake the four steps you outlined (one at a time) for JavaScript, Java and probably Python, to start. Currently I am traversing the source code and trying the tool out on some C projects. Is this language support still desired, and does it still need to be implemented?

@d3v-nu11 Yes this is still desired.
The goal of the example files would be to cover all "include"-scenarios so that the example files can also be used for testing purposes.

@feddischson Concerning JavaScript, what is the scope of support for this tool intended to be? Is it just intended to detect "includes" from static HTML files? If not, prior to ES6 (which is not fully supported on many browsers), modules are available only in external libraries. I believe this link outlines the various approaches across ES5 (CommonJS/Node.js, RequireJS) and ES6: http://exploringjs.com/es6/ch_modules.html#sec_modules-in-javascript

@d3v-nu11
I am not a JS expert, but after reading through your article I would propose to support all three ways to import JS modules:

  • ES5 via CommonJS
  • ES5 via RequireJS
  • ES6
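As a rough illustration of what detecting these three styles involves, the sketch below uses one regular expression per style. The patterns and the `js_dependencies` function are hypothetical and deliberately simplified: they would also match inside comments or string literals, so a real implementation would likely need a proper tokenizer.

```python
import re

# Illustrative patterns only; none of them come from the tool itself.
COMMONJS = re.compile(r"""\brequire\(\s*['"]([^'"]+)['"]\s*\)""")
ES6 = re.compile(r"""\bimport\s+(?:[^'";]*?\s+from\s+)?['"]([^'"]+)['"]""")
# RequireJS/AMD: define([...], fn), define('name', [...], fn) or require([...], fn)
AMD = re.compile(
    r"""\b(?:define|require)\(\s*(?:['"][^'"]*['"]\s*,\s*)?\[([^\]]*)\]"""
)
STRING = re.compile(r"""['"]([^'"]+)['"]""")


def js_dependencies(source):
    """Return the module names found in source, grouped by import style."""
    deps = {
        "commonjs": COMMONJS.findall(source),
        "es6": ES6.findall(source),
        "amd": [],
    }
    # An AMD dependency array holds several quoted names; split them out.
    for array in AMD.findall(source):
        deps["amd"].extend(STRING.findall(array))
    return deps
```

Keeping the styles separate in the result makes it possible to test each one against its own set of example files.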

But your first question is even more interesting: To consider HTML files or not?
I see the following user story:

  • A web developer wants to get an overview of which files are using which other files (focus on JS)
  • The tool is used to parse all HTML and JS files and create a graph
  • The graph is visualized

So I think it would really be a benefit to also consider HTML files when creating an include-dependency tree.
The same applies to CSS.
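The user story above could be sketched as follows, using Python's standard `html.parser` to collect script and stylesheet references and then emitting the result as text. The names `AssetCollector`, `build_graph` and `to_dot` are hypothetical, and Graphviz DOT is used here only as an assumed, common graph format.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect the files an HTML page pulls in via <script src> and <link href>."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(attrs["href"])


def build_graph(pages):
    """pages maps a file name to its HTML text; returns file -> referenced files."""
    graph = {}
    for name, html in pages.items():
        collector = AssetCollector()
        collector.feed(html)
        graph[name] = collector.assets
    return graph


def to_dot(graph):
    """Render the graph as Graphviz DOT text so it can be visualized."""
    lines = ["digraph includes {"]
    for src, targets in graph.items():
        lines.extend(f'  "{src}" -> "{dst}";' for dst in targets)
    lines.append("}")
    return "\n".join(lines)
```

The JS files found this way could then be scanned for their own imports, so that HTML, JS and CSS end up in one dependency graph.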

@d3v-nu11 How far did you get?

@feddischson Not far. I should have posted earlier: I've decided to use what spare time I have for discovering hobbies that I don't already do for 40+ hours a week :) That being said, I still think the task is valid, and if someone wanted to, they could break it down even further into sub-tasks. Good luck.

This point is closed because the restructuring of the code to support multiple languages is done.
I will create some individual issues for further languages.