mozilla/dxr

Java support.

Closed this issue · 12 comments

DXR has plugins for C/C++, JavaScript, Python and Rust that enrich the index with additional type information, but it doesn't have any plugin for Java yet.

In general, there are two approaches for indexing Java code structures:

  1. Parse the source code.
  2. Compile the source code and scan the Java byte code.

Advantages of each of the approaches are mentioned here.

I would opt for the source code approach, because it introduces less overhead (no compilation needed). This could be achieved with javaparser or antlr.

Yes, parsing the source code is probably the way to go, not least because I imagine bytecode loses things we care about, like names and certainly file offsets for links and menus. Java is a static enough language (no preprocessor) that we could probably do a pretty decent job without actually building Java projects. Have a go at a plugin! http://dxr.readthedocs.org/en/es/development.html#writing-plugins
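
To illustrate why working from source keeps the file offsets that links and menus need, here is a deliberately naive sketch. It assumes a regex is good enough to spot type declarations, which it is not in general (comments and string literals will fool it); a real plugin would use a proper parser such as javaparser. The names `TYPE_DECL` and `find_type_declarations` are made up for this example and are not part of DXR.

```python
import re

# Toy pattern: 'class', 'interface', 'enum' or '@interface' followed by a name.
# The lookbehind keeps us from matching inside identifiers or twice on '@interface'.
TYPE_DECL = re.compile(
    r'(?<![\w@])(class|interface|enum|@interface)\s+([A-Za-z_$][\w$]*)')


def find_type_declarations(source):
    """Yield (kind, name, row, col) per declaration; row is 1-based, col 0-based."""
    for match in TYPE_DECL.finditer(source):
        start = match.start(2)  # offset of the declared name, not the keyword
        row = source.count('\n', 0, start) + 1
        col = start - (source.rfind('\n', 0, start) + 1)
        yield match.group(1), match.group(2), row, col
```

The row and column come straight from the character offset in the source text, which is exactly the information that is no longer available once only bytecode is left.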

To give you a short update:

Using javaparser and largely adopting the structure of the JavaScript plugin, I was already able to finish most parts of the plugin (i.e. DXR runs locally with both definition and reference information on Java packages, classes/interfaces/annotations/enums, methods and variables/fields/constants). That said, I still need to finish a Python integration test like this one and polish the qualified names like here.

Regarding these qualified names, I have a few questions:

  1. What is the correct qualified name syntax: parentName#tokenName or parentQualifiedName#tokenName? And is it the same for def[inition]s and ref[erence]s?
  2. What is the correct qualified name for cases where the parentName (or parentQualifiedName) is not clear (e.g. not easily determinable): #tokenName or tokenName?
  3. Say a method AbstractExample#bar is called from AbstractExample#barThis and Example#barSuper (where Example extends AbstractExample). And say we can't easily determine (see pt. 2) that the called method #bar in Example#barSuper is in AbstractExample. Then we would get different qualified names for these calls, i.e. #bar (in Example#barSuper) and AbstractExample#bar (in AbstractExample#barThis). My hope would be that it is possible (a) to find both calls from the def and (b) the def from both calls. However, I fear I might be too optimistic here. In this case I would rather prefer to omit the qualified names for now, at least in cases where I cannot safely determine qualified names for all occurrences of a kind yet. What do you think?

On another topic, I suppose that I should adhere to the Mozilla Coding Style for Java when committing Java code. Am I right? If so, do you know who I could ask about how to enforce this in a make lint target? While this should be possible with checkstyle from the command line, it would be easier and more consistent to use the same approach used for other Java code at Mozilla.

Hi @caugner, the qualified name convention in the JS plugin is unique (nowhere else uses a #). I wanted to err on the side of too-inclusive results when possible, because of the lack of import rules and the possibility that items are rebound to different names.
I'd look at the clang version for inspiration, where the qualified name follows the pattern Scope::FunctionName(args...); that should work for Java too.

We try to match our qualified name spelling with whatever the language convention is so users of that language can just type what they're accustomed to. So for C++ it's a::b::c. For Python, it's a.b.c. If I remember my Java correctly, it would look like Python.

Working through your other questions…

  1. Yes, it's the same for defs and refs.
  2. Just tokenName is usual, but if you can determine that it's a choice of just a few parents somehow, you could consider returning each of them by yielding multiple needles.
  3. Yes, it's probably safest to not emit qualified names if you're not very sure of them. In fact, you should even omit the qualname field from your plugin's mapping if you're not going to use it at all. (See dxr.plugins.xpidl.mappings for an example, which you could even factor up into dxr.indexers.)
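
As a rough sketch of answer 2, the generator below produces one qualified name per plausible parent and falls back to the bare token name when no parent is known. The function name `candidate_qualnames` and the `.` separator are assumptions made for illustration; this is not DXR's plugin API, and how each name is then emitted as a needle is left out.

```python
def candidate_qualnames(token_name, candidate_parents, separator='.'):
    """Yield one qualified name per plausible parent.

    Falls back to the bare token name when no parent could be determined.
    """
    if not candidate_parents:
        yield token_name
        return
    for parent in candidate_parents:
        yield '{0}{1}{2}'.format(parent, separator, token_name)
```

For a call to `bar` that could belong to either `Example` or `AbstractExample`, this yields both `Example.bar` and `AbstractExample.bar`, so a search for either spelling would find the call.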

And yes, it's probably safest to adhere to Mozilla's Java style guidelines, though I've never written Java at Mozilla and am unsure what tooling they use. You could ask in #developers on irc.mozilla.org.

I'm excited about your plugin!

Thank you both for your feedback! Just a short update to let you know I'm constantly working on this:

Regarding qualified names, I decided to follow the patterns used in Javadoc @links:

<packageName>.<TypeName>#<methodName>(<ParameterType1>, <ParamType2>, ..., <ParamTypeN>)

For tokens that are not linkable in Javadoc (such as local variables), I chose analogous patterns:

<qualifiedNameOfMethod>#<variableName>[-<numberOfInstance>]
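
A minimal sketch of how the two patterns above could be assembled. The helper names and the `com.example` package are hypothetical, and reading `numberOfInstance` as a counter that tells apart same-named variables within one method is an assumption, not something the patterns spell out.

```python
def method_qualname(package, type_name, method, param_types):
    """Build <packageName>.<TypeName>#<methodName>(<ParamType1>, ..., <ParamTypeN>)."""
    return '{0}.{1}#{2}({3})'.format(
        package, type_name, method, ', '.join(param_types))


def local_qualname(method_qn, variable, instance=None):
    """Build <qualifiedNameOfMethod>#<variableName>[-<numberOfInstance>]."""
    # The numeric suffix is optional, mirroring the square brackets in the pattern.
    suffix = '' if instance is None else '-{0}'.format(instance)
    return '{0}#{1}{2}'.format(method_qn, variable, suffix)


# Hypothetical example values:
bar = method_qualname('com.example', 'Example', 'bar', ['int', 'String'])
second_count = local_qualname(bar, 'count', 2)
```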

The trickiest part certainly is to determine the correct qualified name for method calls, especially when inheritance and overloading come into play. Therefore I need to maintain variable scopes within and type hierarchies across compilation units.
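
To make the inheritance part concrete, here is a simplified sketch using the AbstractExample/Example case from earlier in the thread. It assumes single inheritance, ignores interfaces and overloading, and takes the type hierarchy as plain dictionaries; `resolve_method`, `declared` and `superclass` are illustrative names, not part of any plugin.

```python
def resolve_method(type_name, method, declared, superclass):
    """Walk up the superclass chain to the first type that declares `method`.

    Returns 'TypeName#method', or None if no type in the chain declares it.
    """
    current = type_name
    seen = set()  # guards against cycles in malformed input
    while current is not None and current not in seen:
        seen.add(current)
        if method in declared.get(current, ()):
            return '{0}#{1}'.format(current, method)
        current = superclass.get(current)
    return None


# Hierarchy from the thread: Example extends AbstractExample.
declared = {'AbstractExample': {'bar', 'barThis'}, 'Example': {'barSuper'}}
superclass = {'Example': 'AbstractExample'}
```

With this, a call to `bar` from inside `Example` and one from inside `AbstractExample` both resolve to `AbstractExample#bar`, which is what lets the definition and both calls find each other. Building `declared` and `superclass` across compilation units is the hard part and is not shown.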

Glad you're still at it!

> The trickiest part certainly is to determine the correct qualified name for method calls, especially when inheritance and overloading come into play. Therefore I need to maintain variable scopes within and type inheritance across compilation units.

Yes, you end up writing the front end of a compiler over again. You can see why some plugins (clang) instead choose to exfiltrate information from the actual compiler.

This turned out to be much trickier than expected. I will have to prioritize and add support for features incrementally, roughly in the following order:

  • Type declarations (classes, interfaces, annotations, enums)
  • Type references
  • Package "declarations" (package ...)
  • Package references (import ...)
  • Method declarations within the scope of types w/o signature differentiation
  • Method references within the scope of types w/o signature differentiation
  • Field/Variable declarations within the scope of types w/o type differentiation
  • Field/Variable references within the scope of types w/o type differentiation
  • Determine the type of expressions
  • Differentiate method signatures
  • Support inheritance
  • Support method references across types
  • Support field/variable references across types

I'm not surprised. I'm happy to ship things in little chunks as you get them done.

Hi @caugner
I'm also very interested in a Java version. Instead of javaparser, you might consider Spoon, which gives you declarations and references out of the box. I can offer you support with Spoon.

Hi @caugner, any chance you are still working on this? If not, I might want to give it a try.

@erikrose @dbacarel Sorry about the delay. I did indeed give up on this, because I haven't been working with Java recently.

This said, Spoon (as suggested by @monperrus) looks like a promising tool to make this happen.