scalameta/metabrowse

Problems generating a metadoc site for the community build

jonas opened this issue · 2 comments

jonas commented

Here are some findings from using the ZIP file of semantic DBs mentioned in #22 (comment) to generate a metadoc site.

  • Duplicate semantic DBs: the Scalameta community build contains several duplicate semantic DBs for the same path under META-INF (see the first sketch after this list). I am not sure whether metadoc can sanely handle this, but the problem of duplicate definitions could still apply if metadoc were to support multiple versions of the same project/file, e.g. Akka 2.4 and Akka 2.5.

  • Memory usage: the CLI currently loads all semantic DBs into memory and generates additional datasets for files and symbols in memory before writing anything to disk (a streaming alternative is sketched further down). I was not able to generate a metadoc site for a significant subset of semantic DBs from the community build:

      > du -sh community/classes/
      581M	community/classes
    
  • File name too long: the current symbol name encoding scheme produces file names that exceed the limits of file systems with modest maximum path lengths (see the second sketch after this list).
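As a rough illustration of the duplicate problem, here is a minimal sketch (all names hypothetical, not metadoc's actual code) that groups `.semanticdb` files from several extracted classpath roots by their relative path and reports any path that occurs more than once, e.g. the JVM and JS halves of a cross-build:

    import java.nio.file.{Files, Path, Paths}
    import scala.collection.JavaConverters._

    object FindDuplicateSemanticdbs {
      // Group *.semanticdb files from several classpath roots by their
      // path relative to each root; any group with more than one entry
      // is a duplicate candidate.
      def duplicates(roots: Seq[Path]): Map[Path, Seq[Path]] = {
        val all = for {
          root <- roots
          file <- Files.walk(root).iterator().asScala
          if file.toString.endsWith(".semanticdb")
        } yield root.relativize(file) -> file
        all.groupBy(_._1).collect {
          case (rel, hits) if hits.size > 1 => rel -> hits.map(_._2)
        }
      }

      def main(args: Array[String]): Unit =
        duplicates(args.map(Paths.get(_)).toSeq).foreach {
          case (rel, hits) =>
            println(s"$rel appears ${hits.size} times:")
            hits.foreach(h => println(s"  $h"))
        }
    }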
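For the path-length problem, one common workaround (not necessarily what #53 does) is to keep short encoded symbol names readable and fall back to a fixed-length digest when a name would exceed the file system's limit, typically 255 bytes per name:

    import java.math.BigInteger
    import java.security.MessageDigest

    object SymbolFileNames {
      // Keep short encoded names readable; hash the rest so the file
      // name length is always bounded. 200 leaves headroom for suffixes.
      val MaxNameLength = 200

      def fileName(encodedSymbol: String): String =
        if (encodedSymbol.length <= MaxNameLength) encodedSymbol
        else {
          val digest = MessageDigest
            .getInstance("SHA-1")
            .digest(encodedSymbol.getBytes("UTF-8"))
          String.format("%040x", new BigInteger(1, digest))
        }
    }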

Very interesting. Thanks for summarizing.

  1. Is this maybe caused by JS cross-compilation? I noticed many projects in the CB are cross-compiled. We should definitely handle this, since the same name could resolve to different symbols on JS/JVM.
  2. I suspected memory usage would become an issue. There is no reason for metadoc to load all semantic DBs into memory; we can update the index one entry at a time (sketched below). I have the same problem in scalafix, so I will try to address this in the not-too-distant future.
  3. Long file names for symbols are solved in #53, right?
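To make the second point concrete, here is a minimal sketch of entry-at-a-time indexing, assuming a hypothetical `indexDocument` function that stands in for extracting symbol occurrences from one semanticdb payload; only a single document is held in memory at a time:

    import java.nio.file.{Files, Path}
    import scala.collection.JavaConverters._
    import scala.collection.mutable

    object StreamingIndexer {
      // Walk the classes directory and visit each *.semanticdb file one
      // at a time, folding its entries into the index before moving on,
      // instead of loading every DB up front.
      def buildIndex(
          classesDir: Path,
          indexDocument: Array[Byte] => Iterator[(String, String)]
      ): mutable.Map[String, List[String]] = {
        val index = mutable.Map.empty[String, List[String]].withDefaultValue(Nil)
        val files = Files.walk(classesDir).iterator().asScala
          .filter(_.toString.endsWith(".semanticdb"))
        for {
          file <- files
          bytes = Files.readAllBytes(file) // one document at a time
          (symbol, location) <- indexDocument(bytes)
        } index(symbol) = location :: index(symbol)
        index
      }
    }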

One more observation was that debugging crashes was difficult, right? That is fixable and has been on my todo list for a while.

jonas commented

Yes, the issue with long file names has been addressed.

Debugging is hard, but it would be fixable by loading each semantic DB separately so errors can be reported per file (sketched below). It would also be useful to improve the exceptions thrown by scalameta.
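A minimal sketch of that per-file loading, with a hypothetical `parse` function standing in for semanticdb deserialization, so one corrupt file is reported and skipped instead of aborting the whole run:

    import java.nio.file.{Files, Path}
    import scala.util.{Failure, Success, Try}

    object SafeLoader {
      // Load each semantic DB on its own and name the failing file,
      // rather than letting one bad document crash the entire run.
      def loadAll[T](files: Seq[Path])(parse: Array[Byte] => T): Seq[T] =
        files.flatMap { file =>
          Try(parse(Files.readAllBytes(file))) match {
            case Success(doc) => Some(doc)
            case Failure(e) =>
              System.err.println(s"error in $file: ${e.getMessage}")
              None
          }
        }
    }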

Another finding was that large sites need some sort of status reporting. For the community build it can take on the order of 20-30 minutes to crunch the DBs, and it would be nice if progress were printed in some way.
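Even something as simple as printing a counter every N files would help; a sketch, with made-up names:

    object Progress {
      // Print a one-line status every `step` items so long runs (20-30
      // minutes on the community build) show signs of life.
      def report[T](items: Seq[T], step: Int = 100)(f: T => Unit): Unit = {
        val total = items.size
        items.zipWithIndex.foreach { case (item, i) =>
          f(item)
          if ((i + 1) % step == 0 || i + 1 == total)
            println(s"processed ${i + 1}/$total semanticdb files")
        }
      }
    }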