JuliaEcosystem/PackageAnalyzer.jl

Feature/docs idea: analyze dependency chain

ericphanson opened this issue · 14 comments

We make it easy to analyze packages by name or a whole registry. It would be cool to have helpers to also analyze other collections like “these packages and all their transitive deps” or “this registry and all its deps in other registries”. Though maybe this is better done by the user composing e.g. @mattBrzezinski’s PkgDeps and PackageAnalyzer instead of adding it in here, in which case it might make a good docs example with no code changes here.

I'm not sure if PkgDeps.jl fits entirely well into this package, although they are relatively similar. I originally chatted with @omus who gave the idea for PkgDeps, and @oxinabox brought some good feature ideas for it too.

Do we have a Julia org around package information, if not maybe create one?

One good use case here would be to use the LicenseCheck.jl autodetection to assemble the set of all licenses used by your package's direct and recursive (indirect) dependencies.

So e.g. I call recursive_licenses("MyPackageName"), and it returns something like this:

3-element Vector{String}:
 "MIT"
 "BSD"
 "Apache"

This lets me know that I'm, for example, not using any packages that are under the GPL license.
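
A rough sketch of what such a helper could look like. This is not an existing function in PackageAnalyzer or LicenseCheck; the `DEPS` and `LICENSES` tables are hypothetical stand-ins for whatever a dependency lookup and license detection would return, and the traversal is a plain depth-first walk over the dependency graph.

```julia
# Hypothetical data: direct dependencies and detected licenses per package.
# In practice these would come from a registry lookup and license detection.
const DEPS = Dict(
    "MyPackageName" => ["A", "B"],
    "A" => ["C"],
    "B" => String[],
    "C" => String[],
)
const LICENSES = Dict(
    "MyPackageName" => ["MIT"],
    "A" => ["MIT"],
    "B" => ["BSD"],
    "C" => ["Apache"],
)

# Collect the licenses of all direct and indirect dependencies of `pkg`
# (the package itself is not included).
function recursive_licenses(pkg, deps=DEPS, licenses=LICENSES)
    seen = Set([pkg])                       # guards against dependency cycles
    stack = copy(get(deps, pkg, String[]))  # start from the direct dependencies
    found = Set{String}()
    while !isempty(stack)
        current = pop!(stack)
        current in seen && continue
        push!(seen, current)
        union!(found, get(licenses, current, String[]))
        append!(stack, get(deps, current, String[]))
    end
    return sort!(collect(found))
end

recursive_licenses("MyPackageName")
```

Checking for an unwanted license is then just `"GPL" in recursive_licenses("MyPackageName")`.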

Do we have a Julia org around package information, if not maybe create one?

This sounds like a great idea!

Any suggestions for a name?

I am notoriously bad at naming things 😂

I would suggest things like:

  • JuliaPackageTools
  • JuliaPackageUtilities
  • JuliaEcosystem
  • JuliaPackageEcosystem

None of those are very good.... I'm sure someone else will have a snappier idea.

I kinda like JuliaEcosystem. JuliaPackage{Tools,Utilities} may give wrong expectations

JuliaEcosystem might be a bit ambiguous, but I do like it. I assume org name changes can happen?

You can change the name of an org, but note that when you rename an org, it doesn't automatically redirect URLs from the old URL to the new URL.

That is, if you have a package:

  • https://github.com/JuliaFoo/MyPackage.jl

And you rename the JuliaFoo organization to JuliaBar, the old URL https://github.com/JuliaFoo/MyPackage.jl will break.

So, it's best to avoid changing organization names.

That being said, it is done occasionally, like when JuliaDiffEq renamed itself to SciML.

Huh. I could have sworn that, in the past, I ran into some problem with their redirect (in either JuliaRegistries/General#7516 or JuliaRegistries/General#11578)

Somehow something broke with the JuliaDiffEq -> SciML rename, and it broke the registry.

Anyway, my memory is fuzzy, so I don't remember exactly what the problem was.

Anyway it's probably a corner case.

If there are no objections to JuliaEcosystem, I can create it in the a.m. when I wake up and begin transferring a few projects into it!

Returning to the OP, with PkgDeps v0.6 we can do

julia> using PkgDeps, PackageAnalyzer

julia> pkg = "PkgDeps"
"PkgDeps"

julia> analyze(find_packages(keys(dependencies(pkg))))
┌ Error: Could not find package in registry!
│   name = "Pkg"
│   path = "/Users/eph/.julia/registries/General/P/Pkg"
└ @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/h1ihS/src/PackageAnalyzer.jl:198
┌ Error: Could not find package in registry!
│   name = "Statistics"
│   path = "/Users/eph/.julia/registries/General/S/Statistics"
└ @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/h1ihS/src/PackageAnalyzer.jl:198
[...many more stdlib error logs emitted...]
3-element Vector{PackageAnalyzer.Package}:
 Package SHA:
  * repo: https://github.com/staticfloat/SHA.jl.git
  * uuid: ea8e919c-243c-51af-8825-aaa63cd721ce
  * is reachable: true
  * lines of Julia code in `src`: 615
  * lines of Julia code in `test`: 198
  * has license(s) in file: MIT, BSD-3-Clause
    * filename: LICENSE.md
    * OSI approved: true
  * has documentation: false
  * has tests: true
  * has continuous integration: true
    * Travis
 Package Compat:
  * repo: https://github.com/JuliaLang/Compat.jl.git
  * uuid: 34da2185-b29b-5c13-b0c7-acf172513d20
  * is reachable: true
  * lines of Julia code in `src`: 872
  * lines of Julia code in `test`: 755
  * has license(s) in file: MIT
    * filename: LICENSE.md
    * OSI approved: true
  * has documentation: false
  * has tests: true
  * has continuous integration: true
    * GitHub Actions
 Package TOML:
  * repo: https://github.com/JuliaLang/TOML.jl.git
  * uuid: fa267f1f-6049-4f14-aa54-33bafae1ed76
  * is reachable: true
  * lines of Julia code in `src`: 1088
  * lines of Julia code in `test`: 1359
  * has license(s) in file: MIT
    * filename: LICENSE
    * OSI approved: true
  * has documentation: true
  * has tests: true
  * has continuous integration: true
    * GitHub Actions

to analyze all the dependencies of a package, and likewise

using PkgDeps, PackageAnalyzer
my_registry = only(reachable_registries("MyRegistry"))
registry_deps = find_packages(keys(mapreduce(dependencies, merge, keys(my_registry.pkgs))))
analyze(registry_deps)

to analyze a registry's packages and all their dependencies in other registries.

I think we need a way to not warn on stdlibs not being found in registries in find_packages for this to not be annoying.
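
In the meantime, one possible workaround is to drop stdlib names before the lookup. A minimal sketch, assuming the stdlibs of the running Julia are exactly the directories under `Sys.STDLIB`; `stdlib_names` and `drop_stdlibs` are hypothetical helper names, not part of PkgDeps or PackageAnalyzer.

```julia
# Names of the standard libraries shipped with the running Julia,
# taken from the directories under Sys.STDLIB.
function stdlib_names()
    entries = readdir(Sys.STDLIB)
    return Set(filter(name -> isdir(joinpath(Sys.STDLIB, name)), entries))
end

# Keep only the names that are not standard libraries.
function drop_stdlibs(names)
    stdlibs = stdlib_names()
    return [name for name in names if !(name in stdlibs)]
end

# Usage idea (assumes find_packages accepts a vector of names the same way
# it accepts `keys(...)` in the example above):
#   analyze(find_packages(drop_stdlibs(keys(dependencies(pkg)))))
drop_stdlibs(["Pkg", "Test", "Compat"])
```

This filters purely by name, so a registered package that happens to share a name with a stdlib would be dropped too.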

I also think it could be nice to have some standard analyses or something that users can easily run on a collection of packages. Or maybe that shouldn't be part of this package but something like a Pluto notebook where you can put in the set of packages you care about (e.g. from one of the above methods) and then an analysis runs.

I think v1.0's analyze_manifest might be a better approach. PkgDeps doesn't do version resolution (i.e. it has not re-implemented Pkg's resolver, which... seems fair), and since the set of dependencies of a package can change from version to version, with the PkgDeps approach it is hard to know that the results you get back actually describe the code you would be running if you used the package. analyze_manifest, on the other hand, requires you to use Pkg to resolve a set of versions into a manifest, and then just analyzes those versions as-is without caring about their dependency relationships.

In other words, when using analyze_manifest the example from #49 (comment) of analyzing the dependencies of PkgDeps looks like

julia> using PackageAnalyzer
 
pkg> activate --temp

pkg> add PkgDeps

julia> analyze_manifest()
2-element Vector{PackageAnalyzer.Package}:
 Package PkgDeps:
  * repo: https://github.com/JuliaEcosystem/PkgDeps.jl.git
  * uuid: 839e9fc8-855b-5b3c-a3b7-2833d3dd1f59
  * version: 0.6.2
  * is reachable: true
  * tree hash: 1c3c2634c7a77c80e6922b6aa6e7a52b0132af71
  * Julia code in `src`: 340 lines
  * Julia code in `test`: 121 lines (26.2% of `test` + `src`)
  * documentation in `docs`: 0 lines (0.0% of `docs` + `src`)
  * documentation in README: 36 lines
  * has license(s) in file: MIT
    * filename: LICENSE
    * OSI approved: true
  * has license(s) in Project.toml: MIT
    * OSI approved: true
  * has `docs/make.jl`: false
  * has `test/runtests.jl`: true
  * has continuous integration: true
    * GitHub Actions
 Package Compat:
  * repo: https://github.com/JuliaLang/Compat.jl.git
  * uuid: 34da2185-b29b-5c13-b0c7-acf172513d20
  * version: 3.46.0
  * is reachable: true
  * tree hash: 78bee250c6826e1cf805a88b7f1e86025275d208
  * Julia code in `src`: 1518 lines
  * Julia code in `test`: 1295 lines (46.0% of `test` + `src`)
  * documentation in `docs`: 0 lines (0.0% of `docs` + `src`)
  * documentation in README: 236 lines
  * has license(s) in file: MIT
    * filename: LICENSE.md
    * OSI approved: true
  * has `docs/make.jl`: false
  * has `test/runtests.jl`: true
  * has continuous integration: true
    * GitHub Actions
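
To illustrate what "a set of versions resolved into a manifest" means, here is a small sketch that reads the pinned versions out of a manifest with the TOML stdlib. The manifest text is a hypothetical, trimmed example in manifest format 2.0 that reuses the versions, UUIDs, and tree hashes from the output above; real manifests carry more fields, and this says nothing about how analyze_manifest is implemented. `pinned_versions` is a made-up helper name.

```julia
using TOML

# Hypothetical, trimmed manifest (format 2.0). Values are copied from the
# analyze_manifest output above; the TOML entry stands in for a stdlib
# that is listed without a version.
manifest_text = """
manifest_format = "2.0"

[[deps.Compat]]
git-tree-sha1 = "78bee250c6826e1cf805a88b7f1e86025275d208"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.46.0"

[[deps.PkgDeps]]
git-tree-sha1 = "1c3c2634c7a77c80e6922b6aa6e7a52b0132af71"
uuid = "839e9fc8-855b-5b3c-a3b7-2833d3dd1f59"
version = "0.6.2"

[[deps.TOML]]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
"""

# Map each package that has a pinned version to that version.
# Entries without a `version` key are skipped.
function pinned_versions(text::AbstractString)
    manifest = TOML.parse(text)
    pinned = Dict{String,VersionNumber}()
    for (name, entries) in get(manifest, "deps", Dict{String,Any}())
        for entry in entries
            haskey(entry, "version") || continue
            pinned[name] = VersionNumber(entry["version"])
        end
    end
    return pinned
end

pinned = pinned_versions(manifest_text)
```

Each entry is one exact version, which is why analyzing a manifest sidesteps the question of which version's dependency list to follow.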