crates index metrics

This project is intendet to collect DX as well as Ops data for crates on an index (doesn't have to be crates.io).

It's intention is to monitor an index for ease of use, security aspects, changes in dependencies and much more. Not using artificial tooling but rather gathering data based on real tools and applications.

Status

This is currently in planning, a small poc based on collecting data about cargo-binstall is currently available inside this repository.

What it includes:

Creating a map of all crates and versions of an index
Downloading the crates file and parsing it's Cargo.toml
Using streaming and semaphores to reduce the amount needed for compute, ram and to adapt for greater local ressources
Gather raw data and collect key metrics based on this. > This currently means, gathering data and calulating the amount and names of crates supporting cargo-binstall

The greater plan

What currenty is wrong:

As of right now, all data is downloaded as you go, this means reevaulating means a new download => files should be cachable
Errors are just spit out the console, there is no gathering about the process or plain supervision
Everything (except downloading the crates file) is a one shot operation

What it should provide in structure:

It should adhere the follwing tree strucutring
- Crate -> the crates name
  - Version -> the crates version
    - Version checksum -> the crates checksum
      - Feature -> hashes of Vec<String> as its storted by asc alph, asc len
        
        Target -> the platform targetet
        
        Package -> bins or libs of crate

What should trigger a new check:

A new crate
A new check
A new target
An updated checksum of a version
Updated dependencies -> From a lock file perspective, not one of a version specification in Cargo.toml

What should be checked:

tools
- cargo udeps
- cargo geiger
- cargo tarpaulin
- cargo build
- cargo audit
- cargo-bloat
questions:
Does it just build?
How long does it build?
How big are builds?
How many dependencies does it have?
How many features does it have?
Does it have any runtime dependencies?
Is its source code public? And still accessible
Are all dependencies up to date?
Are some dependencies using yanked crate versions?
Does it support cargo-binstall?
Is the maintainers email address valid?
How many binaries does it have?
does it use openssl?
On which host is the source code?
Does the cli use clap?
Is it wasm capable?
How much of its code is unsafe? Is unsafe code forbidden
How many test cases does it have?
What benching framework does it use?
Do all feature combinations work?
Does it use rust 2021
Does it use dependabot or renovate?
Is there more than 1 maintainer?
Are dependency versions clearly defined?
Does it support no_std?
What features do use up the most binary size?
Are binaries optimized?

somehowchris/rust-crates-index-metrics

crates index metrics

Status

The greater plan