json-schema-org/community

Proposal: JSON Schema Ecosystem Metrics

benjagm opened this issue · 16 comments

Background and Rationale

JSON Schema is a foundational technology used widely across the industry. However, its usage and adoption are hard to measure because it is not a tool or service but a specification with hundreds of implementations and countless use cases. This difficulty in measuring the ecosystem can affect our ability to build trust and to attract sponsors and partners.

This proposal aims to define a framework for collecting, analyzing, and reporting relevant Ecosystem metrics.

Proposal

  • GitHub projects using JSON Schema topics.

    • Total number of projects.
    • Total number of stars.
    • Total number of contributors.
    • Total number of forks.
    • Total number of dependent projects.
  • GitHub projects using any of the implementations listed on the implementers page.

    • Total number of projects.
    • Total number of stars.
    • Total number of contributors.
    • Total number of forks.
    • Total number of dependent projects.
  • Data for the top 5 projects per language.

  • We will also add the adopters who self-report in our adopters file.

  • It would be great to be able to group the results by programming language.
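To make "group by language" and "top 5 per language" concrete, here is a minimal sketch of the aggregation step. It assumes the per-repo data has already been fetched; the `Repo` record and `summarize_by_language` are hypothetical names, and the field names only loosely mirror GitHub's REST repository object (`language`, `stargazers_count`, `forks_count`). The contributor count is not part of that object and is assumed to be fetched separately. Dependent-project counts are left out because they are not exposed by the GitHub API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Repo:
    # Hypothetical record; field names loosely follow the GitHub REST repo object.
    full_name: str
    language: Optional[str]
    stargazers_count: int
    forks_count: int
    contributors: int  # assumed to be fetched separately (not in the repo object)


def summarize_by_language(repos: List[Repo], top_n: int = 5) -> Dict[str, dict]:
    """Group repos by primary language, total the metrics, keep the top N by stars."""
    groups: Dict[str, List[Repo]] = defaultdict(list)
    for repo in repos:
        # GitHub reports no language for some repos; bucket those separately.
        groups[repo.language or "Unknown"].append(repo)

    summary = {}
    for language, items in groups.items():
        top = sorted(items, key=lambda r: r.stargazers_count, reverse=True)[:top_n]
        summary[language] = {
            "projects": len(items),
            "stars": sum(r.stargazers_count for r in items),
            "forks": sum(r.forks_count for r in items),
            "contributors": sum(r.contributors for r in items),
            "top": [r.full_name for r in top],
        }
    return summary
```

Summing contributors across projects counts a person once per project they contribute to; a count of unique people would need the contributor logins, not just per-repo totals.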

Implementation Plan

We plan to implement these metrics by:

  • Developing a set of data collection tools and scripts using the GitHub API.
  • Regularly reporting the metrics to the community through a dedicated dashboard or report.
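A minimal sketch of the first collection step, assuming the REST search endpoint `GET /search/repositories` with a `topic:` qualifier. The helper names are hypothetical, and actually sending the request (authentication, rate-limit handling, pagination) is left out. Note that GitHub's search API returns at most 1,000 results per query, so listing every repo in a large topic may require splitting it into narrower queries (for example by creation date), while `total_count` reports the overall number of matches.

```python
from urllib.parse import urlencode

API = "https://api.github.com"


def topic_search_url(topic: str, page: int = 1, per_page: int = 100) -> str:
    """Build a GitHub repository-search URL filtered to a single topic."""
    # per_page is capped at 100 by the API.
    params = urlencode({"q": f"topic:{topic}", "per_page": per_page, "page": page})
    return f"{API}/search/repositories?{params}"


def totals_from_search_page(payload: dict) -> dict:
    """Sum stars and forks over one page of parsed search results."""
    items = payload.get("items", [])
    return {
        "total_count": payload.get("total_count", 0),
        "stars": sum(item["stargazers_count"] for item in items),
        "forks": sum(item["forks_count"] for item in items),
    }
```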

Some examples of similar projects:

We invite the community to participate in the discussion and contribute to this effort. Your feedback and collaboration are essential to the success of this initiative.

Collaboration and Volunteers

If you're interested in collaborating on this project or have skills in data analysis, tool development, or project management, please express your interest in the comments. We welcome all forms of collaboration and support.

This issue serves as a starting point for discussion and collaboration. Let's work together to define and implement ecosystem metrics for JSON Schema that will benefit the entire open-source community.

In case it helps, the hack for getting the number of contributors is to set per_page to 1 and then look at the page number of the last page.

E.g., here it is with the CLI for the first result in the topic, with some silly Link header parsing:

⊙  gh api 'https://api.github.com/repos/tiangolo/fastapi/contributors?per_page=1' --include -X HEAD | rg -o '<.*page=(\d+)>; rel="last"' -r '$1'
464
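For scripted collection, the same Link-header trick can be done in Python. This is a minimal sketch, assuming you already have the raw `Link` header value and the parsed response body from a request to `/repos/{owner}/{repo}/contributors?per_page=1`; the helper names are hypothetical. When all contributors fit on a single page, GitHub sends no `Link` header, so the sketch falls back to the number of items in the body.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Matches one entry of a Link header, e.g. <https://...&page=464>; rel="last"
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def last_page_from_link(link_header: Optional[str]) -> Optional[int]:
    """Return the page number of the rel="last" link, or None if there is none."""
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        if rel == "last":
            query = parse_qs(urlparse(url).query)
            return int(query["page"][0])
    return None


def contributor_count(link_header: Optional[str], body_length: int) -> int:
    """With per_page=1 the last page number equals the contributor count;
    without a Link header everything fit on one page, so count the body."""
    last = last_page_from_link(link_header)
    return last if last is not None else body_length
```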

Obviously some other stuff is available directly from the CLI too:

⊙  gh repo view json-schema-org/json-schema-spec --json forkCount,stargazerCount,watchers
{
  "forkCount": 261,
  "stargazerCount": 2959,
  "watchers": {
    "totalCount": 101
  }
}
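Wrapping that CLI call for a script might look like the following sketch. It assumes an installed, authenticated `gh` CLI; `repo_metrics` and `flatten_repo_view` are hypothetical helper names, and only the three fields from the command above are requested.

```python
import json
import subprocess

FIELDS = "forkCount,stargazerCount,watchers"


def flatten_repo_view(raw_json: str) -> dict:
    """Flatten the JSON printed by `gh repo view --json forkCount,stargazerCount,watchers`."""
    data = json.loads(raw_json)
    return {
        "forks": data["forkCount"],
        "stars": data["stargazerCount"],
        "watchers": data["watchers"]["totalCount"],  # nested object in gh's output
    }


def repo_metrics(repo: str) -> dict:
    """Run `gh repo view` for one repo (e.g. "owner/name") and flatten its output."""
    result = subprocess.run(
        ["gh", "repo", "view", repo, "--json", FIELDS],
        capture_output=True,
        text=True,
        check=True,  # raise if gh exits with an error
    )
    return flatten_repo_view(result.stdout)
```

For the output shown above, `flatten_repo_view` returns `{"forks": 261, "stars": 2959, "watchers": 101}`.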

I'll look into the rest of the data; a lot more of it should be fairly easy to retrieve.

With all the info Julian provided, the only pending element is the dependent projects. Those will require web scraping or another tool such as https://pypi.org/project/github-dependents-info/, because that data is not provided by the API:

This is the information I am looking for, but aggregated:
https://github.com/ajv-validator/ajv/network/dependents

It does indeed initially seem like that dependency data isn't available via API... :/

Do we know how "dependents" is calculated? It might be more or less useful depending on how it works. For example, ajv is used by eslint. Would every project that uses eslint be considered a dependent? That would create a lot of noise making it not a very useful metric.

AFAIK, how it's calculated is language/tooling-specific (e.g., in Python there was a long-running issue about supporting pyproject.toml files, which affected the numbers IIRC). That one was finally fixed a few months ago. So what's in there is essentially a best-current-effort based on what GitHub supports. I'm sure further detail is in the GitHub docs.

IME it's indeed very noisy and not very useful; I essentially never look at it for my own repos. But all this obviously depends on what question someone is trying to answer. It's what's there, and I assumed the hope was that we could find some signal in the noise anyhow.

I've created a repo for this work https://github.com/json-schema-org/ecosystem

I think we can distinguish between being a direct dependency and being a transitive dependency.
It might actually be pretty interesting to see whether that differs across languages,
although I suspect JavaScript/Node.js has deeper dependency trees than most other languages.
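One way to separate direct from transitive dependents, sketched under the assumption that we can build a dependency graph ourselves (for example from manifests or lockfiles) as a dict mapping each package to the packages it declares. The graph format and the `dependency_depth` helper are hypothetical. A breadth-first search returns the length of the shortest dependency chain, so 1 means a direct dependency and anything larger means the project only depends on the target transitively.

```python
from collections import deque
from typing import Dict, Iterable, Optional


def dependency_depth(graph: Dict[str, Iterable[str]], root: str, target: str) -> Optional[int]:
    """Shortest chain length from root to target: 1 = direct, >1 = transitive only,
    None = root does not depend on target at all."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for dep in graph.get(node, ()):
            if dep == target:
                # BFS visits nodes in order of depth, so this is the shortest chain.
                return depth + 1
            if dep not in seen:  # guards against cycles and repeated work
                seen.add(dep)
                queue.append((dep, depth + 1))
    return None
```

For example, with the hypothetical graph `{"my-app": ["eslint"], "eslint": ["ajv"]}`, `my-app` reaches `ajv` at depth 2, i.e. only through eslint, which is exactly the kind of dependent that would add noise to the raw count.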

The first three Issues in the ecosystem repo cover:

  • Repos that use the json-schema topic over time
  • ...and their stars and forks over time
  • ...and their contributions over time

Initially, we will look to get current data and ongoing data on a weekly basis.
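For the weekly cadence, here is a minimal sketch of turning stored snapshots into week-over-week changes. It assumes each weekly run stores a date plus a flat dict of totals (for example repos, stars, forks, contributions); the storage format and `weekly_deltas` are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Snapshot = Tuple[date, Dict[str, int]]


def weekly_deltas(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Return (date, change since the previous snapshot) for every snapshot but the first."""
    ordered = sorted(snapshots, key=lambda s: s[0])
    deltas = []
    for (_, prev), (day, cur) in zip(ordered, ordered[1:]):
        # A metric missing from the previous snapshot is treated as starting from 0.
        deltas.append((day, {name: value - prev.get(name, 0) for name, value in cur.items()}))
    return deltas
```

Storing the raw totals and deriving the changes afterwards keeps the history recomputable: a missed or repeated weekly run simply shows up as a longer or shorter interval between two dates.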

Would you like help with those issues?

Yes. I think I should commit the work I have done so far to a branch. I probably shouldn't spend much more time on this, but I will spend a little more. I think it is working, but I need to run it and wait for it to complete.

I have pushed some code to https://github.com/json-schema-org/ecosystem/tree/main/projects/initial-data
It runs and produces results, but it has limitations that need addressing.
So far, the code only gathers the initial data, not ongoing data via GitHub Actions.

I've added some new Issues and added details to existing Issues where required.
See json-schema-org/ecosystem#1
Help is now welcome. Feel free to pass this on to anyone eager to contribute =]

Hey maintainers! I am really interested in this project. It looks cool to me, especially the work with APIs and backend stuff, and I would love to work on it with you. Currently, I am researching and checking out the codebases, and I will ping you once I have made some good progress. I am ready to work on these issues: https://github.com/json-schema-org/ecosystem/issues.

Should we close this issue to continue the work in the ecosystem repo?

Hello! 👋

This issue has been automatically marked as stale due to inactivity 😴

It will be closed in 180 days if no further activity occurs. To keep it active, please add a comment with more details.

There can be many reasons why a specific issue has no activity. The most probable cause is a lack of time, not a lack of interest.

Let's figure out together how to move this issue forward. Connect with us through our Slack channel: https://json-schema.org/slack

Thank you for your patience ❤️

Let's continue the discussion in the Ecosystem repo. Thank you all!

https://github.com/json-schema-org/ecosystem