softwareunderground/open_geosciene_code_projects_viz

Place to add visualizations or questions that would be cool to see.

JustinGOSSES opened this issue · 13 comments


It would be cool to see a Sankey or other network diagram of user contributions from USER => REPO.

This might show what repositories are connected by 1 or 2 developer links and which repository(s) are their own islands where you wouldn't expect knowledge from those developers to flow out via contributions.
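A rough sketch of how the "islands" idea could be computed before drawing anything: treat the contribution data as (user, repo) pairs and group repos that are reachable from one another through shared contributors. The pairs, the user names, and the `repo_clusters` helper are all made up for illustration; real input would come from whatever contributor data the project already collects.

```python
from collections import defaultdict

# Hypothetical (user, repo) contribution pairs, for illustration only.
contributions = [
    ("alice", "org1/repoA"),
    ("alice", "org2/repoB"),
    ("bob", "org2/repoB"),
    ("carol", "org3/repoC"),
]


def repo_clusters(pairs):
    """Group repos into clusters linked by at least one shared contributor."""
    repos_by_user = defaultdict(set)
    users_by_repo = defaultdict(set)
    for user, repo in pairs:
        repos_by_user[user].add(repo)
        users_by_repo[repo].add(user)

    seen, clusters = set(), []
    for start in sorted(users_by_repo):
        if start in seen:
            continue
        # Depth-first walk: repo -> its contributors -> their other repos.
        cluster, stack = set(), [start]
        while stack:
            repo = stack.pop()
            if repo in seen:
                continue
            seen.add(repo)
            cluster.add(repo)
            for user in users_by_repo[repo]:
                stack.extend(repos_by_user[user] - seen)
        clusters.append(sorted(cluster))
    return clusters


clusters = repo_clusters(contributions)
# A cluster with a single repo is an "island": no contributor links it outward.
islands = [c[0] for c in clusters if len(c) == 1]
```

In this toy data `org1/repoA` and `org2/repoB` end up in one cluster through `alice`, while `org3/repoC` is an island. The same pairs could feed a Sankey diagram directly, with one link per (user, repo) pair.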

It would be interesting to see something that communicates which users work on similar things.

For example, there might be some users that have contributed to 20 repos spread across different org/user accounts. Who are the users that have a wide-ranging impact?

Alternatively, there might be groups of 5 users who all collaborated together on 3 repositories.
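Both questions could probably be answered from the same (user, repo) pairs. The sketch below is only one possible approach: `reach` counts how many repos and how many distinct owner accounts each user has touched, and `shared_repo_counts` counts how many repos each pair of users has in common, which is a starting point for spotting groups that keep collaborating. The data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, "owner/repo") pairs, for illustration only.
pairs = [
    ("alice", "org1/repoA"), ("alice", "org2/repoB"), ("alice", "org3/repoC"),
    ("bob", "org1/repoA"), ("bob", "org2/repoB"),
    ("carol", "org3/repoC"),
]


def reach(pairs):
    """Map each user to (number of repos, number of distinct owner accounts)."""
    repos = defaultdict(set)
    for user, repo in pairs:
        repos[user].add(repo)
    return {
        user: (len(names), len({name.split("/")[0] for name in names}))
        for user, names in repos.items()
    }


def shared_repo_counts(pairs):
    """Count how many repos each pair of users has both contributed to."""
    users_by_repo = defaultdict(set)
    for user, repo in pairs:
        users_by_repo[repo].add(user)
    counts = defaultdict(int)
    for members in users_by_repo.values():
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return dict(counts)


user_reach = reach(pairs)
shared = shared_repo_counts(pairs)
```

Pairs with a high shared count are candidates for the "group that collaborates on several repos" pattern; finding groups larger than two would need a clustering step on top of this.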

Can we use contribution metadata to understand the communities within the open source geoscience subsurface space?

Using the dependency page (https://softwareunderground.github.io/open_geosciene_code_projects_viz/explore/dependencies/), I can click and eventually show that Pooch has both dependencies in green & repositories that use it in blue! It would be nice to have a specific visualization for identifying repositories that use one another.
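As a sketch of what such a view would need underneath: if the dependency data is available as a mapping from each repository to the packages it depends on, inverting that mapping gives the "used by" direction, and filtering to tracked repositories gives the repo-to-repo links. The mapping below is illustrative, not real dependency data, and `used_by` and `internal_links` are hypothetical helper names.

```python
from collections import defaultdict

# Illustrative dependency map (repo -> things it depends on); not real data.
depends_on = {
    "repoA": {"pooch", "numpy"},
    "repoB": {"repoA", "pooch"},
    "pooch": {"requests"},
}


def used_by(depends_on):
    """Invert the map: dependency -> set of repos that use it."""
    reverse = defaultdict(set)
    for repo, deps in depends_on.items():
        for dep in deps:
            reverse[dep].add(repo)
    return dict(reverse)


def internal_links(depends_on):
    """Return (repo, dependency) edges where both ends are tracked repos."""
    tracked = set(depends_on)
    return sorted(
        (repo, dep)
        for repo, deps in depends_on.items()
        for dep in deps
        if dep in tracked
    )


reverse = used_by(depends_on)
links = internal_links(depends_on)
```

The `links` list is the part a dedicated visualization would draw: only edges between repositories that are themselves in the collection.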


Yes, this would be interesting - how do we help the critical people, etc?

GitHub just did a cool thing where they gave a badge to everyone who contributed to a repository used directly or indirectly in the helicopter that just flew on Mars.

https://twitter.com/github/status/1384130507898720262?s=19

Another way to frame questions might be, how do we nudge human behavior in productive ways by showing what code projects are used by what other products?

For example, you want to contribute to someone else's code project, which one gives you the biggest impact in the subsurface space as it's used by the most projects?
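One possible way to put a number on "used by the most projects" is to count, for each project, everything that depends on it directly or indirectly. This is a sketch under the assumption that a repo-to-dependencies mapping exists; the project names are made up and `transitive_dependents` and `rank_by_impact` are hypothetical helpers.

```python
# Illustrative dependency map (project -> things it depends on); not real data.
depends_on = {
    "app1": {"libA"},
    "app2": {"libA"},
    "libA": {"core"},
    "app3": {"core"},
}


def transitive_dependents(depends_on, target):
    """Return every project that depends on target directly or indirectly."""
    reverse = {}
    for repo, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(repo)
    found, stack = set(), [target]
    while stack:
        for user in reverse.get(stack.pop(), ()):
            if user not in found:
                found.add(user)
                stack.append(user)
    found.discard(target)  # only matters if the data contains a cycle
    return found


def rank_by_impact(depends_on):
    """Sort projects by number of transitive dependents, highest first."""
    projects = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    scores = {p: len(transitive_dependents(depends_on, p)) for p in projects}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_impact(depends_on)
```

Here `core` ranks first with four dependents even though only two projects use it directly, which is the kind of indirect impact the Mars helicopter badge was recognizing.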

Or in a project management sense - how can people help with the easier things so the critical parties can do the hard stuff that pushes things forwards?

Building on your comment @RichardScottOZ, a default issue label is "good first issue". It might be interesting to be able to present a collected view of those across a group of repositories.

To write it up as a user story,

"As a developer looking to contribute who is not confident they have the ability, or time, to contribute to hard complex issues but would like to still contribute, I would like the ability to be directed to easier and/or smaller issues that would be well geared to my time / skill level across a larger number of repos in an application area like subsurface geosciences"
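A minimal sketch of how the collected view might be queried, assuming the GitHub REST search endpoint (`/search/issues`) and its `label:`, `repo:`, `is:issue`, and `is:open` qualifiers. The helper name and the repo names are hypothetical, and the function only builds the URL; it does not make the request. Search queries have length limits, so a long repository list would probably need to be split into batches.

```python
from urllib.parse import urlencode


def good_first_issue_url(repos, label="good first issue"):
    """Build a GitHub issue-search URL for open issues with the given label."""
    terms = ["is:issue", "is:open", f'label:"{label}"']
    # Several repo: qualifiers in one query match issues from any of those repos.
    terms += [f"repo:{name}" for name in repos]
    return "https://api.github.com/search/issues?" + urlencode({"q": " ".join(terms)})


# Hypothetical repository names, for illustration only.
url = good_first_issue_url(["org1/repoA", "org2/repoB"])
```

The JSON response to that URL would list matching issues in an `items` array, which could then be grouped by repository for display.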

User story 5:

"As a developer looking at a large number of repositories having to do with subsurface geology, I would like to know which ones are likely to accept my small pull request contribution"

This question might be answered by a combination of facts:

  • Proportion of closed vs. open issues.
  • Average response time to pull requests (not sure we can get this? Maybe we can)
  • Total number of closed issues.
  • Proportion of closed issues in last X number of issues submitted by non-members.
  • Presence of README.md
  • Presence of CONTRIBUTING.md or GitHub Pages docs.
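To make the list concrete, here is a sketch of how those facts might be rolled up per repository. It assumes the issue and pull request records have already been fetched and flattened into simple dicts; the field names (`state`, `by_member`, `opened`, `first_response`), the `last_x` parameter, and the `responsiveness` helper are all assumptions for illustration, not an existing API. Whether first-response times can actually be obtained is still the open question noted above.

```python
from datetime import datetime
from statistics import mean


def responsiveness(issues, pulls, files, last_x=10):
    """Summarize how likely a repo is to respond to outside contributions.

    issues: dicts with "state" and "by_member", assumed ordered oldest first.
    pulls:  dicts with "opened" and "first_response" (datetime or None).
    files:  set of file names at the repo root.
    """
    closed = sum(1 for issue in issues if issue["state"] == "closed")
    # Only the most recent last_x issues opened by non-members.
    outsider = [issue for issue in issues if not issue["by_member"]][-last_x:]
    outsider_closed = sum(1 for issue in outsider if issue["state"] == "closed")
    waits = [
        (pr["first_response"] - pr["opened"]).days
        for pr in pulls
        if pr["first_response"] is not None
    ]
    return {
        "closed_issues": closed,
        "closed_share": closed / len(issues) if issues else None,
        "outsider_closed_share": outsider_closed / len(outsider) if outsider else None,
        "mean_pr_response_days": mean(waits) if waits else None,
        "has_readme": "README.md" in files,
        "has_contributing": "CONTRIBUTING.md" in files,
    }


# Hypothetical records, for illustration only.
issues = [
    {"state": "closed", "by_member": True},
    {"state": "closed", "by_member": False},
    {"state": "open", "by_member": False},
    {"state": "closed", "by_member": False},
]
pulls = [
    {"opened": datetime(2021, 4, 1), "first_response": datetime(2021, 4, 3)},
    {"opened": datetime(2021, 4, 10), "first_response": datetime(2021, 4, 14)},
    {"opened": datetime(2021, 4, 15), "first_response": None},
]
summary = responsiveness(issues, pulls, {"README.md", "LICENSE"}, last_x=2)
```

Pull requests that never got a response are left out of the mean here, which flatters unmonitored repos; counting them separately might be more honest.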

Is there a reason for contributors to assume, as the default stance, that maintainers won't accept their contributions?

The problem I was alluding to there was less that people don't accept pull requests on purpose but rather that the repo is either unmonitored or the response might not come for many weeks. People generally don't want to spend time working on a pull request that never gets integrated.

It tends to be more common when code is released through large organizations due to the policies involved. It might occur in the geoscience space when dealing with large companies or governmental organizations.

On github.com/nasa for example there is a higher proportion of code repositories that are open sourced after the project is finished. Open-sourcing can sometimes be a process more akin to publishing than developing in the open. When the process for getting permission for open-sourcing requires lengthy reviews of the code involved, it means the code is more likely to be released only when the project is entirely or mostly done.

Oh, yeah, zombie or static projects as such.

Indeed, there are some Geoscience Australia things like that. Error fixes waiting around for untold months etc.

Members/non-members stats is a good idea.

I remember adding something to a defence-type networking tool - they were surprised that some random other person had added something, etc.

There will be lots of cases where the boss says 'you will only work on this thing for the next 2 months' - so the other repository/project gets no time, I imagine.