Find your own company
abitrolly opened this issue · 8 comments
Currently both the list of companies and the list of emails are hardcoded. At the very least, instructions on how to build the stats for your own company or employee could be added.
There is a concern that at other companies the requirement to use a corporate email for contributions is not so strict, especially if contributors or maintainers are hired or sponsored by corporations to do some jobs.
I will respond to this in two parts.
- It is a good idea to add instructions for how to modify this file. A little background - to create this list we reviewed email addresses associated with commits to GitHub and then filtered for the ones that came from companies (commercial organizations), leading to the creation of this list in the SQL file. Since the concept of OSCI was to rank companies (following on from some earlier published studies of this kind), we excluded email addresses from universities, freemail providers, etc. It was a fair bit of effort to research all the email domains and find out which categories to put them in. We also had to combine email domains where a single organization uses more than one. It's possible we missed some companies or subdomains in this exercise, and of course new companies will need to be added sometimes too. So this list needs to be maintained, but yes, you are right, it will be necessary to publish the rationale and instructions.
- The other issue is people using non-corporate emails. We researched many ways to identify which organization contributions were coming from, such as the contributor's profile and the org of the repo. All of these have pros and cons, and in the end we concluded that - for now - we would use the email domain, even knowing that this will undercount the totals. There is no perfect science to these analyses, but we felt this still gives valuable results. We would like to return to the idea of improving this algorithm - this task will need a bit of scoping first, which I do plan to work on.
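To make the filtering described above concrete, here is a minimal Python sketch, not the project's actual SQL. The domain names, the `COMPANY_DOMAINS` and `EXCLUDED_DOMAINS` tables, and the choice to count each distinct address once are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mapping: several domains can point at one organization.
COMPANY_DOMAINS = {
    "example-corp.com": "Example Corp",
    "example-corp.io": "Example Corp",
}
# Hypothetical exclusions: freemail providers, universities, etc.
EXCLUDED_DOMAINS = {"gmail.com", "example.edu"}


def company_for(email):
    """Return the company for an author email, or None if it is not counted."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in EXCLUDED_DOMAINS:
        return None
    # Walk up the domain so that a subdomain matches its parent domain.
    parts = domain.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in COMPANY_DOMAINS:
            return COMPANY_DOMAINS[candidate]
    return None


def rank_companies(author_emails):
    """Count distinct author addresses per company, most common first."""
    counts = Counter()
    for email in set(author_emails):
        company = company_for(email)
        if company:
            counts[company] += 1
    return counts.most_common()
```

A contributor who commits with a freemail address returns `None` from `company_for`, which is exactly the undercounting trade-off mentioned above.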
There is a pain point in open source projects around contributions made while the contributor is under a signed contract with a company. The burden of proving that a project doesn't receive corporate code through a third-party contribution is placed on open source projects, resulting in various CLAs and conditional acceptance of contributions. This really hurts.
What could be improved on the corporate side is to make it clear and public which contributions are sponsored or covered by an existing contract. It could also set some standards that bring clarity to contracts which say that any code a person writes belongs to the company. If it becomes the responsibility of company lawyers to track a person's official involvement with the company and maintain that record online, then open source developers will feel less pressure over these legal issues, and OSCI could get fine-grained information about which emails were involved with which projects, on behalf of which company, in a given period.
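As a rough illustration of what such a public record could look like, here is a small Python sketch. Nothing like this exists in OSCI today; the `Affiliation` fields and the `company_at` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Affiliation:
    """Hypothetical record a company could publish for a sponsored contributor."""
    email: str
    company: str
    project: str
    start: date
    end: Optional[date] = None  # None means the sponsorship is still active


def company_at(records, email, project, when):
    """Return the company sponsoring `email` on `project` on the date `when`, if any."""
    for r in records:
        if (
            r.email == email
            and r.project == project
            and r.start <= when
            and (r.end is None or when <= r.end)
        ):
            return r.company
    return None
```

With records like these, a commit could be attributed by author email, project and commit date rather than by email domain alone.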
@patrickstephens1 - I'm a member of EPAM and have a GitHub account, but I'm not part of the EPAM organization on GitHub. In my profile, I have plenty of MIT-licensed stuff that I use in conferences and workshops. So how do I show up as an EPAM contributor in OSCI based on the current algorithm?
@gitaroktato Hey Oresztesz. The OSCI algorithm uses the email domain of the author of the commit. We have a filter which picks out all the company email domains we have found in our analysis. Anyone (EPAM or other) who wants their contributors to be picked up should set their company email address on their public profile, or alternatively it can be set at the repo level (see here https://help.github.com/en/enterprise/2.19/user/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address). Does that answer your qn?
@abitrolly regarding your earlier question, the project README is now updated with instructions on how to add companies and email domains. We are also almost ready to publish an update to this mapping, having recently completed another analysis of the email domains we see in larger numbers of commits. This will be done in the next week or so.
@patrickstephens1 thanks for the heads up! The commit with instructions is 54e8526
Ideally the mappings should be in the repository root in a self-describing format. The files people interact with most often - custom mappings and configuration - are better not hidden so deep that specialized docs are needed to find them. Anyway, the docs are awesome.
@patrickstephens1 OK, I've changed my e-mail address in the git history. I hope it makes some impact