microsoft/rushstack

License Compliance Question

martinwoodward opened this issue · 28 comments

Hey folks. I was reading this thread by @jamiebuilds https://threadreaderapp.com/thread/1002696910266773505.html

And it concerned me as Jamie claims that portions of Rush are derived from https://github.com/lerna/lerna

However, Jamie is out for the weekend (I'm jealous), so he's not able to point me to the code similarities at the moment. But he did say that he'd be happy if the entire project was attributed as a fork of Lerna.

I've checked with the original copyright owner @kittens along with the current maintainer @evocateur and they didn't know of any issues - though it's fair to say they hadn't really heard of Rush before I pinged them.

I've taken a look at the code myself; however, the switch between TypeScript and JavaScript is throwing me quite a bit. While they are both npm monorepo tools, I'm not able to find any code reuse yet. If folks can see any, let me know. I asked a couple of buddies to check for me as well and they couldn't see any, but I'll dig in more next week at work when people come in. I saw this come up as a story on /r/linux and also on HN (where I saw it), so if anyone spots a similarity mentioned in the comments of either discussion, can you let me know?

Do you happen to know if this project was derived from https://github.com/lerna/lerna? If so, I'm happy to send you a PR with correct attribution for them. Alternatively, if it was inspired by https://github.com/lerna/lerna but doesn't share any code, it would still be good to give them a shout-out in your README or the wiki. To be honest, calling out other alternatives for doing an npm monorepo in the wiki would be a cool thing to do anyway, as it's always good to give people choices.

Can you get back to me ASAP? I'd like to get this resolved as quickly as possible.

Hey, thanks Martin. I just saw the threads this morning.

I'm not aware of any deliberate copying of the Lerna code, but let me dig into it now and call all the devs on the team to be sure. If there is, we should definitely give correct attribution, so I want to investigate. Do we have any links pointing to similar code between the two that I can take a look at?

Do you think it might be good if I explained some of the differences between Rush and other similar monorepo tools like Lerna, Yarn workspaces, pnpm, etc., and the reasons behind them? They are all useful tools, but each is a bit different.

Thanks for getting back to me on a Sunday. Much appreciated. Not been able to find similar code yet. If anyone can - please dump some links into here for me. @pgonzal if you can check with the team that would be awesome.

Some history would be good, I think. Why does Rush exist at all? Were you inspired by things like Lerna or Yarn workspaces or the other ones? I'm keen to get this cleared up and make sure we are doing the right thing.

I checked with the devs, and nobody's aware of any code coming from Lerna. If we inadvertently used something without credit, I really would like to know so that I can fix it. Please let me know and I’ll get on it right away.

I checked with the devs, and nobody's aware of any code coming from Lerna.

It's easy to say "no", you know. You should check the code with a diff tool or something. And not just the last version but all commits.

ralph commented

On the thread, @jamiebuilds states that the git history was altered after his complaint. That statement is very hard to confirm or falsify without a copy of the repository from before the (alleged) alteration. So if nobody pulled a copy (including git history) in the early days, this isn't going to lead anywhere.

It's probably worth giving a bit of background on how you tend to spot unattributed code. The diff-tool approach often doesn't work that well when looking for code reused in another open source project without attribution. Most often when you are doing a forensic code examination, the giveaways tend to be variable and function name choices, code comments, code structure, etc. Diff tools can easily be fooled by slight changes, even changes to whitespace when the tools are used naively. You use tools to get you close to areas that might be similar, and then eyeballing is often the best way to compare them.
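To make the "tools get you close" step concrete, here is a minimal sketch of a whitespace- and comment-insensitive comparison. It is not the method anyone in this thread used; the function names (`tokenize`, `shingles`, `similarity`), the crude regex tokenizer, and the shingle size `k = 4` are all illustrative assumptions. The idea is to strip comments, split the source into tokens so formatting no longer matters, then compare overlapping runs of tokens ("shingles") with Jaccard similarity.

```typescript
// Hypothetical helper: crude tokenizer that drops /* */ and // comments
// (it does not understand comments or "//" inside string literals).
function tokenize(source: string): string[] {
  const noComments = source
    .replace(/\/\*[\s\S]*?\*\//g, " ")
    .replace(/\/\/.*$/gm, " ");
  // Identifiers, integers, or single punctuation characters; whitespace is discarded.
  return noComments.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w]/g) || [];
}

// Hypothetical helper: the set of all runs of k consecutive tokens.
function shingles(tokens: string[], k: number): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i + k <= tokens.length; i++) {
    out.add(tokens.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity |A ∩ B| / |A ∪ B| of the two shingle sets (0 = disjoint, 1 = identical).
function similarity(a: string, b: string, k = 4): number {
  const sa = shingles(tokenize(a), k);
  const sb = shingles(tokenize(b), k);
  if (sa.size === 0 && sb.size === 0) return 1;
  let shared = 0;
  sa.forEach((s) => {
    if (sb.has(s)) shared++;
  });
  return shared / (sa.size + sb.size - shared);
}

// Usage: reformatting and removing a comment does not change the score.
const original = "function add(a, b) {\n  return a + b; // sum\n}";
const reformatted = "function add(a,b){return a+b;}";
const score = similarity(original, reformatted); // 1
```

Note that renaming identifiers would still lower this score, which is why, as described above, a high-scoring region is only a lead for a human to inspect rather than proof of copying.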

The issues I see most often tend to be things like people copy/pasting an answer from StackOverflow without correctly attributing it, or without getting the original author's permission to license it under MIT or similar terms rather than the StackOverflow license (which is a whole other discussion). Another place where people frequently forget to attribute is when they reference a technical book. But occasionally you also see code flow between projects with a bit of refactoring.

I have not checked all commits because they are both busy projects and usually little changes between each commit. Rush has had lots of releases, and Lerna over 100 I think, with several major restructurings along the way. I first looked at the latest versions and the initial versions, and then took some snapshots from 2017 and 2016. This code was internal to Microsoft for a while before it was refactored to go open source, but the changes there were about removing the components they were not releasing as open source and renaming it from an internal codename to the current name.

I've had some other people who are experienced in doing forensic code comparisons take a look for me and they can't find anything either. I also got an independent outside consultancy firm who specialize in this type of thing to take a look for me. They didn't find anything - but I'm going to get them to do a more in-depth scan which will take a bit longer to come back with results.

I've talked with the original copyright holder and the current maintainer and neither have reported any issues. The original poster has been unable to point me at the issue also - but to be fair, he was busy at the time away from his computer.

So far, I haven't found any evidence of code coming from outside the project without attribution. However, if people have found something please let me know as we would love to find what it is that Jamie feels the Rush project has done wrong and fix it.

The team themselves are very keen to see this resolved as they are proud of the work they do and the effort that went into it.

Hey @pgonzal - would you be able to comment on the history of this project a bit? How did it start, even if it didn't take any actual code from Lerna - was it heavily inspired by it or the other monorepo tools?

Please stop mentioning me on this issue and contacting me about this, I'm done talking about it since Microsoft employees (Scott Hanselman) have gone out of their way to contact my friends and try to convince them to say shit about me publicly. You guys are assholes, now fuck off.

So if nobody pulled a copy (including git history) in the early days, this isn't going to lead anywhere.

This repo does have a few forks that are wildly outdated (and diverged in both directions), so unless someone went through the trouble of scrambling their history as well without actually pulling them up to date, they should be useful.

Given that the author making these accusations can't be bothered to (publicly) put some actual proof to their allegations (let alone be civil), I'd personally start ignoring it at this point. If the author went through all the work to compare the repositories to find evidence of this scrambling, I'm sure they checked out the repository, yet are unwilling to make anything public other than a rant on twitter. That's not how you defend your software against alleged theft of intellectual property.

Hey, sorry Jamie. I'd been deliberately not mentioning you since the initial creation - which was to inform you that I had logged an issue on your behalf and hope that you could explain where the problem is with the project so that we can fix it.

I guess you are getting notifications from GitHub because you were mentioned at the beginning? There is an 'Unsubscribe' button at the top of the issue in GitHub if you want to mute it.

[screenshot]

Apologies for the intrusion - just trying to get to the bottom of your issue as we take copyright violations very seriously and would want to fix them immediately.

Not sure what to say. I wanted to make this right. I haven't "gone out of my way to contact your friends." I DM'ed one person, the original copyright owner, and said:
[screenshot]
Not to mention LITERALLY:
[screenshot]
If you feel that is "[going] out of their way to contact my friends and try to convince them to say shit about me publicly" then I apologize. I'm just trying to figure out what's up here. I guess there's not much else I can do.

Hey @pgonzal - would you be able to comment on the history of this project a bit? How did it start, even if it didn't take any actual code from Lerna - was it heavily inspired by it or the other monorepo tools?

A little history:

Rush started as an internal closed-source project back in the dark days before we did things in the open. My group was building the SharePoint Framework (SPFx), which is an SDK for third-party developers, and also the foundation for a number of SharePoint applications in Microsoft. Because it was a large code base, we eventually realized the need to break it up into lots of small packages. And because it was a constantly evolving platform closely coupled with its consumers, we had constant frustration with NPM's familiar one-repo-per-package model. In the fall of 2015 we started moving everything into one repo, and quickly saw the need to automate the install/link/build steps.

We did not know of any other solutions internally, so I started work on an internal tool called "NPMX". My first official commits happened over the Christmas holiday. Most of the initial focus was on Rush's centralized symlinking approach, which I think is unique among NPM monorepo tools. It ensures that NPM packages cannot accidentally import dependencies that aren't listed in their package.json file. (We call those "phantom dependencies", as they lead to untested version combinations or module resolution errors.) By the spring of 2016 we were up and running with the core features and even parallel builds, and were shifting focus to the scalability features that would characterize Rush. But it was all still internal.
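To make the idea of a "phantom dependency" concrete: it is an import that happens to resolve (for example because a package was hoisted into a shared `node_modules` folder) even though the importing package never declared it. Rush prevents these through its symlink layout, not by scanning source code; the sketch below is only a hypothetical lint-style check that illustrates the definition. The names `findPhantomDependencies` and `packageNameOf`, the regex-based import scan, and the default `builtins` list are all assumptions for illustration and are not Rush APIs.

```typescript
// The parts of package.json this sketch looks at.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// Hypothetical helper: reduce an import specifier to a package name,
// e.g. "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg".
// Relative/absolute paths and "node:" builtins are not packages.
function packageNameOf(specifier: string): string | undefined {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return undefined;
  }
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Hypothetical check: report imported packages that package.json does not declare.
// The regex only catches `from "x"`, `import "x"` and `require("x")` forms.
function findPhantomDependencies(
  source: string,
  pkg: PackageJson,
  builtins: string[] = ["fs", "path", "os", "util", "events"], // illustrative, not exhaustive
): string[] {
  const declared = new Set<string>([
    ...Object.keys(pkg.dependencies || {}),
    ...Object.keys(pkg.devDependencies || {}),
    ...Object.keys(pkg.peerDependencies || {}),
    ...builtins,
  ]);
  const pattern = /(?:from\s+|import\s+|require\(\s*)["']([^"']+)["']/g;
  const phantoms: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = packageNameOf(match[1]);
    if (name !== undefined && !declared.has(name) && phantoms.indexOf(name) === -1) {
      phantoms.push(name);
    }
  }
  return phantoms;
}

// Usage: "semver" is imported but never declared, so it is a phantom dependency.
const pkg: PackageJson = { dependencies: { lodash: "^4.17.21" } };
const src = [
  'import { map } from "lodash/fp";',
  'import semver from "semver";',
  'const helper = require("./helper");',
].join("\n");
const phantoms = findPhantomDependencies(src, pkg); // ["semver"]
```

A source scan like this can only warn after the fact; preventing the phantom import from resolving at all, as the centralized symlinking described above does, catches the problem even for imports a regex would miss.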

SPFx released its first dev preview in August 2016. Shortly after, we started publishing @microsoft/npmx under the proprietary SPFx license (i.e. it was still closed source). Given that NPMX wasn't directly tied to any commercial product, it bothered me that it wasn't usable by the general community. So we started the process in Microsoft for an open source release, which eventually happened in January 2017. We had migrated a lot of projects between repos by this point, and were accustomed to bringing along the full Git history. When we do that, we use git filter-branch to leave out the other projects from the old repo, which in this case were proprietary code (i.e. projects that we were not releasing as open source). We also renamed "NPMX" to "Rush" everywhere because the NPMX name was too similar to the NPM tool itself and might confuse people.

How does Rush compare to Lerna? Why didn't we abandon Rush or merge it with Lerna?

We get this question sometimes. Today, there are several approaches you can use for monorepo management, each with different goals. For example, Yarn workspaces and PNPM recursive link both help you get all your projects symlinked together, but don't go far into build/publishing orchestration. We primarily use PNPM with Rush because it solves the NPM "doppelgangers" problem, but we don't use its "PNPM recursive link" feature. Lerna integrates with multiple package managers and supplements them with bulk commands for managing lots of projects. Lerna has a much larger and more active community and a longer list of basic operations than Rush. To be honest, Lerna is also easier to get started with, so I personally recommend it if it meets your needs.

Rush's focus is scalability. For context, my team supports an internal monorepo that now has 130+ projects and 200+ contributors, many of whom are infrequent participants unfamiliar with our system or indeed each other. If someone introduces a problematic dependency, or if a dev somewhere encounters a weird build error, or if a CI job starts failing, my team gets a support ticket that disrupts our everyday work. This shaped the character of Rush; for example, we need policies and safeguards for a group our size. We publish JSON schemas for every config file (which is extra work but helps catch mistakes), and we also strictly validate every command-line option. Rush still supports NPM but really wants you to use PNPM. Rush's publishing feature expects you to use specific workflows, and there are something like 5 recommended models that we converged on. Incremental builds are a big deal for us too, but that feature imposes requirements on your toolchain. This summer we're hoping to add support for multi-machine sharded builds and multi-phase builds as our repo grows. Big company stuff, I know, but important to us.

So rather than merge all this opinionated stuff into one of the other monorepo tools, we continued with Rush and thought about ways to extend its scenarios beyond SharePoint and Office to help the broader community. The hope is that Rush will appeal to others who are scaling up and encountering the same problems we did. We're new to that and still learning. For example, we've been adding features to make onboarding easier. We moved the wiki to a web site. (I drew the art on there myself as you might be able to tell.) We only recently realized we needed "how to compile this repo" instructions. Rush doesn't receive formal product support or marketing from Microsoft. As engineers, we benefit from a ton of open source libraries in our everyday jobs, and so we just wanted to give back and participate in the community. Our day job is still about commercial products. Like the other tools in the web-build-tools repo, Rush filled a gap that we had internally and we think it's cool so we wanted to share it. But there are definitely plenty of other great choices out there if you have different requirements.

Those sound like good closing remarks to wrap up a conversation in which the initiator has shown no further interest of participating.

As a bystander I subscribed to this issue out of curiosity and applaud you guys for handling this topic so professionally.

Whoever can handle a situation in a more professional and civilized manner should be given the benefit of the doubt. How would stealing code even be wrong if Microsoft did it calmly? A wise man once told me, “I’ll take a tyrant covered in blood and holding a lit match over a hero covered in gasoline any day.”

Long live Microsoft! Bravo team.

Context: I’m the creator and license owner of Lerna.

Just want to publicly, and on the record, state that I have no reason to believe that Rush has copied any code from Lerna.

Even if there were, the licenses of the two projects are identical (MIT). In the event that the author field of the license has some significance, I don’t consider it an issue and would gladly release all liability.

Thank you so much @shanselman for being so diligent!

@kittens Amen. I've gone to double check what you've said, and you're 100% right. Actually, I looked around some more and was shocked that most MIT licenses seem to be direct copy-and-paste jobs of one another, with fragments of one line changed. Just goes to show how common the "photocopy" approach is here on Github.

Ok folks. I’m going to close this one down now in the absence of any copyright infringement claim and out of respect for the original poster so that we don’t bombard him with notifications.

I honestly believe that the original poster believes that this project has at least been heavily inspired by Lerna.

However, I also believe that the team have given a good account of the project, their work on it and the lineage of the code that is consistent with all the other data I can find. I also applaud their calmness despite their professionalism being called into question in the comments in some of the Reddit and Hacker News threads linking to this topic.

However, I am not able to find any evidence of copyright infringement, and there is no claim from any copyright holder, so I am going to close this.

I would appeal to folks to lay off the original poster a bit. While his communication style may be a bit spicy at times, he has contributed a heck of a lot to the open source community and I applaud anyone who goes to that effort. He was also very quick to reply to me when I contacted him directly despite being away for the weekend and having no clue who I was. My DMs are always open if he wants to get back in touch.

If folks have a concern about any of the projects in this org then please contact the project first by raising an issue. If you get no joy there then please email opensource@microsoft.com where there is a team of people ready to help. Also take a look at the Microsoft Open Source Code of Conduct.

The job of the Open Source Program Office is to ensure that Microsoft do open source right. We are a big company with lots of people of different experience levels so when someone contacts us we tend to assume they are correct and the team is wrong and we help make sure it gets fixed. More so, we tend to employ the type of people that enjoy being able to find evidence of mistakes and who are able to educate the teams on how to go about things in the right way in the future.

Also feel free to DM me personally on Twitter if you want to chat about any concerns.

Thanks for your time.

actually i gotta lay off here a little bit, because clearly microsoft has made advances in their loser-generating algorithms here wow look:

For the record, if you get mail from Github, it was me reporting you for abuse. Have a good day!

@Airblader Yes I was wondering why Microsoft was sending me yet more spam.

Let's not forget that Microsoft is known for taking inspiration from what happens in the computing community and making its own version.

C# is inspired by what was done with Java.
Parts of .NET Core seem inspired by the node.js event loop and use the same backend library, libuv.

There is nothing wrong with this, and many companies start with inspiration and evolve things into something better.

@seand88 Microsoft didn't do anything wrong here. And even if they did, there is nothing wrong with that either. What's wrong is saying they did. And what's right is 7.5 BILLION DOLLARS TO BUY ALL OF COMPUTERS AND ONLINE.

Can someone please lock this thread? @martinwoodward? I've reported this user both to Github and Gitlab now. I don't think we can expect anything sane from this point onwards.

@Airblader I agree. they. bought. the. website.

myty commented

@OKNoah You're kinda coming off as a jerk. Just like you, @martinwoodward is a person...a human being. Labeling someone as a loser isn't going to bring yourself any respect or change anyone's opinion. It only makes other humans tune you out.

Someone get these folks a link to net-nanny.

@Airblader [in your stupid sounding voice] hello operator please connect me to git please i wanna ban an open source contributo--- yes i did pay 7.5 billion dollars yes please connect me at once