Versioning Proposal: Customer Feedback Requested
marstr opened this issue · 21 comments
We've been thinking a lot about improvements that we could make to improve upon our current package management story. There are three problems in particular that have jumped out to us, which think need to be tackled head on in order to improve our story:
- The version numbers need to actually conform to semver 2.0 in order to integrate best with glide and other tools. The biggest change here is that pre-release status should be indicated by major-version 0, not a tag at the end. In the future, there could be “-rc” or “-beta” tags that mean specifically this is a test version of a patch or new revision.
- At the moment, too many packages are bunched up into a single repository. This makes acquisition convenient, but blows up version numbers and doesn't communicate changes' scope or severity well.
- As pointed out in other threads, a lot of swaggers/packages are stable enough to consider release quality. Our support of those shouldn't be questioned because they happen to be adjacent to new swaggers that are unstable.
- Without intervention, releases will continue to be at a relatively slow monthly cadence with a single version number across all packages.
We want to fix all of those problems ASAP. However, your feedback is important to us. We want to ensure that we don't introduce new worse problems while we're fixing some remittable issues.
Here are the changes that we'd like you all to consider: (Please note, the numbering here doesn't line up with the problems above.)
- Does it make sense to split up this repository into several smaller ones?
If so, there are a couple of options, and none of them really require breaking changes. With all of the proposed changes below git submodules could be used to keep this repository alive and targeting each of the smaller repositories. In this way, there need not be breaking changes.- Storage, management, and each ARM package each get their own repositories. Their paths would be:
GitHub.com/Azure/azure-sdk-for-go-arm-[package name]
GitHub.com/Azure/azure-sdk-for-go-storage
GitHub.com/Azure/azure-sdk-for-go-management - Storage, management, and ARM each get their own repositories. Their paths would be:
GitHub.com/Azure/azure-sdk-for-go-arm
GitHub.com/Azure/azure-sdk-for-go-storage
GitHub.com/Azure/azure-sdk-for-go-management - We attempt to match the Azure portal’s groupings, and have families packaged together. Some repository paths may be:
GitHub.com/Azure/azure-sdk-for-go-storage
GitHub.com/Azure/azure-sdk-for-go-arm-networking
GitHub.com/Azure/azure-sdk-for-go-arm-compute
Etc…
- Storage, management, and each ARM package each get their own repositories. Their paths would be:
- When we split up the repository, we reset the versions as makes sense with each package. For some, like storage, they are clearly in release and should be at least at version v1.0.0. Others, like arm/cognitiveservices are probably still in need of evaluation and should be set at v0.1.0.
A major point to consider, is that with other languages like C#, Ruby, or Java, one can separate versioning and source control from one another fairly easily. Golang’s coupling between source and package management forces our hand a little here, instead of making the only knob we have to turn the number of repositories our is code stored in.
So there it is, what are your thoughts everybody? @kpfaulkner @colemickens @tombuildsstuff
Another thought, storing these repositories in GitHub doesn't allow us to add any depth to the tree of repositories i.e. in all of the proposed package names above, there are precisely three levels: GitHub, Azure, and the repository.
Which is sort of annoying because:
import (
"github.com/Azure/azure-sdk-for-go/arm/network"
)
would have to become:
import (
network "GitHub.com/Azure/azure-sdk-for-go-arm-network"
)
The new pattern may be slightly more awkward to get setup, and IDE tooling may be less helpful for auto-generating the import.
If the above modification to import statements is too annoying, we could look into hosting the auto-generated repositories elsewhere that would allow for more tiered repositories. Obviously, our commitment to Open Source would remain the same, as we'd keep the AutoRest Go Generator and Swagger files where they are now. Things like Storage which are still manually written would have to stay in GitHub.
Thoughts?
cc: @brendandixon
@marstr @colemickens I don't agree with the proposal and believe it will make matters worse, not better.
The root of the problem lies in trying to correlate the SDK version to a specific set of service APIs.
Since those APIs are in varying states and will continually vary from one another, that attempted correlation makes creating a sane version for the SDK very difficult. This binding decision pre-dates our use of Swagger, when we implemented service support by hand. With Swagger we no longer need to maintain such a binding.
There are two concepts important to keep separate: The SDK package as a container of code with support for service APIs (at varying versions) and the service APIs. The service APIs already have a versioning model based on dates (e.g., 2015-07-19) that cannot change. Attempting to force fit those date-based versions into an SDK Semantic Version leads to trouble. Instead, as I've suggested many times, the SDK can and should support simultaneous use of multiple API versions for a single service. There will be a default (aka latest), but the customer may pick and choose any version they desire. In that model, we need only change the SDK version when we add a new service API. The SDK version would say nothing about the state of those APIs, but about speaks only to the quality of the SDK and its core libraries.
This is a more flexible model and less likely to break code. Further, it has been used by other SDKs (e.g., https://github.com/aws/aws-sdk-go) with good success. It also enables mixing-and-matching service API calls even when not all regions support the same set (such as, https://aws.amazon.com/blogs/developer/using-the-aws-sdk-for-gos-regions-and-endpoints-metadata/?utm_source=golangweekly&utm_medium=email).
If we insist on binding SDK version, in whatever form, to API versions, then we will create challenges for which the above proposals are the only solutions. I strongly suggest we re-consider and adopt the approach I describe here.
@brendandixon I agree with the principle of separating the version of the Azure Service that we're targeting from the version number that identifies our Go packages. Basically, I think we agree that the service version that we use is an implementation detail. However, I believe that this is a tangential concern.
The primary concern that we're working to address in the proposal above is that changes to any package impact each other package. This creates a lowest-common denominator situation where untouched packages are labeled as having a breaking change because we don't have the granularity to do otherwise. The proposal is not attempting to solve the separate, albeit related, problem of having too direct of a tie between the version of our API and the service call it will make.
@marstr I guess I disagree. The introduction of a new service API version should not impact those already existing. I've used this strategy in other SDKs to good success. We can chat offline if needed.
I think the vectors for breaking changes are elsewhere. Here are some that come to mind (I have witnessed all of these).
- RPs (inadvertently) make a breaking change to an existing API (e.g. change order of parameters).
- Changes in tooling like AutoRest.
- Intentional breaking changes (e.g. the APIs for storage are horrible so we're redoing them).
I believe the question we mainly want to answer via SDK versioning is "if I'm on version X of the SDK and I pull down version Y will my existing code work as-is or will I need to do any refactoring". Today the answer is "maybe, depending on which packages you use" since we have all parts of the SDK grouped into one repository. We've made an assumption that this answer isn't good enough which is why we're exploring ways how to slice-and-dice the packages so they can version independently; it could be the case that our assumption is wrong and people don't care so much which is what we're trying to find out.
After chatting with @jhendrixMSFT and @devigned, I'd like to also propose adding support handling multiple versions of the Azure APIs targeted by our SDK as mentioned by @brendandixon. The way that we could achieve this would be adding the azure version in the namespace of the package. That would mean that the layout of the each repository (no matter which strategy from above that is employed) would like like this:
[repositoryName]/[AzureAPIVersion]/[packageName]
e.g. for arm-datalake-store the layout might be:
azure-sdk-for-go-arm-datalake-store/
..............................................................................20161101/
...................................................................................................filesystem
...................................................................................................account
This approach is convenient for a couple reasons:
- We can insulate ourselves from changes made at the Azure Service level. An updated version of the service being released would not be a breaking change.
- The above layout is more convenient to reference in your code. References would continue to look like:
instead of aliasing or the incredibly gross:
filesystem.Method(...)
azure-sdk-for-go-arm-datalake-store.Method(...)
Note, that this does not compete with the proposal above. We would still split the repositories by service or service category as appropriate and use semantic versioning to help control changes to the shape of our SDK and how it calls the Azure Services regardless of the version of those services that we're targeting.
@marstr Outstanding. :)
👍
@brendandixon & @colemickens thanks FOr the AWESOME feedback.. Looking forward to hearing from our MVPs :) @tombuildsstuff @kpfaulkner
@kpfaulkner In my view, the date-based API version should apply to all service APIs. Helper methods, for example, an ease-of-use helper class, would be versioned separately, as part of the entire SDK.
@tombuildsstuff any thoughts before we summarize and close down :)
@salameer yes, I'll try and get a reply out later tonight, busy week - sorry!
@salameer 👋🏻
Sorry for the delay responding to this!
To reply to @marstr's comments first:
- Does it make sense to split up this repository into several smaller ones?
I don't think so - I actually think it'd actually make things more painful for both the Go SDK team (in terms of context switching/repo's) and consumers (in terms of the upgrade path) - given the Common/Helper functions would be versioned/shared across multiple versions of the SDK.
- When we split up the repository, we reset the versions as makes sense with each package. For some, like storage, they are clearly in release and should be at least at version v1.0.0. Others, like arm/cognitiveservices are probably still in need of evaluation and should be set at v0.1.0.
If we were to follow this approach - it sounds sensible to me - there's a separate comment regarding the Swagger's below.
To @brendandixon's comments:
@kpfaulkner In my view, the date-based API version should apply to all service APIs. Helper methods, for example, an ease-of-use helper class, would be versioned separately, as part of the entire SDK.
+1 - I'm assuming you're assuming these would be SemVer'd rather than Date Versioned?
The service APIs already have a versioning model based on dates (e.g., 2015-07-19) that cannot change
Whilst true, both the API's and the Swagger's can - either because the API is wrong or because the type is reference incorrectly in the Swagger's.
IMO the best approach for versioning the generated code would be to use both the API Date + the revision number for that particular Swagger - but this isn't currently tracked.
To @jhendrixMSFT's comments:
I think the vectors for breaking changes are elsewhere. Here are some that come to mind (I have witnessed all of these).
- RPs (inadvertently) make a breaking change to an existing API (e.g. change order of parameters).
- Changes in tooling like AutoRest.
- Intentional breaking changes (e.g. the APIs for storage are horrible so we're redoing them).
Whilst #2 and #3 are important - I'd argue #1 needs to be broken down slightly:
- Lack of testing of the Swagger's
- Lack of testing of the Generated SDK
As I see it - almost all of those issues would be caught by some kind of acceptance testing. Are there plans to update the AutoRest Go generator to also generate some acceptance tests? (e.g. actually hitting the API via the SDK?).
Whilst I'm aware that these questions probably belong in the Rest API Specs repo - given the Swagger's are a hard dependency - I think it's worth raising the following questions:
-
Are there plans to add some real acceptance testing for the Swagger's? Node packages exist which will validate the response - to ensure the schema's are matched correctly.
- This isn't a perfect solution - but it'd ensure what the code is generated from is good (FWIW as a consumer I see less issues with the AutoRest generator)
- IMO until this is resolved enabling automatic PR's from Swagger changes will only make this problem worse.
-
Currently we're not tracking/exposing the revision of the Swagger files (instead the patch version of the SDK) - so it's hard to know both which version is the source / what's in the current release.
- On that note: the Swagger's aren't tagged (and there's no changelog) - so is the assumption that the master branch is always stable? IMO it's hard to release RC/Beta versions of the SDK's without this?
-2¢
Couple of comments to @tombuildsstuff
Regarding versioning I was initially thinking dates simply because all the Azure APIs are tied to a date "version" already (ie "2015-02-21"). But the more I think about it the more SemVer is nice but I'd still want to know about the "date version" that is used. Does that make sense?
ie, I know that version "2015-02-21" has a feature I want, but how do I know that Go SDK 1.2.3 matches that?
I can't see how, but I'm being greedy and wanting both.
Secondly, regarding the acceptance testing, you mention some Node packages. Can you specify in more detail? For my day job we use Specflow (ie the cucumber/gherkin set of tools) for acceptance testing and it works out quite well. Quick bit of googling shows GoDog in that same family. Is that along a similar line of what you're thinking, or the Node stuff is pretty separate?
Thanks for the input from everybody. This has been a valuable conversation for our planning and roadmap for the product. I'm going to close out this ticket for now, but spend some time tinkering with our generator and experimenting with the improvements mentioned here.
To summarize quickly:
The biggest take away from this conversation is a pretty simple idea that will mean real changes, there is real value in separating the version of our SDK from the Azure API Version that is targeted. There is mixed support/appetite for separating the repository into several smaller ones in order to convey more detail around breaking changes. Basically we all seem to be in agreement, that at the heart of a lot of frustrating churn is poorly written Swaggers that must be updated. A lot of the changes proposed are simply massaging that. No matter what we do going forward, there are a lot of other improvements in the test and deployment spaces that need attention.
@kpfaulkner sorry, missed your reply:
Regarding versioning I was initially thinking dates simply because all the Azure APIs are tied to a date "version" already (ie "2015-02-21"). But the more I think about it the more SemVer is nice but I'd still want to know about the "date version" that is used. Does that make sense?
Yes it does. However given it's published in the changelog and exposed as a constant in the SDK (in the version.go
for each API) - does it need to be exposed further?
Secondly, regarding the acceptance testing, you mention some Node packages. Can you specify in more detail? For my day job we use Specflow (ie the cucumber/gherkin set of tools) for acceptance testing and it works out quite well. Quick bit of googling shows GoDog in that same family. Is that along a similar line of what you're thinking, or the Node stuff is pretty separate?
In that context I was referencing testing the Swagger, where I was hoping for something Swagger specific - with some quick Googling I found:
Whilst I've not tried either of them, they look like they (or something similar) could be useful in this effort to improve the Swaggers.
With regards to acceptance tests for the SDK - I've been pondering if it's worth generating some acceptance tests in the short-term? I was actually thinking about just calling the Go SDK via raw Go code - I'm not sure I see the benefit in using Cucumber syntax there over just calling the SDK directly (as it's another layer of abstraction) - but I've no preference here :)
@marstr thanks :)
Out of interest are there plans on the Swagger's team roadmap for improving the testing story? (Is it worth opening an issue on that repo with a summary of this too?)
Jeez, just saw this comment @tombuildsstuff; sorry about the ridiculous delay! Anyway, yes: @jhendrixMSFT has been off working hard with some other folks on a story for improving our testing story for the generated SDKs. The issue tracking those efforts as they relate to this repository is #449.
Update: I've added an experimental branch with the SDK re-hashed to support multiple API Versions. You can find that here: https://github.com/Azure/azure-sdk-for-go/tree/experimental/allAPIVersions
Please note, as an experimental branch, breaking changes will come un-announced as I have time to work on it.