w3c/process

Should the process include something about Testing?

nrooney opened this issue · 25 comments

The process gives guidance on "implementation experience" as being necessary, but says nothing about testing. Our discussions in the Working Group Effectiveness task force have made me consider that testing may be almost as important as implementations for specs. Should we therefore include something about testing in the process doc?

I think that the whole way we monitor implementation, testing, and wide review, needs a re-think in the world we now live in. There is an old mental model that all this tracking is done manually, that it's discussed in smoke-filled rooms and manually documented, and so on. We should be using repo labels, milestones, and so on, much more intelligently and uniformly. (IM is-it-H? O).

I agree that we should think about implementation / testing criteria in the process document now that there have been years of somewhat inconsistent practice to learn from. Requiring tests for each feature as a CR entrance criterion makes sense to me. Likewise, having multiple real-world implementations (not just proofs of concept/demos) pass them as a CR exit criterion seems worth considering.

As @dsinger notes, this might be implemented in the tooling and best practices for using GitHub instead of or in addition to the formal process. That's another question ... the process document has evolved to be more "declarative" about what must happen and less "procedural" about how to make it happen. I don't have strong feelings about what goes where so long as the process, the manual procedures/guidance, and the automated tools are in sync.

Thanks for the comments. I need to do some research on other bodies' requirements on implementations and testing, to see if we can get any guidance there. I'd also be interested in the opinions of Marcos, Tobie, Foolip, plh and others. I'll ping them.

This came from @nigelmegitt on issue #129:

Currently WGs can transition a Rec track document to CR without having defined tests. However, this means that test creation may never happen, in which case the spec can never exit CR. It also means that there's a missed opportunity to double-check every normative requirement with a test, which is good practice for establishing clearly what the requirement is.

Conversely, requiring tests raises the bar for transition to CR, which people likely don't want.

Regardless, I propose to require, as part of the exit criteria declared at transition to CR, that the test suite be complete, and to recommend as good practice that whenever any feature or normative requirement is introduced or modified, an accompanying test is generated.

One of the issues with testing is that web platform tests
a) are hard to understand, so asking anyone who helps improve the spec. to also write a test can be tricky
b) are browser-specific, and we have many specs that are (sometimes or always) outside browsers

I think a first attempt at a test suite would be good as a CR entry requirement. In my mind, that's saying, "This specification is ready, and here is the evidence we have for this assertion." It could be the tests for a single implementation (which will likely be an incomplete set of tests) and/or missing some of the cross-checks that will happen as the spec gets more scrutiny as a CR.

Checking that the test suite is complete and passes should be a requirement of CR exit. We should expect that test suite development will continue after CR entry, as implementations check themselves against each other and spec improvements continue.

If we require a complete test suite at CR Entry, there is an important side-benefit. Once implementations arrive, it can make the process of going from CR to REC much more automatable.
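As a rough sketch of the kind of automation this could enable, the snippet below checks an implementation report for tests that lack two passing implementations; the data shape, the names, and the two-passes threshold are assumptions for illustration, not anything the Process mandates.

```ts
// Minimal sketch: given per-test results from several implementations,
// list the tests that do not yet pass in at least two of them.

type Result = "pass" | "fail";

// report[testName][implementationName] = "pass" | "fail"
type ImplementationReport = Record<string, Record<string, Result>>;

function testsMissingInterop(report: ImplementationReport): string[] {
  return Object.entries(report)
    .filter(([, byImpl]) =>
      Object.values(byImpl).filter((r) => r === "pass").length < 2)
    .map(([testName]) => testName);
}

// Hypothetical report: one test has only a single passing implementation.
const report: ImplementationReport = {
  "cue-default-start.html": { implA: "pass", implB: "pass" },
  "cue-overlap.html": { implA: "pass", implB: "fail" },
};

console.log(testsMissingInterop(report)); // ["cue-overlap.html"]
```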

Requiring tests for entry to CR would indeed make the CR exit criteria clearer, and would make WGs more effective, by encouraging them not to accept new features or substantive changes without a test that demonstrates them.

As @jeffjaffe says, making it smoother to get from CR to Rec would be a big benefit, especially because the current situation where we allow normative references from Recs to CRs is in my mind very far from ideal. Those CRs may have incomplete test suites and no realistic prospect of getting to Rec given the priorities of the WG members (I'm thinking of CSSWG in particular, though there may be others).

The SOTD boilerplate wording about CRs not being suitable for use as normative references is to my mind there for a very good reason, and it does not reflect well on W3C to ignore its own advice here!

So +1 to @nrooney #157 (comment) even though that doesn't reflect current practice for TTWG with TTML, which is using the current process to push out CRs more quickly albeit with a strong intention to move to Rec via a test suite. In a resource-squeezed group (every group?) the burden of adding tests will seem initially unappealing to some members.

I need to do some research on other bodies' requirements on implementations and testing, to see if we can get any guidance there.

@nrooney I've seen varying practices. One of the appealing ones is a precursor to tests: mandating that a description of the requirements is agreed before work on the technical solution begins, and agreed by a differently focused group looking at the user need first.

There's a long thread I initiated (probably at an ineffective moment in the spec development cycle for WebVTT) at w3c/webvtt#384 about algorithmic specifications, that wandered into testing:

I observed that an algorithm plus a set of tests derived solely from that algorithm can be implemented and the implementation can be verified, but has no link back to what anyone actually wanted the spec to do in the first place. It is also hard for some constituents to use, being optimised for implementers over users. One proposal I made is that there should be a set of testable and human-readable behaviours defined, for example in the style of BDD: given [precondition], when [A] happens, then [B] is the result.

This gets to a path as follows:

  1. Decide what we want to happen and document it (as e.g. BDDs)
  2. Specify the syntax, semantics, API etc that achieves the thing we want to happen
  3. Write tests for the things specified in 2.
  4. Confirm that the implementations that pass the tests also achieve the results documented at 1.

That's a closed loop that both demonstrates interoperability and achieves a human readable (and human-relevant) goal.
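To make the loop concrete, here is a minimal sketch for one invented behaviour; the cue example, `parseCue`, and every other name in it are hypothetical, chosen purely to show the shape of the loop rather than taken from any real spec or test suite.

```ts
// Step 1: the desired behaviour, documented in BDD style.
//   Given a cue with no explicit start time,
//   when the cue is parsed,
//   then its start time defaults to 0.

// Step 2: the spec would define the syntax and parsing semantics; this
// hypothetical function stands in for one implementation of them.
interface Cue {
  start: number;
  text: string;
}

function parseCue(input: { start?: number; text: string }): Cue {
  return { start: input.start ?? 0, text: input.text };
}

// Step 3: a test derived from what was specified in step 2.
function testDefaultStartTime(): void {
  const cue = parseCue({ text: "Hello" }); // given / when
  if (cue.start !== 0) {                   // then
    throw new Error(`expected start 0, got ${cue.start}`);
  }
}

// Step 4: running the same test against each real implementation confirms
// that those which pass also achieve the behaviour documented in step 1.
testDefaultStartTime();
console.log("PASS: cue with no explicit start time defaults to 0");
```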

Tests are no good without implementations, of course. And there are fields where the spec is sufficiently complex or in sufficient flux that implementing before CR risks a lot of wasted effort.

In practice at W3C isn't there usually a fair amount of implementation experience before CR? And in practice aren't test results a major driver of changes during CR?

I'd guess that there's a greater risk of wasted effort by not requiring tests before CR (spec churn to make it interoperably implementable) than from test churn to track spec changes.

Yes, it's possible that complex specs like this can be handled as an exception. Also, there is a grey area around spec. changes; if the spec. is ambiguous or self-contradictory, it has to be fixed. If it's missing a definition of what to do in a specific case, it has to be fixed. Do we hold off fixing if we don't have a test for the newly defined behavior?

In practice at W3C isn't there usually a fair amount of implementation experience before CR?

@michaelchampion It varies. It's unusual for there to be no implementation experience before CR, but there are occasions where there are no (publicly visible) tests and the implementation experience used to derive the spec is limited to a small number of implementations, e.g. 1.

And in practice aren't test results a major driver of changes during CR?

They sometimes drive spec changes, but they also sometimes drive changes to the tests themselves. [carefully avoids suggesting any groups might game the system]

I'd guess that there's a greater risk of wasted effort by not requiring tests before CR (spec churn to make it interoperably implementable) than from test churn to track spec changes.

+1

Do we hold off fixing if we don't have a test for the newly defined behavior?

@dwsinger There's certainly an argument in favour of doing that. If the intention is to clarify, but the test conditions to demonstrate the case being clarified cannot be written down, I would argue that clarity has not been achieved, and the change should not be made.

I am inclined to leave this case-by-case and not formalize it further in the process.

Do we hold off fixing if we don't have a test for the newly defined behavior?

@dwsinger Yes! That's exactly the point. Adding a test for the newly defined behaviour is a valuable check-point to verify that everyone has understood it the same way.

I understand the point. But imagine: you're in a working group, and there is a glaring error in the spec. You know the fix, but the person who understands the tests is unable to fix them immediately. Should you continue to have the glaring error, and wait until tests arrive, and have implementers continue to implement the wrong thing; or publish the corrected spec. asap, and catch up with tests asap? I am not sure I want the process to be prescriptive. There may be cases where we need to fix, for example, a security problem as fast as possible.

@dsinger help, my email address is correct in my settings. maybe people are typing at-dsinger and not at-dwsinger? not sure what to do...

You know the fix, but the person who understands the tests is unable to fix them immediately. Should you continue to have the glaring error, and wait until tests arrive, and have implementers continue to implement the wrong thing; or publish the corrected spec. asap, and catch up with tests asap?

@dwsinger I'd hope that everyone involved in the spec understands the tests, and can propose updates to both together. For the sake of argument let's assume that is not the case though: if the goal is to highlight the problem to implementers, then the spec could still be updated with an informative placeholder warning that there's an issue and that this text is known to require a fix, ahead of any fix actually being agreed.

I am not sure I want the process to be prescriptive.

I agree in general - in this case I feel the balance tips in favour of requiring tests, but that's just my view.

I guess I could live with an in-line errata report and "expected change" in the case that tests are not available. In fact, I might even prefer it.

I think to have a decent viewpoint here, we need to revisit why we want to have something about testing in Process at all.

The PR entry criteria (now that they're not CR exit criteria) require us to "show adequate implementation experience except where an exception is approved by the Director". The definition of "implementation experience" says:

Implementation experience is required to show that a specification is sufficiently clear, complete, and relevant to market needs, to ensure that independent interoperable implementations of each feature of the specification will be realized.

The ultimate goal here, in the Process, is for the W3C to recommend wide deployment of the specification (per the definition of Recommendation).

We've seen vastly different interpretations of the Process requirements historically (some specs going to REC with an implementation report that's little more than "these people say they've used these attributes in their documents", some with an implementation report that's "I checked and these browsers support this feature", and others where there are thousands of tests listed all with at least two passes), and I think if we're going to push there to be more testing we also need to consider what we're willing to have specs go to REC with.

To me, requiring every single edge-case test case to have two passing implementations is pretty ridiculous (because with any complex spec, it's unlikely there will ever be any bug-free implementation) and it results in groups going, "no, we don't want your test for this spec, because it would delay us going to REC" (and yes, that has happened).

We need to decide, if we want to make testing part of the Process, whether we want testing merely to facilitate getting a specification to Proposed Recommendation, or whether we have a longer-term goal of wider interoperability (quite possibly with more implementations than we had when the spec went to PR), because we should be maintaining the spec, either as a Group or as the W3C.

As for when to require tests, I think a lot of this comes down to who we expect to write the tests. If we expect implementers to write them alongside their implementations (which, de facto, is what always happens: it's almost never the case that anyone else pays for a large test suite), then we shouldn't require there to be tests before we expect there to be implementations; while we no longer have a call for implementations, the stated goal of CR is to gather implementation experience (which implies we don't have it beforehand), and therefore if we expect implementers to write tests we cannot require tests for CR entry. I suspect if we require tests for CR entry, we'll just end up in a situation where we already have at least one implementation before CR, which AIUI isn't what's meant to happen per Process.

Obviously, we have the notable difference from the WHATWG Working Mode, which states that normative changes "should have corresponding test changes, either in the form of new tests or modifications to existing tests" (though note the assumption that if there aren't any implementations there will be soon, due to explicit statements from implementers). That is considered a significant bonus because it avoids the historic risk of specs changing and implementers not noticing that one sentence in the middle has changed. Certainly I'd argue that, from the point at which there are any implementations, we should strongly push for there to be tests for every change, because when we go to PR we want there to be evidence of interoperable implementation of that document and not just evidence of interoperable implementation of the first CR (and we've undoubtedly done this previously, with small changes that ultimately nobody implemented).

Implementation experience is required to show that a specification is sufficiently clear, complete, and relevant to market needs, to ensure that independent interoperable implementations of each feature of the specification will be realized.

This is a problematic sentence for me. Too much expectation is being loaded onto implementation experience. The assumption that features will only be implemented if they are needed is not in reality always valid.

The fact that a feature has been implemented is no indication per se that the feature meets market needs or is actually useful. Rather, it is a verification of the fact that the specification is implementable, which is a smaller point but no less important as a key requirement of the specification. Having more than one implementation is a good sign that the spec is also interoperable, which is also important.

To validate against market needs, a better tool would be a less technical, more "business" (in the widest sense of the word) oriented expression of requirements and criteria for those requirements to have been met.

I see that Process 6.4 Candidate Recommendation states that all transitions to CR:

  • must show that the specification has met all Working Group requirements, or explain why the requirements have changed or been deferred,

but in practice Working Groups can and often do take a very lightweight approach to this, for example by stating that there are no defined requirements. Is that a loophole? If so, should we not close it?

I suspect if we require tests for CR entry, we'll just end up in a situation where we already have at least one implementation before CR, which AIUI isn't what's meant to happen per Process.

I have the impression that this is exactly the direction of W3C these days, especially considering the way that W3C and WHATWG can work together. It certainly is not discouraged in the Process, and it almost certainly generates a better specification at the CR entry stage.

So there's a bigger question - if it is not what is meant to happen per Process today, should it be what is meant to happen in a future version of the Process?

I partially agree with Nigel. The lack of implementations can be an indicator that something is irrelevant to the target market (and that no-one can be bothered to implement it). The presence of implementations does not, by itself, indicate relevance. Generally we check relevance also through wide review, and so on.

Also, I don't want to slip into the state where people don't bring proposed specs to the W3C until they have it implemented and deployed -- that restricts our ability to review and revise the spec. ("but it's already deployed! we shouldn't change it without very good reason!") and confers a first-mover advantage.

I don't want to slip into the state where people don't bring proposed specs to the W3C until they have it implemented and deployed

+1 to this, and observing that we do seem to be slipping into this, especially at WG level, because we (Chairs, anyway) are being advised that to add something to a WG Charter there has to be a document to begin with. Why would someone go to the effort of half-baking a spec and then wait until it is Chartered before finishing it off? It's not practical or realistic. The result is that we make a spec elsewhere - CG, outside W3C, WHATWG, wherever - then bring it as a fait accompli and put pressure on whatever WG ends up with it to publish it unchanged.