Knagis/CommonMark.NET

Terminology changes

Opened this issue · 17 comments

So the Professor is back on CommonMark, which is great news.

Somewhat less great is his renaming of core terms in both the spec and the C implementation (I understand those modifications are by popular demand -- I guess that's the price one has to pay when seeking community approval for their specification).

The following have changed so far:

  • Header --> Heading
  • Horizontal Rule --> Thematic Break (WTF?)

The following are expected to change:

  • Emphasis -> Weak Emphasis
  • Bullet List -> Unordered List

Should CommonMark.NET reflect those changes? I don't mind renaming everything, but wouldn't changing e.g. HeaderLevel to HeadingLevel break compatibility?

P.S. I noticed cmark has a separate struct for header/ing data. Maybe this is an opportunity to do the same here (while retaining an obsolete HeaderLevel property for the time being).

I think that a second property that uses the same backing field plus obsoleting the previous property.

Heading sounds fine but the thematic break... I agree with your statement there :)

As for the struct for header - on cmark it contains heading level and heading source type (atx or setext) which is currently in the BlockTag enumeration for us. I don't know if that is worth it (a struct should not impact GC - at least I think so) - in that it seemingly does not add any value.

So BlockTag will have something like

[Obsolete]
AtxHeader,

AtxHeading = AtxHeader,

etc. Not very pretty IMHO, but will do.

Re: struct I agree that it adds no value, but that would be consistent with ListData and FencedCodeData (the extensions branch has quite a few of those, including DocumentData, where I moved ReferenceMap). This could actually reduce the memory footprint if stuff like Heading.Level is declared as byte.

I couldn't resist jumping on the opportunity with two renaming suggestions of my own:

  1. Weak emphasis
  2. Unordered list

I updated the top post with these and turned it into a checklist.

In case of the heading the struct would only have a single field - which is why it does not have its own struct yet.

As for the weak emphasis and unordered list - both sounds fine with me but lets hear what people on the forum says.

I guess the struct is a matter of personal taste.

I renamed both headers and rulers in the extensions branch (leaving HeaderLevel untouched for now). I figured changes like these should be made there rather than in master.

BTW having read through the topic, thematic break started to make more sense to me. I don't care that much for the specific nomenclature, but it is reasonable to call it something other than horizontal rule. As for an emphasis alternative, get ready for emphatic stress as a viable option 😱

Unlike with weak emphasis, which might be changed as proposed, changed to something unthinkable, or remain unchanged, I feel that unordered lists is the only correct term for what the spec calls bullet lists today.

With that in mind, I went ahead and did a backwards-compatible rename. The state of lists in the extensions branch is now as follows:

  1. ListType and ListData.ListType are obsolete (ListType.Bullet stands unchanged).
  2. BlockTag.List is obsoleted by BlockTag.UnorderedList and BlockTag.OrderedList.
  3. UnorderedListData and Block.UnorderedListData are added.
  4. ListData.BulletChar is obsoleted by UnorderedListData.BulletCharacter.
  5. OrderedListData and Block.OrderedListData are added.
  6. ListData.Start is obsoleted by OrderedListData.Start.
  7. ListData.Delimiter and ListDelimiter are obsoleted by OrderedListData.DelimiterCharacter.
  8. UnorderedListData and OrderedListData define a few additional properties in support of FancyLists.
  9. A LegacyLists extension provides a full backwards compatibility mode, including Unicode bullets (which are soon to become a FancyLists feature).

Update: API Changes contains an updated version.

I started documenting the API changes here.

As you may see, I ultimately moved HeaderLevel into Heading.Level. I also moved DelimiterCharacter into Emphasis.DelimiterCharacter (but didn't document it since that's not in 0.10.0).

I believe that similarly turning Linkable into a struct and removing the TargetUrl and LiteralContent backing fields wouldn't impact performance.

Have you thought about any mechanisms how extensions like FancyLists would be supported from external assemblies that cannot modify Block and Inline classes?

Extensively 😄

Lists are actually extensibility taken to the extreme. Adding a new list style is as easy as:

    public class FullWidthLowerAlphaLists : CommonMarkExtension
    {
        protected override IEnumerable<IBlockDelimiterHandler> InitializeBlockDelimiterHandlers(CommonMarkSettings settings)
        {
            var parameters = new OrderedListItemParameters('a', 'z', listStyle: "fullwidth-lower-alpha");
            var delimiter = new ListItemDelimiterParameters('.');
            yield return new AlphaListItemHandler(settings, parameters, delimiter);
        }
    }

Numerous extensibility points are provided (a couple more added just today), so developing a custom extension should be a no-brainer.

There is still the problem of autodiscovery, i.e. how to make external extensions available in the command line. I'm considering MEF, but adding an external dependency might be an issue (especially with .NET < 4.0). It looks like that warrants separate packages (e.g. CommonMark.NET.Extensibility and CommonMark.NET.Console).

P.S. MEF2 targets Profile 259.

CommonMark 0.23 has been released.

May I suggest that the same changes be applied before 0.11 (if you're planning on releasing one), i.e.

  1. HeaderLevel obsoleted by HeadingData struct with Level property
  2. DelimiterCharacter replaced by EmphasisData struct with DelimiterCharacter property
  3. Linkable???

Yes, I don't see a reason for not doing this. If you merge the changes needed for 0.11 in master I could push the release to nuget.

I'm currently working on disallowing space between link text and link label.

Test Name:  Example541
Test Outcome:   Failed
Result Message: Assert.AreEqual failed. Expected:<<p><img src="/url" alt="foo" title="title" />
[]</p>>. Actual:<<p><img src="/url" alt="foo" title="title[]" />
[]</p>>.
Result StandardOutput:  
Example 541
Section: 541

CommonMark:

    ![foo]␣
    []

    [foo]:␣/url␣"title"

Expected:

    <p><img␣src="/url"␣alt="foo"␣title="title"␣/>
    []</p>

Actual:

    <p><img␣src="/url"␣alt="foo"␣title="title[]"␣/>
    []</p>

Any ideas?

https://github.com/Knagis/CommonMark.NET/blob/master/CommonMark/Parser/InlineMethods.cs#L935

I suspect that it will be enough to remove the check for space and newline.

Maybe my branch diverged too much from knagis/master, but that doesn't seem to solve the issue with this specific example.

I will take a look at the new tests in a few hours.

No need, solved in AMDL@da3c7bb.

#71 ready for review.