github/cmark-gfm

"www." is recognized as an extended www autolink


www. is recognized as a link.


This is surprising given the definition of extended www autolinks:

An extended www autolink will be recognized when the text www. is found followed by a valid domain. A valid domain consists of segments of alphanumeric characters, underscores (_) and hyphens (-) separated by periods (.). There must be at least one period, and no underscores may be present in the last two segments of the domain.

Given that behavior, the parser appears to accept the empty string ("") as a valid domain. Two points in the definition may need clarification:

  1. Is "" a segment "of alphanumeric characters, underscores (_) and hyphens (-) separated by periods (.)"?
  2. The third sentence ("There must be at least one period, and no underscores may be present in the last two segments of the domain.") in the quotation above may also be ambiguous. I have 2 possible readings:
    1. "There must be at least one period in the domain. No underscores may be present in the last two segments of the domain.": This would disqualify "" from being a valid domain as "" does not contain a period.
    2. "There must be at least one period in the last two segments of the domain. No underscores may be present in the last two segments of the domain.": This reading does not make sense, since segments are separated by periods and therefore cannot contain them.
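
To make the first reading concrete, here is a minimal Python sketch of the check as I would expect it to behave. It is not cmark-gfm's actual implementation: `is_valid_domain` and `is_www_autolink` are hypothetical names, the "domain" is assumed to be the text following `www.`, segments are assumed to be non-empty, and "alphanumeric" is restricted to ASCII for simplicity.

```python
import re

# Hypothetical helpers illustrating one strict reading of the spec text;
# this is NOT how cmark-gfm is implemented.

# Assumption: a segment is one or more ASCII alphanumerics, "_" or "-".
SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_domain(domain: str) -> bool:
    """Return True if `domain` satisfies the quoted definition (reading 1)."""
    # "There must be at least one period" (in the domain as a whole).
    if "." not in domain:
        return False
    segments = domain.split(".")
    # Every segment must be non-empty and use only the allowed characters.
    if not all(SEGMENT.fullmatch(s) for s in segments):
        return False
    # "No underscores may be present in the last two segments."
    return not any("_" in s for s in segments[-2:])


def is_www_autolink(text: str) -> bool:
    """Assumption: the domain is whatever follows the literal "www."."""
    prefix = "www."
    return text.startswith(prefix) and is_valid_domain(text[len(prefix):])
```

Under these assumptions, `is_www_autolink("www.")` is false because the empty domain contains no period, while `is_www_autolink("www.commonmark.org")` is true.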