deltachat/deltachat-ios

URLs that end on "." aren't detected correctly

cryptosteve2 opened this issue · 3 comments

Thank you for the bug report. I can reproduce it and had a quick look, but haven't come up with a solution yet.


Note to self

// Get all Checking Types of detectors, except for .custom because they contain their own regex
let detectorCheckingTypes = enabledDetectors
    .filter { !$0.isCustom }
    .reduce(0) { $0 | $1.textCheckingType.rawValue }
if detectorCheckingTypes > 0, let detector = try? NSDataDetector(types: detectorCheckingTypes) {
    // doesn't match https://www.sportschau.de/fussball/championsleague/bayern-und-dortmund-spielen-gegen-barca,champions-league-auslosung-136.html#:~:text=Jeder%20der%2036%20Klubs%20-%20bislang%20waren%20es%2032%20-%20spielt%20viermal%20im%20eigenen%20Stadion%20und%20viermal%20ausw%C3%A4rts.%20Schwere%20Ausw%C3%A4rtsaufgaben%20erwischten%20unter%20anderem%20die%20Bayern%2C%20die%20in%20Barcelona%20auf%20ihren%20Ex-Trainer%20Flick%20treffen%2C%20sowie%20Leipzig%20und%20Dortmund%2C%20die%20nach%20Madrid%20reisen%20m%C3%BCssen.
    // for whatever reason.
    let detectorMatches = detector.matches(in: text.string, options: [], range: range)
    if !detectorMatches.isEmpty {
        debugPrint(detectorMatches)
    }
    matches.append(contentsOf: detectorMatches)
}
  • It's quite likely a bug in NSDataDetector (see above and below; Mail is affected as well). We don't do anything special here.
  • The problem seems to be the dot at the end:
[Screenshot: IMG_D27D01138222-1]

  • macOS also seems to be affected and ignores the dot at the end (see the screenshot in the bug report).

Btw, maybe it's not Delta Chat related? I have the same issue in Apple Mail on macOS ...
[Screenshot: 20240903@134719]

r10s commented

we discussed this issue internally and came to the conclusion that there is not much we can reasonably do about it. in theory, we could try to detect URLs on our own, but that would probably open many more issues than closing this one. parsing text and detecting URLs is a hard job when it comes to corner cases. it is good to leave that up to apple.

for the concrete issue: the bug is that, depending on the final dot, only "half" of the URL is marked as a link.

for the final dot: whether it belongs to the URL or not is not really detectable. if you have the text
i like http://foo.bar/#hilight=baz.
the URL may or may not include the final dot. looking at URLs in the wild, however, it is reasonable to assume that URLs end with a dot less often than sentences do - this is probably what apple assumes here - as well as github, btw, when looking at the initial post. but as said, we would leave that up to apple.
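
To make the "final dot" heuristic concrete, here is a minimal sketch in plain Swift. It is not how NSDataDetector, GitHub or deltachat-ios work internally. The function name `splitTrailingPunctuation` and the set of punctuation characters are assumptions made up for this illustration. It only shows the idea that punctuation at the very end of a URL candidate is more likely to belong to the sentence than to the URL.

```swift
/// Hypothetical helper, not part of deltachat-ios or any Apple API.
/// Splits a URL candidate into the part treated as the link and any
/// trailing sentence punctuation that is assumed not to belong to it.
func splitTrailingPunctuation(_ candidate: String) -> (link: String, trailing: String) {
    // Assumption: these characters end sentences more often than URLs.
    let sentencePunctuation: Set<Character> = [".", ",", ";", ":", "!", "?"]

    var link = Substring(candidate)
    var trailing = ""

    // Peel punctuation off the end, keeping it in its original order.
    while let last = link.last, sentencePunctuation.contains(last) {
        trailing.insert(last, at: trailing.startIndex)
        link.removeLast()
    }
    return (String(link), trailing)
}

let result = splitTrailingPunctuation("http://foo.bar/#hilight=baz.")
print(result.link)      // http://foo.bar/#hilight=baz
print(result.trailing)  // .
```

The trade-off is the one described above: a URL that really does end in a dot would be cut short by this guess, which is why a hand-rolled detector is likely to trade one corner case for another.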