URLs that end in "." aren't detected correctly
cryptosteve2 opened this issue · 3 comments
- iOS version: iOS 17.6.1 // iPadOS 17.6.1
- Device: iPhone 13 Pro // iPad Pro 3rd generation
- Delta Chat version: v1.46.9
- Expected behavior: the complete URL is detected as a URL
- Actual behavior: only part of the URL is detected on iOS // iPadOS (but the complete one on macOS)
- Steps to reproduce the problem: try the URL below or see the screenshots. URL: https://www.sportschau.de/fussball/championsleague/bayern-und-dortmund-spielen-gegen-barca,champions-league-auslosung-136.html#:~:text=Jeder%20der%2036%20Klubs%20-%20bislang%20waren%20es%2032%20-%20spielt%20viermal%20im%20eigenen%20Stadion%20und%20viermal%20ausw%C3%A4rts.%20Schwere%20Ausw%C3%A4rtsaufgaben%20erwischten%20unter%20anderem%20die%20Bayern%2C%20die%20in%20Barcelona%20auf%20ihren%20Ex-Trainer%20Flick%20treffen%2C%20sowie%20Leipzig%20und%20Dortmund%2C%20die%20nach%20Madrid%20reisen%20m%C3%BCssen.
- Logs:
Thank you for that bug report, I can reproduce it and had a quick look at it, but haven't come up with a solution yet.
Note to self: MessageLabel.parse fails for this URL:
// Get all checking types of the detectors, except for .custom because they contain their own regex
let detectorCheckingTypes = enabledDetectors
    .filter { !$0.isCustom }
    .reduce(0) { $0 | $1.textCheckingType.rawValue }
if detectorCheckingTypes > 0, let detector = try? NSDataDetector(types: detectorCheckingTypes) {
    // doesn't match https://www.sportschau.de/fussball/championsleague/bayern-und-dortmund-spielen-gegen-barca,champions-league-auslosung-136.html#:~:text=Jeder%20der%2036%20Klubs%20-%20bislang%20waren%20es%2032%20-%20spielt%20viermal%20im%20eigenen%20Stadion%20und%20viermal%20ausw%C3%A4rts.%20Schwere%20Ausw%C3%A4rtsaufgaben%20erwischten%20unter%20anderem%20die%20Bayern%2C%20die%20in%20Barcelona%20auf%20ihren%20Ex-Trainer%20Flick%20treffen%2C%20sowie%20Leipzig%20und%20Dortmund%2C%20die%20nach%20Madrid%20reisen%20m%C3%BCssen.
    // for whatever reason.
    let detectorMatches = detector.matches(in: text.string, options: [], range: range)
    if !detectorMatches.isEmpty {
        debugPrint(detectorMatches)
    }
    matches.append(contentsOf: detectorMatches)
}
- It's quite likely a bug in NSDataDetector (see above and below; Mail is affected too). We don't do anything special here.
- The problem seems to be the dot at the end.
- macOS also seems to be affected and ignores the dot at the end (see the screenshot in the bug report).
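To look at the detector in isolation, outside of MessageLabel, a minimal sketch along these lines might help. It assumes only Foundation on an Apple platform; the helper name detectedLinks(in:) is made up for this example and is not part of the app. It prints whatever substrings NSDataDetector reports as links, so the handling of a trailing dot can be inspected directly:

```swift
import Foundation

/// Hypothetical helper (not part of the app): returns the substrings
/// that NSDataDetector reports as links in `text`.
func detectedLinks(in text: String) -> [String] {
    let linkType = NSTextCheckingResult.CheckingType.link.rawValue
    guard let detector = try? NSDataDetector(types: linkType) else {
        return []
    }
    // NSDataDetector works on UTF-16 based NSRange values,
    // so NSString is used for lengths and substrings.
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    return detector
        .matches(in: text, options: [], range: fullRange)
        .map { nsText.substring(with: $0.range) }
}

// Print what the detector considers a link in a sentence ending with a dot.
let sample = "i like http://foo.bar/#hilight=baz."
for link in detectedLinks(in: sample) {
    print(link)
}
```

Pasting the long URL from the report into sample should show which part of it the detector actually returns.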
we discussed this issue internally and came to the conclusion that there is not much we can reasonably do about it. in theory, we could try to detect URLs on our own, but we would probably open many more issues that way than we would close with this one. parsing text and detecting URLs is a hard job when it comes to corner cases. it is good to leave that up to Apple.
for the concrete issue: the bug is that, because of the final dot, only part of the URL is marked as a URL.
for the final dot: whether it belongs to the URL or not is not really detectable. if you have the text
i like http://foo.bar/#hilight=baz.
the URL may or may not include the final dot. looking at URLs in the wild, however, it is reasonable to assume that URLs end with a dot less often than sentences do. this is probably what Apple assumes here, as does GitHub, btw, judging by the initial post. but as said, we will leave that up to Apple.
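to make the heuristic concrete, here is a rough sketch of "treat trailing sentence punctuation as not part of the URL". this is only an illustration of the assumption described above. it is not what NSDataDetector or GitHub actually implement, and the helper name splitTrailingPunctuation is made up for the example:

```swift
/// Hypothetical helper: separates trailing sentence punctuation
/// from a URL candidate, assuming it belongs to the sentence.
func splitTrailingPunctuation(_ candidate: String) -> (url: String, trailing: String) {
    let sentencePunctuation: Set<Character> = [".", ",", "!", "?", ";", ":"]
    var url = Substring(candidate)
    var trailing = ""
    // Move punctuation characters from the end of the candidate
    // to the front of `trailing`, preserving their order.
    while let last = url.last, sentencePunctuation.contains(last) {
        trailing.insert(last, at: trailing.startIndex)
        url.removeLast()
    }
    return (String(url), trailing)
}

let (url, trailing) = splitTrailingPunctuation("http://foo.bar/#hilight=baz.")
print(url)       // http://foo.bar/#hilight=baz
print(trailing)  // .
```

the sketch also shows the downside: a URL that really does end with a dot, like the one in the report, would lose its last character.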