Handle internationalised period delimiters
elliotwutingfeng opened this issue · 1 comments
elliotwutingfeng commented
The current implementation only handles .
as a period delimiter, but not 。
, .
, and 。
Code | Character | Is Handled |
---|---|---|
\u002e | . | ✔️ |
\u3002 | 。 | 🚫 |
\uff0e | . | 🚫 |
\uff61 | 。 | 🚫 |
Ideally, we can handle these period delimiter variants with minimal impact on performance.
Similar to john-kurkowski/tldextract#253
elliotwutingfeng commented