layershifter/TLDExtract

underscore in hostname

Erwane opened this issue · 5 comments

Hi,

I've found some odd behavior in domain extraction:

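use LayerShifter\TLDExtract\Extract;

$extract = new Extract();

// Hostname with a label that starts with an underscore
$result = $extract->parse('dkim._domainkey.phea.fr');
print_r($result->toArray());

// Same hostname without the underscore
$result = $extract->parse('dkim.domainkey.phea.fr');
print_r($result->toArray());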

Result:

Array
(
    [subdomain] => dkim._domainkey.phea
    [hostname] => fr
    [suffix] => 
)
Array
(
    [subdomain] => dkim.domainkey
    [hostname] => phea
    [suffix] => fr
)

The problem comes from the underscore (_) character.

This regex fixes the problem:

    const HOSTNAME_PATTERN = '#^((?!-)[a-z0-9_-]{0,62}[a-z0-9]\.)+[a-z]{2,63}|[xn\-\-a-z0-9]]{6,63}$#';
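
To illustrate the effect of allowing the underscore, here is a minimal sketch that runs the two hostnames from the report through plain preg_match. It does not call the library. PROPOSED_PATTERN is the regex above, copied verbatim. NO_UNDERSCORE_PATTERN is a hypothetical variant made only by removing the `_` from the first character class, and is not necessarily the pattern the library shipped before. matchesPattern is a helper name invented for this example.

```php
<?php
// Pattern proposed in this issue, copied verbatim.
const PROPOSED_PATTERN = '#^((?!-)[a-z0-9_-]{0,62}[a-z0-9]\.)+[a-z]{2,63}|[xn\-\-a-z0-9]]{6,63}$#';

// Hypothetical comparison pattern (assumption): the same regex with "_"
// removed from the label character class.
const NO_UNDERSCORE_PATTERN = '#^((?!-)[a-z0-9-]{0,62}[a-z0-9]\.)+[a-z]{2,63}|[xn\-\-a-z0-9]]{6,63}$#';

// Hypothetical helper: true when the pattern matches the host.
function matchesPattern(string $pattern, string $host): bool
{
    return preg_match($pattern, $host) === 1;
}

$hosts = ['dkim._domainkey.phea.fr', 'dkim.domainkey.phea.fr'];

foreach ($hosts as $host) {
    printf(
        "%-26s proposed=%s no_underscore=%s\n",
        $host,
        var_export(matchesPattern(PROPOSED_PATTERN, $host), true),
        var_export(matchesPattern(NO_UNDERSCORE_PATTERN, $host), true)
    );
}
```

Under these assumptions, only the variant without the underscore rejects dkim._domainkey.phea.fr, while both patterns accept dkim.domainkey.phea.fr.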

We ran into a similar issue. It would be great if this could be fixed soon. For now we have pinned the package to v1.2.3 as a temporary workaround.

@layershifter do you need a PR?

@Erwane a PR with matching tests would be awesome 👍 Feel free to open one.

PR #26

@Erwane thanks, released as 1.2.5.