Does not or brokenly handle hostnames ending in a dot as common in DNS queries
Closed this issue · 2 comments
I have the following script named just-the-2nd-level-domain.pl
:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Domain::PublicSuffix;
my $suffix = Domain::PublicSuffix->new({ allow_unlisted_tld => 0 });
while (my $domain = <>) {
chomp($domain);
say ($suffix->get_root_domain($domain) or say $suffix->error());
}
But the results are unexpected if I add a trailing dot:
~ → echo www.example.com | bin/just-the-2nd-level-domain.pl
example.com
~ → echo www.example.com. | bin/just-the-2nd-level-domain.pl
www.example.com.
Cause is probably line 218 in lib/Domain/PublicSuffix.pm
:
218: $root_domain =~ s/^.*\.(.*?\.$suffix)$/$1/;
Because $suffix
has been correctly set to com
without trailing dot, that regexp will not match and hence nothing gets replaced. (An assertion that this regexp must replace at least something might help here to catch such issues in the future.)
So far I only looked through the code and haven't checked if this is really the cause. But I might implement the suspected fix and create a pull request with it later.
Note: I am using version 0.19 as packaged in Debian 11 Bullseye (current Debian Stable release as of this writing) with Perl 5.32 (from Debian as well).
Yep, this patch fixes the issue:
--- a/lib/Domain/PublicSuffix.pm
+++ b/lib/Domain/PublicSuffix.pm
@@ -215,7 +215,7 @@
$self->root_domain($suffix);
} else {
my $root_domain = $domain;
- $root_domain =~ s/^.*\.(.*?\.$suffix)$/$1/;
+ $root_domain =~ s/^.*\.(.*?\.$suffix)\.?$/$1/;
$self->root_domain($root_domain);
}
Outcome with the patch:
$ echo www.example.com. | ~/bin/just-the-2nd-level-domain.pl
example.com
Will create a pull-request for that.
Thank you so much for your contribution!