This was a quick, fun exercise to remind me that I can still write Perl. It fetches the list of TLDs from IANA, does a quick bit of munging, then prints a regex that should match any FQDN ending in a known TLD (punycode `xn--` entries are skipped):

#!/usr/bin/env perl
use strict;
use warnings;

use LWP::Simple;

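my $url = 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt';

# Bail out if the fetch fails; otherwise we would print a regex built from undef.
my $content = get($url)
  or die "Could not fetch TLD list from $url\n";

# One or more dot-terminated labels, followed by an alternation of known TLDs.
# The grep drops the leading comment line and the punycode (xn--) entries.
my $fqdn_regex = '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:';
$fqdn_regex .= join('|', grep { !/^(?:#|xn)/ } split /\n/, lc($content));
$fqdn_regex .= ')';

# Require whitespace, a slash, or end of string right after the TLD.
my $regex = $fqdn_regex . '(?:\s|\/|$)';
print "$regex\n";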
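
To get a feel for how the generated pattern behaves, here is a minimal sketch that builds it from a tiny hard-coded TLD list instead of the live IANA file, so it runs offline. The sample list, the `build_fqdn_regex` helper name, and the test strings are all made up for illustration; matching is done case-insensitively because the TLDs are lowercased before they go into the pattern.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical excerpt in the same shape as the IANA file: one comment
# line, then one TLD per line in upper case.
my $content = "# Version 2009081800\nCOM\nNET\nORG\nUK\nXN--0ZWM56D\n";

# Illustrative helper (not part of the original script): same munging as
# above, wrapped in a sub so it can be fed any list.
sub build_fqdn_regex {
  my ($list) = @_;
  my @tlds = grep { !/^(?:#|xn)/ } split /\n/, lc($list);
  return '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:'
       . join('|', @tlds)
       . ')(?:\s|\/|$)';
}

my $pattern = build_fqdn_regex($content);
my $re      = qr/$pattern/i;

# Assumed sample inputs: two that end in a listed TLD, two that do not.
for my $candidate ('www.example.com', 'http://example.org/index.html',
                   'example.invalid', 'localhost') {
  printf "%-32s %s\n", $candidate, ($candidate =~ $re ? 'match' : 'no match');
}
```

With this sample list, the first two strings should match and the last two should not. Note that the pattern is unanchored at the front, so it finds a hostname anywhere in the input rather than validating the whole string.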

Several caveats:

Maybe I’ll extend it for completeness and/or rewrite it in Ruby someday. Until then, it’ll always be ~/bin/tld_regex for me.




Published

18 August 2009

Category

hacking