=head1 NAME Web::Encoding - Web Encodings APIs =head1 SYNOPSIS use Web::Encoding; $bytes = encode_web_utf8 $chars; $chars = decode_web_utf8 $bytes; =head1 DESCRIPTION The C<Web::Encoding> module provides a set of functions to handle Web-compatible character encodings. Also, there are following modules in the C<perl-web-encodings> repository: =over 4 =item L<Web::Encoding::UnivCharDet> The universalchardet (or universal detector) implementation in Perl, which can be used to implement HTML parsers. =item L<Web::Encoding::Normalization> Implementation of Unicode's string normalization algorithms, i.e. NFC, NFD, NFKC, and NFKD. =item L<Web::Encoding::Preload> Preloading encoding modules and data files. =back =head1 FUNCTIONS Functions described in these subsections are exported by default. =head2 Encoding labels and properties of encodings There are following functions to handle encoding labels and to obtain properties of encodings: =over 4 =item $key = encoding_label_to_name $label Find the encoding identified by the specified label. As does the "get an encoding" steps [ENCODING], this function ignores leading and trailing spaces, and compares labels ASCII case-insensitively. The function returns the encoding key (not a name), if found, or C<undef>. =item $key = fixup_html_meta_encoding_name $key Replace a encoding key for the purpose of HTML character encoding declaration, as in "prescan a byte stream to determine its encoding" and "change the encoding" algorithms [HTML]. The argument must be an encoding key (not a name or label). The function returns an encoding key. =item $key = get_output_encoding_key $key Return the result of applying the steps to "get an output encoding" [ENCODING]. The argument must be an encoding key (not a name or label). The function returns an encoding key. =item $name = encoding_name_to_compat_name $key Replace an encoding key to its official name as used in e.g. C<characterSet> or C<inputEncoding> attributes of the C<Document> interface [ENCODING] [DOM]. The argument must be an encoding key (not a name or label). The function returns an encoding name. =item $boolean = is_ascii_compat_encoding_name $key Return whether the specified encoding is an ASCII-compatible character encoding [ENCODING] or not. The argument must be an encoding key (not a name or label). =item $boolean = is_encoding_label $label Return whether the specified label identifies an encoding [ENCODING] or not. It compares labels ASCII case-insensitively. Unlike the C<encoding_label_to_name> function, however, this function does not ignore spaces. =item $key = locale_default_encoding_name $tag Return the encoding key (not a name or label) of the default character encoding for a locale [HTML]. If no default is known for the specified locale, C<undef> is returned. The argument, which identifies the locale, must be either a BCP 47 language tag or a string C<*>. The language tag must be the primary language tag only, C<zh-TW>, or C<zh-CN>, otherwise no data is available. The tags are ASCII case-insensitive. If C<*> is specified, the global default encoding that can be used when the locale is not known or the locale has no default is returned. =back For the purpose of this module, the B<key> of the encoding is a short string uniquly identifying the encoding. It is a lowercased variant of the encoding name [ENCODING]. Note that the encoding names in the Encoding Standard are not compatible with Perl L<Encode> module's encoding names. =head2 Encoders and decoders There are following functions for encoding and decoding: =over 4 =item $bytes = encode_web_utf8 $chars Encode the character string in UTF-8 and return the encoded bytes. This function can be used to implement the "UTF-8 encode" operation [ENCODING]. =item $chars = decode_web_utf8 $bytes Decode the bytes as UTF-8 and return the decoded character string. Any bad byte is replaced by U+FFFD characters without failure. This function can be used to implement the "UTF-8 decode" operation [ENCODING]. =item $chars = decode_web_utf8_no_bom $bytes Decode the bytes as UTF-8, not recognizing BOM, and returns the decoded character string. Any bad byte is replaced by U+FFFD characters without failure. This function can be used to implement the "UTF-8 decode without BOM" operation [ENCODING]. =item $bytes = encode_web_charset $key, $chars Encode the character string and return the encoded bytes. The first argument must be the key of the encoding used to encode the string. Any character not representable in the encoding is converted to an HTML decimal character reference for the character. This function can be used to implement the "encode" operation with error mode C<html> [ENCODING] [ENCODING16]. =item $chars = decode_web_charset $key, $bytes Decode the bytes and return the decoded character string. The first argument must be the key of the encoding used to decode the bytes. Any bad byte is replaced by U+FFFD characters without failure. This function is equivalent to the following code using L<Web::Encoding::Decoder>: $decoder = Web::Encoding::Decoder->new_from_encoding_key ($key); $decoder->ignore_bom (1); return $decoder->bytes ($bytes) . $decoder->eof; =item [$name, $name, ...] = encoding_names Return the list of the encoding keys (i.e. the lowercase variants of the encoding names), as an array reference. =back In addition to UTF-8, following legacy encodings are supported: IBM866 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-7 ISO-8859-8 ISO-8859-8-I ISO-8859-10 ISO-8859-13 ISO-8859-14 ISO-8859-15 ISO-8859-16 KOI8-R KOI8-U macintosh windows-874 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 GBK Big5 EUC-JP ISO-2022-JP Shift_JIS EUC-KR x-user-defined UTF-16BE UTF-16LE replacement =head1 SPECIFICATIONS =over 4 =item ENCODING Encoding Standard <https://encoding.spec.whatwg.org/>. =item ENCODING16 UTF-16 encoder <https://github.com/whatwg/encoding/commit/8360f775c8df145f649047c7d59c5ff733ade112>. =item HTML HTML Standard <https://html.spec.whatwg.org/>. =item DOM DOM Standard <https://dom.spec.whatwg.org/>. =item ENCVALID Encoding Validation <https://wiki.suikawiki.org/n/Encoding%20Validation>. =back =head1 DEPENDENCY The module requires Perl 5.8 or later. =head1 AUTHOR Wakaba <wakaba@suikawiki.org>. =head1 LICENSE Copyright 2011-2018 Wakaba <wakaba@suikawiki.org>. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut