/furuike

A static file Web server

Primary LanguagePerl

=head1 NAME

Furuike - A static file Web server

=head1 DESCRIPTION

Furuike is a Web server, which can serve static files on the local
file system.

=head1 URL-TO-FILE MAPPING

The path component of the effective request URL is mapped to a file or
a directory, using the rules described in this section (or processed
according to the C<Redirect> directives, as described in the later
section).  The scheme, authority, and query components of the
effective request URL are ignored.

The server script must be executed with the C<FURUIKE_DOCUMENT_ROOT>
environment variable, whose value is the path to the "document root"
directory in the local file system.  It should be an absolute path.
Anything under the document root directory is considered as part of
the Web server.  No confidental data should be put within that
directory.

Any non-final percent-decoded path segment must match to the regular
expression C<\A~?[A-Za-z0-9_-][A-Za-z0-9_.-]*\z>.  The final
percent-decoded path segment must match to the regular expression or
the empty string.  If there is an invalid path segment in the
effective request URL, a 404 or 400 response is returned.

The path component is interpreted as the path to the file or directory
in the document root.  Any non-final percent-decoded path segment must
identify a directory.  The final path percent-decoded segment must
identify a file or directory.  If the final percent-decoded path
segment is the empty string, it must identify a directory.  Some file
name extensions can be omitted, as described later.  No symbolic link
is followed.  If the specified file or directory is not found or not
of intended type, a 404 response is returned.

If the final percent-decoded path segment is the empty string and
identifies a directory (e.g. C</path/to/directory/>), the content of a
directory index file, if any, or the directory index view for the
directory.

A directory index file is a file that would be selected if the last
percent-decoded path segment were C<index>
(e.g. C</path/to/directory/index>).  As the file name extensions can
be omitted, the file name might be C<index.ja.html>, C<index.svg.gz>,
or C<index.txt>.  The C<DirectoryIndex> directive can be used to
specify another last percent-decoded path segment in use.

The directory index view for the directory is a simple HTML document
representing the list of the files and the directories in the
directory.  The document can be customized by C<IndexStyleSheet>,
C<IndexOptions>, and C<ReadmeName> directives, as well as the
C<LICENSE> file (see C<ReadmeName>'s description).

If the final percent-decoded path segment is equal to C<LIST>, the
directory index view for the directory identified by the second-last
segment is returned.

If the final percent-decoded path segment is not the empty string and
identifies a directory (e.g. C</path/to/directory>), a 301 response
redirecting to the path with a trailing C</> character
(e.g. C</path/to/directory/>) is returned.

If the final percent-decoded path segment is not the empty string, not
equal to C<LIST>, and identifies a file (e.g. C</path/to/file>), the
content of the file is returned.

If the file name consists of the following subcomponents in order, all
or some of the extensions with the preceding C<.> character can be
omitted from the path:

  A base name
  Optionally, "." followed by a language extension
  Optionally, "." followed by a MIME type extension
  Optionally, "." followed by a encoding label extension
  Optionally, "." followed by a content-coding extension

If there are multiple candidate files, a file is chosen by following
factors (where MIME type is the most important factor):

  MIME type
  language
  character encoding

If there are still multiple files with same priority, how a file is
selected is unknown.

For backward compatibility, if the path ends by C<,imglist> or
C<,imglist-detail>, a redirect to the C<LIST> in the same directory is
returned.

=head1 MIME TYPES

The MIME type of a file response (i.e. the C<Content-Type:> header
value) is determined by the MIME type extension in the file name of
the response (not the last percent-decoded path segment itself, in
case some file name extensions are omitted in the URL), if any.  There
are following built-in extension-to-MIME-type mapping rules:

  html  text/html
  txt   text/plain
  css   text/css
  js    text/javascript
  json  application/json
  png   image/png
  jpeg  image/jpeg
  gif   image/gif
  ico   image/vnd.microsoft.icon
  xml   text/xml
  svg   image/svg+xml
  xhtml application/xhtml+xml
  pdf   application/pdf
  zip   application/zip

The set of mapping rules can be configured by the C<AddType>
directive.  If no applicable mapping is found, no C<Content-Type:>
header is set.

As a special rule, if the base name is equal to the name specified in
the C<ReadmeName> directive or the base name is C<LICENSE> and there
is no other applicable rule, the MIME type is set to C<text/plain>.

If the MIME type extension is omitted from the URL and there are
multiple candidate files, a file is selected using the following
priority list (C<text/html> is the most preferred format):

  text/html
  text/plain
  image/png
  image/jpeg
  image/vnd.microsoft.icon
  image/gif
  application/pdf
  Any other type (or no type)

For following MIME types, C<charset=utf-8> parameter is set:

  application/json */*+json

For following MIME types, the encoding label determined by the
encoding label extension in the file name of the response, if any, or
the default encoding label, is used as the C<charset> parameter value:

  text/html text/css text/javascript text/xml application/xml */*+xml
  text/xml-external-parsed-entity
  application/xml-external-parsed-entity application/xml-dtd
  text/x-component text/perl text/x-h2h

There are following built-in extension-to-encoding-label mapping
rules:

  u8   utf-8
  jis  iso-2022-jp
  euc  euc-jp
  sjis shift_jis

The set of mapping rules can be configured by the C<AddCharset>
directive.

The default for the default encoding label is C<utf-8>.  This default
can be configured by the C<AddDefaultCharset> directive.

As an exception, if the base name of the file is equal to the name
specified in the C<ReadmeName> directive, the encoding label specified
by the C<charset> option of the C<IndexOptions> directive, if any, is
used as the default encoding label.

If the MIME type extension is omitted from the URL and there are
multiple candidate files, utf-8 is preferred to other encodings.

=head1 LANGUAGES

The C<Content-Language:> header value is determined by the language
extension in the file name of the response (not the last
percent-decoded path segment itself, in case some file name extensions
are omitted in the URL), if any.  The language file name extension is
two ASCII lowercase letters, optionally followed by C<-> followed by
two ASCII letters (i.e. C<[a-z]{2}(?:-[a-zA-Z]{2}|)>).  (More complex
language tags are not supported due to lack of use cases.  Please let
me know if this is not enough.)

If no language extension is found, no C<Content-Language:> header is
set.

If the language extension is omitted from the URL and there are
multiple candidate files, a file is selected based on the
C<Accept-Language:> header in the HTTP request, if any (i.e. this is a
"content negotiation" feature).

=head1 CONTENT-CODINGS

The C<Content-Encoding:> header value is determined by the
content-coding extension in the file name of the response (not the
last percent-decoded path segment itself, in case some file name
extensions are omitted in the URL), if any.  There is following
built-in extension-to-content-coding mapping rule:

  gz    gzip

The set of mapping rules can be configured by the C<AddEncoding>
directive.  If no applicable content-coding mapping is found, no
C<Content-Encoding:> header is set.

=head1 DIRECTORY CONFIGURATION FILE (.htaccess)

The C<.htaccess> file can be used to describe various response options
applied to a directory (and its contents).  Its format is a very
limited subset of the C<.htaccess> files used in Apache HTTP Server.

When a file is used to construct a response, the C<.htaccess> file in
the all directory to which the file belongs directly or indirectly,
traversed upward until the document root directory, inclusive, if any.
Any C<.htaccess> file for the parent directory of the document root,
for example, is not used.  When a directory list is returned as a
response, the C<.htaccess> file in the directory, as well as ancestor
directories until the document root directory, are used. if any.

The C<.htaccess> files are read and processed to check the response
options in the order of the directory tree hierarchy, starting from
the document root.

For example, for the file C</path/to/root/foo/bar/index.html>, the
following C<.htaccess> files are examined in this order, if the
document root is C</path/to/root>:

  /path/to/root/.htaccess
  /path/to/root/foo/.htaccess
  /path/to/root/foo/bar/.htaccess

In an C<.htaccess> file, any empty line or line starting with C<#>,
optionally preceded by space characters, is ignored.

In an C<.htaccess> file, the C<< <IfModule fileName> >> line, where
I<fileName> is some opaque string, indicates that any following line
should be ignored until C<< </IfModule> >> line is found B<unless>
I<fileName> is C<Furuike> or C<mod_headers.c>.

In an C<.htaccess> file, the directives between the C<< <Files
expression> >> line and the C<< </Files> >> line are only applied to
the response using the file whose name matches to the expression.
Only the C<Header> directive and C<IfModule> lines can be specified
between the lines.  The expression can be one of followings:

  ...              File name is equal to |...|.
  "..."            File name is equal to |...|.
  ~ "..."          File name contains |...|.
  ~ "...|...|..."  File name contains one of |...|s.
  ~ "^(...|...)"   File name starts with one of |...|s.

... where C<...> is a string of one or more ASCII alphanumerics or
C<->.

In an C<.htaccess> file, a directive name can be followed by one or
more arguments, as described in later subsection.

Any other line is considered as a parse error.

The C<.htaccess> file is processed in order.  If there are conflicting
directives, the later directive takes precedence.

If one of an applicable C<.htaccess> file has a parse error, any
request to the files in the directory or the directory itself results
in a C<500> response.

=head2 Directives

Following directives are available:

=over 4

=item AddType MIMETYPE .EXT1 .EXT2 ...

Add a set of MIME type extension file name mapping rules.  The first
argument must be an MIME type without parameter (ASCII
case-insensitive).  The other arguments must be file name extensions
(case-sensitive), optionally preceded by a C<.> character.  There must
be two or more arguments.

Note that the following MIME types are replaced by more canonical
ones:

  application/x-javascript  -> text/javascript
  application/x-ms-jscript  -> text/javascript
  application/x-perl        -> text/perl
  text/pod                  -> text/perl

=item AddEncoding CODING .EXT1 .EXT2 ...

Add a set of content-coding file name extension mapping rules.  The
first argument must be a content-coding (ASCII case-insensitive).  The
other arguments must be file name extensions (case-sensitive),
optionally preceded by a C<.> character.  There must be two or more
arguments.

=item AddLanguage LANG .EXT1 .EXT2 ...

This directive is ignored.  If specified, there must be two or more
arguments.  The first argument must be equal to the other arguments,
when any preceding C<.> character in the other arguments is ignored.

This directive is DEPRECATED.

=item AddCharset ENCODING .EXT1 .EXT2 ...

Add a set of encoding label file name extension mapping rules.  The
first argument must be an encoding label (ASCII case-insensitive).
The other arguments must be file name extensions (case-sensitive),
optionally preceded by a C<.> character.  There must be two or more
arguments.

This directive is DEPRECATED.  You should always use UTF-8.

=item AddDefaultCharset ENCODING

Set the default encoding label, which is used as the default value for
the C<charset> parameter.

This directive is DEPRECATED.  You should always use UTF-8.

=item DirectoryIndex SEGMENT1 SEGMENT2 ...

Specify the last percent-decoded path segments for the directory
index.  There can be one or more arguments, which must be valid
non-empty percent-decoded path segments, as described in earlier
section.  The order of the arguments is interpreted as preference.

For example,

  DirectoryIndex top index

... will result in returning C<index.ja.html> as the directory index
if there is only C<index.ja.html>, or C<top.en.txt> if there is also
C<top.en.txt>.

The default is C<index> such that e.g. C<index.html> is used.

=item IndexStyleSheet "URL"

Specify the CSS style sheet URL for the directory index view.  The
argument must be a C<">-quoted URL.  By default, no style sheet is
specified.

=item IndexOptions OPTION1 OPTION2 ...

Specify options for the directory index view.  Options can be
specified as zero or more arguments for the directive.

Options must be option name optionally preceded by C<+> or C<-> sign.
If the option name is not preceded by a C<-> sign, the option value
can be specified by separating a C<=> character.  An option value is a
string of zero or more non-space characters.

The option specifications are interpreted in order.  If the option
name is not preceded by any sign, it clears any current options and
sets an option.  If the option name is preceded by a C<+> sign, it
sets an option.  If the option name is preceded by a C<-> sign, it
removes an option.  If the option value is specified, it is set to the
option.  Otherwise, the empty string is set to the option.

The option name can be one of following values:

  NameWidth DescriptionWidth TrackModified HTMLTable IconsAreLinks
  FancyIndexing charset

The C<charset> option specifies the default encoding label for the
directory readme file (see C<ReadmeName> and the section on MIME
types).  The value must be a valid encoding label.  This option is
DEPRECATED.  Use UTF-8 everywhere.

The other options have no effect.  Use of them are DEPRECATED.

=item AddDescription "STRING" SEGMENT

Associate a short description string for a file name, used in the
directory index view.  The first argument must be a C<">-quoted
description string.  The second argument must be a non-empty valid
percent-decoded path segment, which is compared to the entire file
name or the base name of the file.

=item ReadmeName SEGMENT

Specify the readme file name for the directory.  The argument must be
a valid non-empty percent-decoded path segment as described in earlier
section.

The default value of this directive is C<README>.

If there is a file whose name is exactly equal to the specified
segment, it is used as the directory readme file.  Otherwise, if there
are files whose base name is equal to the specified segment, whose
MIME type is either C<text/plain> or C<text/html>, and has no
content-coding extension, one of such files is chosen by normal file
selection rule and used as the directory readme file.

Note that there can be the directory license file, whose file name or
base name is C<LICENSE>, in the directory, though the base name cannot
be configured by any directive and the MIME type of the file is
limited to C<text/plain>.

If there is the directory readme file or the directory license file,
they are included at the end of the directory index view.

=item HeaderName SEGMENT

Specify the header file name for the directory.  The argument must be
a valid non-empty percent-decoded path segment as described in earlier
section.

The default value of this directive is C<HEADER>.

If there is a file whose name is exactly equal to the specified
segment, it is used as the directory header file.  Otherwise, if there
are files whose base name is equal to the specified segment, whose
MIME type is C<text/html>, and has no content-coding extension, one of
such files is chosen by normal file selection rule and used as the
directory header file.

If there is the directory header file, it is used as the header part
of the C<body> element of the directory index view.

=item ErrorDocument 404 /PATH/SEGMENTS

Specify the file used as the C<404> response.  The first argument must
be C<404>.  The second argument must be a sequence of one or more
non-empty percent-decoded path segments prefixed by C</>.  It is
interpreted as the path to the file from the document root.

If there is a file whose name is exactly equal to the specified path,
it is used as response file.  Otherwise, if there are files whose base
name is equal to the specified path, whose MIME type is either
C<text/plain> or C<text/html>, and has no content-coding extension,
one of such files is chosen by normal file selection rule and used as
the directory readme file.

Note that only the built-in rules and the C<.htaccess> file in the
document root directory, if any, are applied to the error document
file.

=item Redirect STATUS /PATH/SEGMENTS URL

=item RedirectMatch STATUS /PATH/SEGMENTS$ URL

Specify that the path should return a redirect to another URL.

The first argument is the status code.  It must be one of redirect
status codes or their synonyms.  It can be omitted if it is C<302>.

  Redirect status code   Synonym
  ---------------------  ----------
  301                    permanent
  302                    temp
  303                    seeother
  307
  308

The second argument must be a sequence of one or more non-empty
percent-decoded path segments prefixed by C</> and not followed by
C</>.

The third argument must be a URL, absolute or relative.  It is used as
the C<Location:> header value.

If there are conflicting file or directory and C<Redirect> directive,
the C<Redirect> directive takes precedence.

=item Redirect STATUS /PATH/SEGMENTS/ URL

Specify that any path in a directory should return a redirect to a URL
in another set of URLs.  This is equivalent to the following infinite
set of directives:

  Redirect STATUS /PATH/SEGMENTS URL
  Redirect STATUS /PATH/SEGMENTS/ URL
  Redirect STATUS /PATH/SEGMENTS/PATH1 URL/PATH1
  Redirect STATUS /PATH/SEGMENTS/PATH2 URL/PATH2
  Redirect STATUS /PATH/SEGMENTS/PATH2/ URL/PATH2/
  Redirect STATUS /PATH/SEGMENTS/PATH2/PATH3 URL/PATH2/PATH3
  :

For example,

  Redirect 301 /foo/ http://example/

... will make paths such as C</foo>, C</foo/>, C</foo/bar>, and
C</foo/bar/baz> redirects to C<http://example/>, C<http://example/>,
C<http://example/bar>, and C<http://example/bar/baz>, respectively.

=item RedirectMatch STATUS /PATH/SEGMENTS/.* URL

Specify that any path in a directory should return a redirect to a
specific URL.  This is equivalent to the following infinite set of
directives:

  Redirect STATUS /PATH/SEGMENTS URL
  Redirect STATUS /PATH/SEGMENTS/ URL
  Redirect STATUS /PATH/SEGMENTS/PATH1 URL
  Redirect STATUS /PATH/SEGMENTS/PATH2 URL
  Redirect STATUS /PATH/SEGMENTS/PATH2/ URL
  Redirect STATUS /PATH/SEGMENTS/PATH2/PATH3 URL
  :

For example,

  RedirectMatch 301 /foo/.* http://example/

... will make paths such as C</foo>, C</foo/>, C</foo/bar>, and
C</foo/bar/baz> redirects to C<http://example/>.

=item Redirect STATUS /PATH/SEGMENTS

Specify that the path should return an error response.

The first argument is the status code.  It must be either a C<4xx>, a
C<5xx>, or C<gone> (= C<410>).

The second argument must be a sequence of one or more non-empty
percent-decoded path segments prefixed by C</>, not followed by C</>.

If there are conflicting file or directory and C<Redirect> directive,
the C<Redirect> directive takes precedence.

=item Redirect STATUS /PATH/SEGMENTS/

Specify that any path in a directory should return a redirect to
another URL.  This is equivalent to the following infinite set of
directives:

  Redirect STATUS /PATH/SEGMENTS
  Redirect STATUS /PATH/SEGMENTS/
  Redirect STATUS /PATH/SEGMENTS/PATH1
  Redirect STATUS /PATH/SEGMENTS/PATH2
  Redirect STATUS /PATH/SEGMENTS/PATH2/
  Redirect STATUS /PATH/SEGMENTS/PATH2/PATH3
  :

=item FuruikeRedirectTop SCHEME://AUTHORITY/

Register a redirect URL replacement rule.  If the prefix of the third
argument (i.e. URL) of the C<Redirect> directive (after this
directive) is equal to the argument of this directive, it is replaced
with a C</> character.

For example,

  FuruikeRedirectTop http://olddomain.example/
  Redirect 302 /oldurl http://olddomain.example/newurl

... is interpreted as C<Redirect 302 /oldurl /newurl>.

=item Header add HEADERNAME "HEADERVALUE"

Specify that an HTTP header should be added to successful responses.
The first argument must be C<add>.  The second argument must be an
HTTP header name consist of ASCII alphanumerics and C<->.  The third
argument must be a sequence of zero or more printable ASCII
characters, quoted by C<">.  Any C<"> or C<\> character in the value
must be preceded by a C<\> character.

=item Options OPTION1 OPTION2 ...

Same syntax with the C<IndexOptions> directive but has different set
of allowed option names: C<ExecCGI>, C<MultiViews>, and C<Indexes>.
This directive has no effect.  This directive is DEPRECATED.

=item AddHandler cgi-script .EXT1 .EXT2 ...

=item RemoveHandler .EXT1 .EXT2 ...

=item IndexIgnore NAME1 NAME2 ...

=item AddIcon (TEXT,URL) NAME1 NAME2 ...

These directives have no effect.  They are DEPRECATED.

=back

=head1 HTML FOOTER

If a file name is specified as the environment variable
C<FURUIKE_HTML_FOOTER_FILE> when the server is started, the content of
the file is used as the I<HTML footer> of the server.

If there is the HTML footer of the server, it is inserted at the end
of the C<body> element when the server returns a C<text/html> document
(from a file, or as a directory index view).

=head1 DEPENDENCY

Perl 5.14 or later is required.

This is a PSGI application.  It requires a PSGI-compliant server.  It
should be an L<AnyEvent>-compatible server, such as L<Twiggy>.

It requires various submodules.  It also requires L<AnyEvent>,
L<IO::AIO>, and L<Path::Tiny>.

=head1 AUTHOR

Wakaba <wakaba@suikawiki.org>.

=head1 HISTORY

This Git repository was located at <https://github.com/wakaba/furuike>
and then transferred to the manakai project on 30 March, 2022.

=head1 LICENSE

Copyright 2015-2022 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut