kensanata/oddmuse

Customizable namespace name pattern

bandali0 opened this issue · 16 comments

Background: Currently, the Namespaces Extension requires that namespace names match $InterSitePattern, forcing them to start with an uppercase letter and only consist of alphabet letters.

Request: It would be nice to allow customization of namespace url pattern by introducing a $NamespacePattern.

For my use-case, I would like to allow namespace names starting with and (optionally) consisting of only digits. I asked on #oddmuse and @AlexDaniel suggested the following pattern:

$NamespacePattern = '[A-Z0-9\x{0080}-\x{fffd}]+[A-Za-z0-9\x{0080}-\x{fffd}]+'

Note: a-z is omitted to avoid matching the keep, modules, page, and temp directories.

The following is a git-diff of the changes I made to namespaces.pl to allow for numeric namespace names, basically replacing every ocurrence of InterSitePattern with the new NamespacePattern:

diff --git a/modules/namespaces.pl b/modules/namespaces.pl
index 7ade4caa..3a431ed2 100644
--- a/modules/namespaces.pl
+++ b/modules/namespaces.pl
@@ -42,7 +42,7 @@ AddModuleDescription('namespaces.pl', 'Namespaces Extension');
 
 use File::Glob ':glob';
 
-our ($q, %Action, %Page, @IndexList, $Now, %InterSite, $SiteName, $ScriptName, $UsePathInfo, $DataDir, $HomePage, @MyInitVariables, @MyAdminCode, $FullUrl, $LinkPattern, $InterSitePattern, $FreeLinks, $FreeLinkPattern, $InterLinkPattern, $FreeInterLinkPattern, $UrlProtocols, $WikiLinks, $FS, $RcFile, $RcOldFile, $RcDefault, $PageDir, $KeepDir, $LockDir, $TempDir, $IndexFile, $VisitorFile, $NoEditFile, $WikiDescription, $LastUpdate, $StaticDir, $StaticUrl, $InterWikiMoniker, $RefererDir, $PermanentAnchorsFile);
+our ($q, %Action, %Page, @IndexList, $Now, %InterSite, $SiteName, $ScriptName, $UsePathInfo, $DataDir, $HomePage, @MyInitVariables, @MyAdminCode, $FullUrl, $LinkPattern, $NamespacePattern, $FreeLinks, $FreeLinkPattern, $InterLinkPattern, $FreeInterLinkPattern, $UrlProtocols, $WikiLinks, $FS, $RcFile, $RcOldFile, $RcDefault, $PageDir, $KeepDir, $LockDir, $TempDir, $IndexFile, $VisitorFile, $NoEditFile, $WikiDescription, $LastUpdate, $StaticDir, $StaticUrl, $InterWikiMoniker, $RefererDir, $PermanentAnchorsFile);
 our ($NamespacesMain, $NamespacesSelf, $NamespaceCurrent,
            $NamespaceRoot, $NamespaceSlashing, @NamespaceParameters,
            %Namespaces);
@@ -80,18 +80,19 @@ $NamespaceSlashing = 0;   # affects : decoding NamespaceRcLines
 # variables (eg. localnames.pl)
 unshift(@MyInitVariables, \&NamespacesInitVariables);
 
+my $NamespacePattern = '[A-Z0-9\x{0080}-\x{fffd}]+[A-Za-z0-9\x{0080}-\x{fffd}]+';
 sub GetNamespace {
   my $ns = GetParam('ns', '');
   if (not $ns and $UsePathInfo) {
     my $path_info = decode_utf8($q->path_info());
     # make sure ordinary page names are not matched!
-    if ($path_info =~ m|^/($InterSitePattern)(/.*)?|
+    if ($path_info =~ m|^/($NamespacePattern)(/.*)?|
        and ($2 or $q->keywords or NamespaceRequiredByParameter())) {
       $ns = $1;
     }
   }
   ReportError(Ts('%s is not a legal name for a namespace', $ns))
-    if $ns and $ns !~ m/^($InterSitePattern)$/;
+    if $ns and $ns !~ m/^($NamespacePattern)$/;
   return $ns;
 }
 
@@ -102,7 +103,7 @@ sub NamespacesInitVariables {
     $Namespaces{$NamespacesMain} = $ScriptName . '/';
     foreach my $name (Glob("$DataDir/*")) {
       if (IsDir($name)
-         and $name =~ m|/($InterSitePattern)$|
+         and $name =~ m|/($NamespacePattern)$|
          and $name ne $NamespacesMain
          and $name ne $NamespacesSelf) {
        $Namespaces{$1} = $ScriptName . '/' . $1 . '/';
@@ -311,13 +312,13 @@ sub NewNamespaceScriptUrl {
   if ($action =~ /^($UrlProtocols)\%3a/) { # URL-encoded URL
     # do nothing (why do we need this?)
   } elsif ($action =~ m!(.*?)([^/?&;=]+)%3a(.*)!) {
-    # $2 is supposed to match the $InterSitePattern, but it might be
+    # $2 is supposed to match the $NamespacePattern, but it might be
     # UrlEncoded in Main:RecentChanges. If $2 contains Umlauts, for
-    # example, the encoded $2 will no longer match $InterSitePattern.
+    # example, the encoded $2 will no longer match $NamespacePattern.
     # We have a likely candidate -- now perform an additional test.
     my ($s1, $s2, $s3) = ($1, $2, $3);
     my $s = UrlDecode($s2);
-    if ($s =~ /^$InterSitePattern$/) {
+    if ($s =~ /^$NamespacePattern$/) {
       if ("$s2:$s3" eq GetParam('oldid', '')) {
        if ($s2 eq $NamespacesMain) {
          $ScriptName = $NamespaceRoot;
@@ -353,8 +354,8 @@ sub NewNamespaceGetAuthorLink {
 }
 
 sub NewNamespaceValidId {
-  local $FreeLinkPattern = "($InterSitePattern:)?$FreeLinkPattern";
-  local $LinkPattern = "($InterSitePattern:)?$LinkPattern";
+  local $FreeLinkPattern = "($NamespacePattern:)?$FreeLinkPattern";
+  local $LinkPattern = "($NamespacePattern:)?$LinkPattern";
   return OldNamespaceValidId(@_);
 }
 
@@ -383,8 +384,8 @@ sub NewNamespaceBrowsePage {
   my $text = $revisionPage->{text};
   my $oldId = GetParam('oldid', '');
   if (not $oldId and not $revision and (substr($text, 0, 10) eq '#REDIRECT ')
-      and (($WikiLinks and $text =~ /^\#REDIRECT\s+(($InterSitePattern:)?$InterLinkPattern)/)
-          or ($FreeLinks and $text =~ /^\#REDIRECT\s+\[\[(($InterSitePattern:)?$FreeInterLinkPattern)\]\]/))) {
+      and (($WikiLinks and $text =~ /^\#REDIRECT\s+(($NamespacePattern:)?$InterLinkPattern)/)
+          or ($FreeLinks and $text =~ /^\#REDIRECT\s+\[\[(($NamespacePattern:)?$FreeInterLinkPattern)\]\]/))) {
     my ($ns, $page) = map { UrlEncode($_) } split(/:/, FreeToNormal($1));
     $oldId = ($NamespaceCurrent || $NamespacesMain) . ':' . $id;
     local $ScriptName = $NamespaceRoot || $ScriptName;

It may need more work to sand down any rough edges, but it's been working pretty great for me so far.

Thanks! As for moving namespaces to a subdir, good idea, I'm all for it!

Looking at this again, I'm a bit torn: if you can change the namespace pattern independently of the inter site pattern, then you loose the ability to link to other namespaces using a simple prefix. How do you handle that? You just never use it?

I hadn't thought of that :/ I guess like you said I'm just not using it as of now?

I'm all for a better way to do this if there is one 🙂

Well, for campaignwiki.org I've used the following:

# Allow namespaces with spaces: Require upper case first letter and no slashes,
# then word characters or underlines. Spaces will get turned into underlines.
# And because of historic reasons, we also support NO-BREAK SPACE and SPACE.

# $InterSitePattern = '\p{Uppercase}\p{Word}*';
# $InterSitePattern = '\p{Uppercase}[[:alpha:]_  ]*';
$InterSitePattern = '[\p{Uppercase}\d][\w_  ]*';

# Redefine these as well if you change $InterSitePattern since InitLinkPatterns is called before InitConfig!
$InterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x{0080}-\x{fffd}_=!?#\$\@~`\%&*+\\/:;.,]*[-a-zA-Z0-9\x{0080}-\x{fffd}_=#\$\@~`\%&*+\\/])$QDelim";
$FreeInterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x{0080}-\x{fffd}_=!?#\$\@~`\%&*+\\/:;.,()' ]+)";

It should work for you as well, as I'm using [\p{Uppercase}\d] for the first character.

Having to redefine the two interlink patterns is ugly, of course, so perhaps I should move those two definitions into a separate function which we could then call after setting the intersite pattern.

Thanks, I switched over to your settings, and seems to be working.

I'm facing an issue, however: whether with my previous config or your current one, my changes to namespaces with numerical names don't get saved :( any thoughts as to why?

It seem to work for me. Let me do some more testing.

I use the following config file:

$AdminPass = 'foo';
$ScriptName = 'http://localhost:8080';
$SurgeProtection = 0;
$WikiLinks = 1;

$InterSitePattern = '[\p{Uppercase}\d][\w_  ]*';
$InterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x{0080}-\x{fffd}_=!?#\$\@~`\%&*+\\/:;.,]*[-a-zA-Z0-9\x{0080}-\x{fffd}_=#\$\@~`\%&*+\\/])$QDelim";
$FreeInterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x{0080}-\x{fffd}_=!?#\$\@~`\%&*+\\/:;.,()' ]+)";

My modules directory contains nothing by a symbolic link to namespaces.pl.

My data dir currently looks as follows:

  /home/alex/src/oddmuse/test-data:
  total used in directory 72 available 308517392
  drwxr-xr-x 12 alex alex  4096 Jul 16 22:49 .
  drwxr-xr-x 80 alex alex 12288 Jul 16 22:49 ..
  drwxr-xr-x  4 alex alex  4096 Jul 16 22:49 007
  drwxr-xr-x  4 alex alex  4096 Jul 16 22:45 Alex
  drwxr-xr-x  3 alex alex  4096 Jun 10 13:45 Ford
  drwxr-xr-x  6 alex alex  4096 Jun 10 13:54 Muu
  drwxr-xr-x  4 alex alex  4096 Jun 10 13:45 Zürich
  drwxr-xr-x  4 alex alex  4096 Jun 10 13:45 Zürich♥
  -rw-r--r--  1 alex alex   613 Jul 16 22:44 config~
  -rw-r--r--  1 alex alex   402 Jul 16 22:48 config
  drwxr-xr-x  3 alex alex  4096 Jun 10 13:45 keep
  drwxr-xr-x  2 alex alex  4096 Jun 10 13:45 modules
  drwxr-xr-x  2 alex alex  4096 Jun 10 13:45 page
  -rw-r--r--  1 alex alex     7 Jun 10 13:45 pageidx
  -rw-r--r--  1 alex alex   131 Jun 10 13:45 rc.log
  drwxr-xr-x  2 alex alex  4096 Jun 10 13:45 temp

Result looks OK:

image

I added a test and it still seems to work: fded175

Can you try and reproduce the fact that it works using my minimal example? If that doesn't work, perhaps try the latest wiki.pl and namespaces.pl from the repo and we can then use git bisect to find the commit that prevents it from working. We should at least document it...

Thanks for taking a closer look.

I pinpointed the source of my issue, but not "why" it happens.

The only meaningful differences between my server.pl and the one checked into the repo is that I'd replaced route => '/wiki' with route => '/', and had commented out the get '/' => sub { ... subroutine, so that I could run the wiki and its pages at the root level, without the /wiki prefix. Accordingly, I'd set $FullUrl = 'https://emacsconf.org' and $ScriptName = "$FullUrl" in the config file.

When I added /wiki to the end of $FullUrl and restored route back to /wiki, the edit I made to the numerical namespace's main page was saved again.

Would it be possible to modify namespaces.pl to work with this sort of configuration (without /wiki)? Editing normal pages seems to have worked fine so far, and oddly enough editing namespace pages used to work fine too when I had last tried a month or two ago, but has since stopped working sometime between then and yesterday when I tried again.

P.S. the config file is available at https://emacsconf.org/config.

I've used the same setup as before, but the following server.pl script, which is the same as stuff/mojolicious-app.pl:

use Mojolicious::Lite;

plugin CGI => {
  support_semicolon_in_query_string => 1,
};

plugin CGI => {
  route => '/',
  script => 'wiki.pl',
};

app->start;

This works for me.

Config:

$AdminPass = 'foo';
$ScriptName = 'http://localhost:8080';
$SurgeProtection = 0;
$WikiLinks = 1;

$InterSitePattern = '[\p{Uppercase}\d][\w_  ]*';
$InterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x{0080}-\x{fffd}_=!?#\$\@~`\%&*+\\/:;.,]*[-a-zA-Z0-9\x{0080}-\x{fffd}_=#\$\@~`\%&*+\\/])$QDelim";
$FreeInterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x{0080}-\x{fffd}_=!?#\$\@~`\%&*+\\/:;.,()' ]+)";

Started using:

morbo --listen http://*:8080 \
--watch wiki.pl --watch test-data/config --watch test-data/modules/ \
server.pl

Then I changed it to the following:

use Mojolicious::Lite;

# This needs to be in a different section, sometimes?
plugin CGI => {
  support_semicolon_in_query_string => 1,
};

plugin CGI => {
  route => '/',
  # We need this for older versions of Mojolicious::Plugin::CGI
  script => 'wiki.pl',
  run => \&OddMuse::DoWikiRequest,
  before => sub {
    no warnings;
    $OddMuse::RunCGI = 0;
    # The default data directory is determined by the environment variable
    # WikiDataDir and falls back to the following
    # $OddMuse::DataDir = '/tmp/oddmuse';
    use warnings;
    require './wiki.pl' unless defined &OddMuse::DoWikiRequest;
  },
  env => {},
  # path to where STDERR from cgi script goes
  errlog => ($ENV{WikiDataDir} || '/tmp/oddmuse')
      . "/wiki.log",
};

# get '/' => sub {
#   my $self = shift;
#   $self->redirect_to('/wiki');
# };

app->start;

And it still works. By "work" I mean I can visit the existing page 007/Bond and edit it and the new content is shown (based on the problem description "changes to namespaces with numerical names don't get saved"). I also visited 8/Trump and edited it and it got saved as a new page.

I'm still unable to reproduce the problem, apparently.

Are you accessing the website directly or via a webserver that acts as a proxy?

Thanks. I'm using it behind nginx like so:

upstream emacsconf {
	server 127.0.0.1:11199;
}
server {
...
	location / {
		proxy_pass http://emacsconf;
		proxy_http_version 1.1;
		proxy_set_header Upgrade $http_upgrade;
		proxy_set_header Connection "upgrade";
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto $scheme;

		if ($request_uri ~ "^/([\d]{4})$") {
        		return 302 /$1/;
		}
	}
}

I found the source of my problems: that if block I use to redirect a page like /2015 to /2015/. It just occurred to me to try and comment it out and see what happens. If I comment that out, my changes get saved again. I must have added that rule some time since my last edit a few months back, but it never dawned on me that it could be problematic. facepalms

Knowing that, I wonder if it'd be possible to redirect those numeric pages /2015 to their slashed version /2015/ (the namespace), without breaking saving?

I use something like this on Campaign Wiki.

# Fix visiting Main:X if the page doesn't exist but namespace X does.
# Redirect!
push(@MyInitVariables, \&MyNamespacesFix);
sub MyNamespacesFix {
  if (not GetParam('title', '')
      and GetParam('action', 'browse') eq 'browse') {
    my $id = FreeToNormal(GetId());
    if (not $NamespaceCurrent
	and (not $IndexHash{$id}
	     or OpenPage($id) and PageMarkedForDeletion())
	and $Namespaces{$id}) {
      print GetRedirectPage("$id/", NormalToFree($id));
      exit;
    }
  }
};