/moodle-local_cleanurls

Lets drag Moodle's url structure into this century...

Primary LanguagePHP

What this plugin does:

It makes url's clean! Some real example url's:

  • A course page:

/course/MATH101

  • A module page:

/course/MATH101/lesson/6-how-to-do-matrix-multiplication

  • A user profile:

/user/brendan

Why bother?

Good URL design is a hall mark of properly engineered internet systems. That said URL's have long been poorly implemented in many systems, moodle included, to the point where browsers are now hiding URL's because they are so ugly and opaque.

For the canonical guide to good URL's refer to Tim Berners-Lee timeless page:

http://www.w3.org/Provider/Style/URI

There are many benefits to end users, but admittedly some of these are fading:

  • Human readable URL's when shared or embedded in social media
  • Better context of a page, eg what course is this forum in
  • 'Hackable' urls, going 'up' and also guessing urls is easier

But despite the fading of importance of URL's to browsers and users there are still many reasons why clean urls are a good thing:

  • Much better filtering and reporting in log files and analytics software
  • More resilient links when migrating systems (eg backup and restore to a new moodle but mostly keep your urls the same)
  • Deterministic linking in from external pages, (eg deep link from a course catalog or staff directory into moodle)
  • Easier management of robots.txt

Design principals

Backward compatibility

URL's must always work, old and new. Old url's should be seamlessly upgrade to new url's where possible.

Some url rewrites such as those involving a course shortcode or username instead of id's, maybe be brittle if your site allows these things to change, so these are optional.

Speed is king

Speed is an integral part of the user experience. So we want to avoid things like 302 redirects, cache internally any expensive processing. If a url is never going to be seen by an end user, then avoid cleaning it.

Human readable

A typical moodle url looks like this:

/mod/forum/view.php?id=6

This is fairly opaque and tells us very little. We should add extra information into the url to make it readable, giving it context, whilst at the same time removing extaneous information such as the php extension. eg

/course/MATH101/lesson/6-how-to-do-matrix-multiplication

Note we have also added redundant heirarchical information, ie the course path components. This immediately gives context, but is also useful to non humans, such as for Google Analytics to create 'drill down' reports.

Automatic

Moodle already has rich meta data which we can leverage to produce clean url's. We don't want the site admins, let alone the teachers, to have to do anything extra. It should Just Work.

How it works

Rewrite outgoing links

This plugin adds a very small hack to the moodle_url->out() method which cleans the links that moodle renders onto a page. It applies a variety of safe tranformations, and if the more aggressive settings are on, it applies some much deeper tranformations by reaching into the moodle navigation heirarchy to add extra redundant path elements to the url. Unfortunately we can't often read this information until we are on the page that uses them, or a nearby page, so the first time we render that page we clean the url and cache it for next time.

Rewrite incoming links

Incoming links are diverted by an apache rewrite rule to router.php, which then uncleans the url and passes it back into moodle which doesn't know anything was different.

Base href

Not every moodle link uses moodle_url, and some may also use relative links. Because the clean url may be wildy different to the original, these legacy links will break. To fix this, we add a base href tag of the original url to any pages with a rewritten url. An example of these are the module index pages which use relative links to the discussions.

/mod/forum/index.php?id=4

Canonical link

If a robot like google is scraping your page, we don't want to split the pagerank between the old and clean url, and we want to ensure that google always sends people to the clean url. We acheive this by rendering a 'canonical' link in the HTML head. This is similar to a 302 redirect but just for robots, and doesn't incur a roundtrip penalty.

http://en.wikipedia.org/wiki/Canonical_link_element

This also now makes it much easier to manage parts of your site using robots.txt

history.replaceState

The are many ways a url gets shared, copy and paste, a 'share' widget etc. We want the url to be correct as soon as possible, so even if the link we clicked on was an normal moodle url, we replace this as soon as possible using html5 history.replaceState()

We also need to do this early, before things that use the url such as a Google Analytics tracking code. We want the url's to be nice in GA so we get clean 'drill down' report etc

The only down side to this approach is if you have outbound link tracking on the referring page.

https://developer.mozilla.org/en-US/docs/Web/Guide/API/DOM/Manipulating_the_browser_history#The_replaceState()_method

Installation

Step 1: Install the local module

eg using git submodule:

git submodule add git@github.com:brendanheywood/moodle-local_cleanurls.git local/cleanurls

OR you can download as a zip from github

https://github.com/brendanheywood/moodle-local_cleanurls/archive/master.zip

Extract this into /var/www/yourmoodle/local/cleanurls/

Then run the moodle upgrade as normal.

Step 2: Apply tiny patches to core (Moodle 2.9 and 3.0 only)

This plugin uses core api's which were only added 3.1 - two new hooks in:

  1. moodle_url in lib/weblib.php to intercept outgoing urls
  2. standard_head_html() in lib/outputrenderers.php to include head related fixes

You can apply it in one line for 2.9 and 3.0:

For Moodle 3.0:

git apply local/cleanurls/core30.patch

For Moodle 2.9:

git apply local/cleanurls/core29.patch

This patch it also available in github:

https://github.com/brendanheywood/moodle/tree/MDL-28030-cleanurls

It was built using git like this

git format-patch xyz123 --stdout > local/cleanurls/core.patch

Step 3: Add the webserver rewrite to the custom router

While Web Servers usually map a URL to a file, to have the semantic URLs from CleanURLs working in Moodle it is necessary to tweak that behaviour.

We need to:

  • Keep the old URLs working - if the URL maps to a file, use it.
  • Redirect the requests that would be a 404 to local/cleanurls/router.php.
  • Add a query parameter q with the original requested URL.
  • Append (keep) all previous parameters if existing.

If using apache, this is the suggested configuration:

# Replace the path below with your Moodle wwwroot.
<Directory /var/www/moodle>
   # Enable RewriteEngine
   RewriteEngine on
   # All relative URLs are based from root
   RewriteBase /
   # Do not change URLs that point to an existing file.
   RewriteCond %{REQUEST_FILENAME} !-f
   # Do not change URLs that point to an existing directory.
   RewriteCond %{REQUEST_FILENAME} !-d

   # Rewrite URLs matching ^(.*)$ as $1 - this means all URLs.
   # Rewrite it to the cleanurls router
   # Use ?q=$1 to forward the original URL as a query parameter
   # Use the flags:
   # - L (do not continue rewriting)
   # - B (encode back the parameters)
   # - QSA (append the original query string parameters)
   RewriteRule ^(.*)$ local/cleanurls/router.php?q=$1 [L,B,QSA]
</Directory>

If using nginx, all you need to do is add one more try_files entry pointing to the router, as follows:

location / {
    # For more details, see: http://nginx.org/en/docs/varindex.html
    try_files $uri $uri/ /local/cleanurls/router.php?q=$uri&$args;
}

Reminder: nginx addresses will have a '/' at the beginning of the URL, whereas Apache will not. This is addressed in the Clean URLs plugin by simply trimming the initial slashes.

Moodle in a subdirectory

Use the following examples to adjust your server if you Moodle resides in a subdirectory.

For example: http://moodle.com/sub1/sub2

Apache:

DocumentRoot /var/www/shared/

<Directory /var/www/shared>
    RewriteEngine on
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^sub1/sub2/(.*)$ sub1/sub2/local/cleanurls/router.php?q=$1 [L,B,QSA,END]
</Directory>

Apache .htaccess Apache (optional) - Ensure you have the AllowOverride all set for your Moodle subdirectory.

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ local/cleanurls/router.php?q=$1 [L,B,QSA]

nginx

# nginx
server {
  # ...

  location /sub1/sub2 {
    try_files $uri $uri/ @moodlerewrite;
  }

  location ~ ^/sub1/sub2/(.*\.php)(/|$) {
    fastcgi_split_path_info  ^(.+\.php)(/.+)$;
    fastcgi_index            index.php;
    fastcgi_pass             unix:/run/php/php7.0-fpm.sock;
    include                  fastcgi_params;
    fastcgi_param   PATH_INFO       $fastcgi_path_info;
    fastcgi_param   SCRIPT_FILENAME $document_root$fastcgi_script_name;
  }

  location @moodlerewrite {
    rewrite ^/sub1/sub2/(.+)$ /sub1/sub2/local/cleanurls/router.php?q=$1&$args last;
    fastcgi_pass             unix:/run/php/php7.0-fpm.sock;
    fastcgi_param   PATH_INFO       $fastcgi_path_info;
    fastcgi_param   SCRIPT_FILENAME $document_root$fastcgi_script_name;
  }
}

Step 4: Turn it on and configure

  1. Change the first couple lines of config.php to this:
<?php  // Moodle configuration file

global $CFG;
if (!isset($CFG)) {
    $CFG = new stdClass();
}

... normal config

This allows the router to bootstrap moodle, and then defer the real page.

  1. Add this to, or uncomment it in, your config.php file:

$CFG->urlrewriteclass = '\local_cleanurls\url_rewriter';

  1. Go to the /admin/settings.php?section=local_cleanurls settings page and it should show a green success message if it detects the router rewrite is in place and working. If not check step 3.

Now you can Tick the box turning on the rewrites and tune the other options If you have any issues then turn on the rewrite logging and tail your apache log for details.

Trouble shooting

  1. If the incoming url rewriting isn't working:
  • Have you restarted apache?
sudo service apache2 restart
  • Is the apache rewrite module enabled?
apache2ctl -M | grep rewrite

If not then enable it:

sudo a2enmod rewrite
sudo service apache2 restart
  • Is the apache rewrite rule actually working? Turn on on full apache rewrite debugging:
# This is for apache 2.4+
LogLevel debug rewrite:trace8

You should see items like this in your apache error logs for every page load, even ones which do not get rewritten. If you do not see this then the logging is not working. If you do see this then isolate a single page load, trace through the regex logic and see what rules are being matched and why they did or did not match and rewrite.

[rewrite:trace3] [pid 24024] mod_rewrite.c(476): [client 127.0.0.1:49658] 127.0.0.1 - - [moodle.local/sid#7f06654cacd8][rid#7f06653eb0a0/initial] [perdir /var/www/moodle.local/] strip per-dir prefix: /var/www/moodle.local/blah -> blah
[rewrite:trace3] [pid 24024] mod_rewrite.c(476): [client 127.0.0.1:49658] 127.0.0.1 - - [moodle.local/sid#7f06654cacd8][rid#7f06653eb0a0/initial] [perdir /var/www/moodle.local/] applying pattern '^(.*)$' to uri 'blah'
[rewrite:trace4] [pid 24024] mod_rewrite.c(476): [client 127.0.0.1:49658] 127.0.0.1 - - [moodle.local/sid#7f06654cacd8][rid#7f06653eb0a0/initial] [perdir /var/www/moodle.local/] RewriteCond: input='/var/www/moodle.local/blah' pattern='!-f' => matched
[rewrite:trace4] [pid 24024] mod_rewrite.c(476): [client 127.0.0.1:49658] 127.0.0.1 - - [moodle.local/sid#7f06654cacd8][rid#7f06653eb0a0/initial] [perdir /var/www/moodle.local/] RewriteCond: input='/var/www/moodle.local/blah' pattern='!-d' => matched

Do you get a themed Moodle error page? Or an 404 error page from apache?

Contributing

  • Pull requests welcome!
  • You may find that certain links in moodle core, or particular plugins don't use moodle_url - so best patch them and push back upstream

TODO global settings check in settings page