/postgresql-full-text-search-engine

This application demonstrates how to use PostgreSQL as a full-text search engine.

Primary LanguagePythonApache License 2.0Apache-2.0

PostgreSQL as a full-text search engine

Introduction

This application shows how to use PostgreSQL as a full-text search engine.

It demonstrates a minimal approach, with the understanding that many applications already use PostgreSQL as a primary data store. With the addition of an extra column, index, and a trigger to the existing database schema, you may be able to use PostgreSQL directly for full-text search and avoid the pain of maintaining a separate search engine such as Solr or Sphinx.

Overview

This example is divided into three separate scripts:

  1. initialize: creates the database schema and populates it with data

  2. restapp/restserv: starts a REST server to support REST-based queries of the database, based on the pattern http://hostname/search/<query>/<page>/<limit>. The service returns the results (if any) in JSON to the caller.

  3. webapp/webserv: starts a Web server with a minimal HTTP interface for human-entered queries. The queries are directed to the REST server, and the JSON results are rendered in a crude but easily extensible fashion.

Dependencies

  • A modern version of PostgreSQL; any currently supported version will suffice.

  • Python 3.3 or higher

  • Flask 0.10 or higher

  • py-postgresql

Setup

  1. Create a virtual environment using virtualenv, adjusting the location of your Python 3 executable as required:

    virtualenv --python=/usr/bin/python3 ~/postgres-fts
    # Set your environment to use this virtual Python installation
    . ~/postgres-fts/bin/activate
    # Install the dependencies
    pip install flask
    pip install py-postgresql
  2. Create the PostgreSQL database and a PostgreSQL user. Depending on your environment, something like the following should work:

    sudo su - postgres
    createdb pgfts
    # Note the following creates a PostgreSQL superuser, which is probably
    # much more permission that you want in production; see the PostgreSQL
    # documentation on granting privileges for real.
    createuser -s -P fts_user
    # Stop being the postgres user
    exit
  3. Create a ~/.pgpass file that contains the credentials required to access the new database you just created, per http://www.postgresql.org/docs/devel/static/libpq-pgpass.html. This may look something like:

    #hostname:port:database:username:password
    localhost:*:pgfts:fts_user:mysecretssss
  4. Test your connection to the database using the psql command. If the connection fails, you may need to modify pg_hba.conf to allow TCP/IP connections to your database with md5 authentication.

    psql -h localhost -p 5432 -U fts_user -d pgfts
  5. Modify the constants in initialize, restapp/restserv, and webapp/webserv, if necessary, to reflect the required PostgreSQL connection information.

  6. Create and populate the database schema:

    . ~/postgres-fts/bin/activate
    ./initialize
  7. Start up the REST server. The REST server runs on localhost port 8001 by default; you can change the PORT constant to tell it to run elsewhere:

    . ~/postgres-fts/bin/activate
    ./restapp/restserv
  8. The Web server runs on localhost port 5000 by default, and expects to find the REST server running on localhost port 8001. Change PORT and JSON_HOST respectively if necessary, then start the Web server:

    . ~/postgres-fts/bin/activate
    ./webapp/webserv
  9. Open http://localhost:5000 in your Web browser (assuming you are using the defaults) and test it out. Searching for "trees", "sketching", or "schema.org" should return results!

Further reading

PostgreSQL is capable of much, much more, including language-specific support, custom dictionaries, boolean operators, wildcard searching, and relevancy ranking configuration.

Flask is a powerful Web framework that includes simple mapping of routes to functions and the jinja2 template system.

TODO

  • Demonstrate the use of a parser to support more complex queries (including booleans and wildcard support)

License

Copyright 2013 Dan Scott, dscott@laurentian.ca
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.