A job scraper with a management interface built in Django, and containerized with Docker for easy deployment.
Setup was tested using Docker Desktop for Mac v2.3.0.3, which comes bundled with Engine v19.03.8 and Compose v1.25.5.
git clone https://github.com/hazeltek/django-pgpostgis.git
cd django-pgpostgis
The .env.template file lists all variables required for a proper setup. These settings are used
to automatically set up a new database when the database docker container starts.
Most of the variables are set to sample values within angle brackets. Create a copy of the template
file as shown below and update all sample values in the new .env file. The new values should not
include the angle brackets.
# create copy of .env.template named .env
# NOTE: .env files should never be committed to a repo
cp .env.template .env
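Since any sample value left in angle brackets would be passed to the containers as-is, a quick check of the new file can help. The following is a minimal sketch, not part of the repository: it assumes the .env file uses plain KEY=VALUE lines, and the function name `find_placeholders` is hypothetical.

```python
import re
from pathlib import Path

# Matches a value that still looks like an unedited <sample value> placeholder.
PLACEHOLDER = re.compile(r"^<.*>$")


def find_placeholders(env_text: str) -> list[str]:
    """Return the names of variables whose value is still wrapped in angle brackets."""
    leftover = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Ignore optional surrounding quotes before testing the value.
        if PLACEHOLDER.match(value.strip().strip("'\"")):
            leftover.append(key.strip())
    return leftover


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for name in find_placeholders(env_path.read_text()):
            print(f"{name} still has a placeholder value")
```

Run from the project root, the script prints one line per variable that still needs a real value and prints nothing when the file is fully edited.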
Configure the PostgreSQL superuser login credentials under the PG SUPERUSER section of the .env file.
Under the APP DATABASE section, configure the name of the new application database and the login
credentials of its associated user. It is considered best practice to have a separate database user with
non-admin privileges for interacting with the application database.
The APP DATABASE section also contains additional variables that the Django application uses to build the connection string for the application database.
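To illustrate how a Django settings module might turn such variables into a database connection, here is a minimal sketch. The variable names (`APP_DB_NAME`, `APP_DB_USER`, `APP_DB_PASSWORD`, `APP_DB_HOST`, `APP_DB_PORT`), the default host `database`, and the function name are assumptions for illustration; the actual names are the ones listed in .env.template.

```python
import os


def database_settings(env=os.environ) -> dict:
    """Build a Django-style DATABASES mapping from environment variables.

    All variable names below are hypothetical placeholders.
    """
    return {
        "default": {
            # Standard PostgreSQL backend; a PostGIS setup would typically use
            # "django.contrib.gis.db.backends.postgis" instead.
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["APP_DB_NAME"],
            "USER": env["APP_DB_USER"],
            "PASSWORD": env["APP_DB_PASSWORD"],
            # Assumed default: the compose service name of the database container.
            "HOST": env.get("APP_DB_HOST", "database"),
            # Port inside the docker network, not the host-side mapping.
            "PORT": env.get("APP_DB_PORT", "5432"),
        }
    }
```

In a settings module this would be used as `DATABASES = database_settings()`. Required values are read with `env[...]` so that a missing variable fails immediately with a `KeyError` rather than producing a half-configured connection.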
Configure Django-specific settings that control aspects of the application, as explained in the official
Django documentation. Provided below
is a brief description of the core settings configurable using the .env file:
ALLOWED_HOSTS
List of host/domain names (or IP addresses) that the application should handle HTTP requests for.
When DEBUG is set to True, its effective value is ['localhost', '127.0.0.1'], meaning only requests
from the host system will be processed. This is ideal for local development.
DEBUG
Turns debug mode on or off. When set to True, a detailed error traceback is displayed when an exception
occurs. This is ideal for local development. THIS SHOULD BE TURNED OFF WHEN RUN IN PRODUCTION.
SECRET_KEY
A key used for cryptographic signing of cookies and other Django resources. This should be set to
a unique, unpredictable value. DJANGO WILL REFUSE TO START IF NOT SET.
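Environment variables always arrive as strings, so a settings module has to convert them before Django can use them. The sketch below shows one possible way to do that for the three settings described above. It assumes ALLOWED_HOSTS is given as a comma-separated string; the helper names are hypothetical and the project's own settings module may parse these values differently.

```python
import os


def env_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(value: str) -> list[str]:
    """Split a comma-separated string into a list, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def core_settings(env=os.environ) -> dict:
    """Read SECRET_KEY, DEBUG and ALLOWED_HOSTS from the environment."""
    secret_key = env.get("SECRET_KEY", "")
    if not secret_key:
        # Fail early with a clear message instead of starting half-configured.
        raise RuntimeError("SECRET_KEY must be set")
    return {
        "SECRET_KEY": secret_key,
        # Default to False so debug mode is never enabled by accident.
        "DEBUG": env_bool(env.get("DEBUG", "False")),
        "ALLOWED_HOSTS": env_list(env.get("ALLOWED_HOSTS", "")),
    }
```

Defaulting DEBUG to False when the variable is absent keeps a production deployment safe if the variable is forgotten.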
docker-compose up
This will:
- build an image for the webapp docker service named djpgp-webapp, based on the configurations and commands provided in the Dockerfile found in the project root directory.
- start a docker container for each of the database and webapp docker services.
- bind the host port 9876 to the postgres port 5432 so that an application like pgAdmin installed on the host can be used to view and interact with the database inside the docker container.
- bind the host port 8888 to the Django development server port 8000 so that the running Django application can be accessed from outside the webapp docker container.
- create a volume named djpgp-database_data for the storage of database data files, if one doesn't already exist, and associate it with the database container.
- create the default postgres databases, configure the superuser, and create the application database and application user inside the database docker container, using the settings configured in the .env file. NOTE: these settings only take effect when the container has no associated volume or when the volume doesn't already have database data files. If a volume with database data files already exists, those files are used on every subsequent start of the database container unless the volume is deleted. See the docker notes for postgres, under the Initialization scripts section, for more details about this.
- create a mapped volume for the webapp container: the webapp folder on the host system within the project root directory is mapped (linked) to the /app/webapp folder within the container. This allows local changes made to files within the webapp folder on the host to be reflected automatically inside the container. This is ideal for local development and eliminates the need to rebuild the webapp image every time a file changes so that it is available to containers created on subsequent runs of docker-compose up.
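To confirm that the two port bindings listed above are active once the containers are up, a small check from the host can help. This is a minimal sketch using only the Python standard library; the function name `port_is_open` is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host-side ports taken from the bindings described above.
    for name, port in (("database", 9876), ("webapp", 8888)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```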
Access the Django application running within the webapp container at http://localhost:8888/ from
your browser.
How-to instructions can be found within the docs/how-to folder.
django-pgpostgis/ : project root directory
├── docs/ : contains .md files with how-to instructions
├── scripts/ : contains bash scripts
| ├── postgres/
| | └── create_db.sh : bash script mounted to database service image and run during container startup
| └── start.sh : bash script added to webapp service image and run during container startup
├── webapp/ : django application created via `django-admin startproject webapp`
| ├── jobs/ : django app added via `django-admin startapp jobs`
| ├── webapp/ : contains python modules for django core settings and others for url, wsgi etc
| └── manage.py : django management script
├── .editorconfig : contains coding style settings that can be shared across Editors and IDEs
├── .env.template : template file for .env file defining all env vars to run setup successfully
├── .gitignore
├── docker-compose.yml
├── Dockerfile : version of Dockerfile for building webapp image which uses pip
├── Pipfile : list of library dependencies for the django application maintained by pipenv
├── Pipfile.lock : lock file generated by pipenv to accompany Pipfile
└── README.md
From within the project root directory, execute the commands provided below to fetch, build and run the application using the latest changes from the code repository on GitHub.
Remember to update the .env file with any newly added environment variables. Refer to the latest section
of the Changelog file to determine what the new variables are, then add them and set the
desired values.
# pull latest changes
git pull
# stop running containers
docker-compose down
# build new image with latest changes
docker-compose build --no-cache
# run application
docker-compose up
# if migration fails during container startup, drop the database volume with the
# command below, then run the application again via `docker-compose up`
docker volume rm djpgp-database_data
With the application running visit:
- The home page at: http://localhost:8888/
- The admin section at http://localhost:8888/admin
Load sample companies (2) and openings (3) data into the database with:
docker-compose run --rm webapp loadsample
# run all unit tests with (alias for `manage test --keepdb ./webapp/jobs/tests/`)
docker-compose run --rm webapp manage tests
# see available django management commands with
docker-compose run --rm webapp manage help
# run the scrapejobs command manually with
docker-compose run --rm webapp manage scrapejobs
# run other django management commands using
docker-compose run --rm webapp manage [command]