This example project will count words in a given text and plot a bar chart of the 10 most common words.
- Inspired by and derived from https://github.com/coderefinery/word-count which is distributed under Creative Commons Attribution license (CC-BY 4.0).
Create a virtual environement
python3 -m venv venv
and activate it. On Unix (MacOSX and Linux) you do
. \venv/bin/activate
and on Windows you do
.\venv\Scripts\activate
Next you can install the dependencies
python3 -m pip install -r requirements.txt
Install the conda environment using
conda env create -f environment.yml
and activate the environment
conda activate word-count
- Create an automated workflow to count all words in the data folder and save the results to a new directory called
results
using the script count.py. - Also, use the script plot.py to create a figure for each dataset and save it in a folder called
figures
Note: Make sure to use appropriate names for the results and figures.
Solution: https://github.com/finsberg/word-count/tree/exercise-1
Create a test that verifies the implementation of count_words
in count.py. Add the test in a new folder called tests
.
Run the test(s) with pytest
.
Tip 1:
You can add the code
directory to your python path in your test using the following snippet
from pathlib import Path
import sys
here = Path(__file__).parent
sys.path.append((here / ".." / "code").as_posix())
Tip 2: You can run pytest using
python3 -m pytest
Solution: https://github.com/finsberg/word-count/tree/exercise-2
Create a GitHub action to run the test every time you push to the repo.
- Create a folder called
.github/workflows
and add the filetests.yml
to it with the following content.
# Simple workflow for deploying static content to GitHub Pages
name: Run tests
on: [push]
jobs:
run:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Install dependencies
run: python3 -m pip install -r requirements.txt
- name: Run tests
run: python3 -m pytest tests
Solution: https://github.com/finsberg/word-count/tree/exercise-3
Create a GitHub workflow for running the full analysis and uploading the results and figures as artifact. Create a new file .github/workflows/reproduce_results.yml
with the following content
# Simple workflow for deploying static content to GitHub Pages
name: Run tests
on: [push]
jobs:
run:
runs-on: ubuntu-22.04
env:
# Directory that will be published on github pages
DATADIR: ./data
FIGDIR: ./artifacts/figures
RESULTDIR: ./artifacts/results
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Install dependencies
run: python3 -m pip install -r requirements.txt
- name: Run all experiments
run: python3 run_all_experiments.py
- name: Upload artifact
if: always()
uses: actions/upload-artifact@v3
with:
path: ./artifacts
if-no-files-found: error
Solution: https://github.com/finsberg/word-count/tree/exercise-4
Create a release of your repository. Use the tag v1.0
Create a Dockerfile
in the root of the repo that captures the environment. Try to build the docker image locally, e.g (from the root to the repo)
docker build -t word-count .
Try to run a container
docker run --rm -it word-count
And make sure all the code works inside the container.
Solution: https://github.com/finsberg/word-count/tree/exercise-6
Create a GitHub workflow for building and pushing a docker image to the registry associated with your repository. Create a new file .github/workflows/docker-image.yml
with the following content
name: Create and publish a Docker image
on:
push:
branches:
- "!*"
tags:
- "v*"
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
push: true
platforms: linux/amd64,linux/arm64
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
Note 1: This will only run whenever you create a new tag (i.e you create a Release on GitHub) Note 2: This will create two different images; one for Linux/AMD64 and one for Linux/ARM64. Linux/ARM64 is the type of image you will need if you are running Docker on Mac with Apple Silicon.
Create a new release of the code, and make sure the workflow runs and creates an image (also called package on GitHub) in your repository