Wiki-RAG (Mediawiki Retrieval-Augmented Generation) is a project that leverages Mediawiki as a source for augmenting text generation tasks, enhancing the quality and relevance of generated content.
Wiki-RAG integrates Mediawiki's structured knowledge (ingested via API) with language models to improve text generation. By using retrieval-augmented generation and some interesting techniques, it provides more contextually relevant and accurate responses using any Mediawiki site as KB.
Labeled as an experimental project, Wiki-RAG is part of the Moodle Research initiative, aiming to explore and develop innovative solutions for the educational technology sector.
To get started with Wiki-RAG, ensure you have the following:
- Git
- Python 3.12 or later with pip (Python package installer)
- Docker (if you intend to run the project using Docker)
- Milvus 2.5.5 or later (for vector similarity search). Standalone or Distributed deployments are supported. Lite deployments are not supported. It's highly recommended to use the Docker Compose deployment specially for testing and development purposes.
- Set Environment Variables (to be replaced soon by
config.ymlfile)- Using the env_template file as source, create a new
.envfile:cp env_template .env
- Edit it to suit your needs (mediawiki site and namespaces or exclusions, models to use for both embeddings and generation, etc.)
- Using the env_template file as source, create a new
-
Clone the Repository:
git clone https://github.com/moodlehq/wiki-rag.git cd wiki-rag -
Set Up Virtual Environment (optional but recommended):
python -m venv .venv source .venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install Dependencies:
pip install -e .If interested into contributing to the project, you can install the development dependencies and enable all the checks by running:
pip install -e .[dev] # To install all the development dependencies. pre-commit install # To enable all the check (style, lint, commits, etc.)
-
Run the Application: The application comes with four different executables:
wr-load: Will parse all the configured pages in the source Mediawiki site, extracting contents and other important metadata. All the generated information will be stored into a JSON file in thedatadirectory.wr-index: In charge of creating the collection in the vector index (Milvus) with all the information extracted in the previous step.wr-search: A tiny CLI utility to perform searches against the RAG system from the command line.wr-server: A comprehensive and secure web service (documented with OpenAPI) that allows users to interact with the RAG system using the OpenAI API (v1/modelsandv1/chat/completionsendpoints) as if it were a large language model (LLM).
-
Pull the image from GitHub Container Registry:
docker pull ghcr.io/moodlehq/wiki-rag:latest # or specify a tag -
Run the container:
- Note 1: Don't forget to have the
.envfile in the same directory as the command below. - Note 2: The
datadirectory will be created in the current directory, and it will store all the data generated by thewr-loadcommand. If theLOADER_DUMP_PATHis set, you will have to change the volume mapping accordingly.
docker run --rm --detach \ --volume $(pwd)/data:/app/data \ --volume $(pwd)/.env:/app/.env \ --env MILVUS_URL=http://milvus-standalone:19530 \ --network milvus \ --publish 8080:8080 \ --env OPENAI_API_KEY=YOUR_OPENAI_API_KEY \ --env LOG_LEVEL=info \ --name wiki-rag \ wiki-rag:latest- Note 3: The command above will start the
wr-serverautomatically, listening on the configured port (8080) and theOPENAI_API_KEYis required to interact with the embedding and LLM models. If, instead, you want to execute any of the other commands (wr-load,wr-index,wr-search), you can specify it as the last argument. - Note 4: The 2 lines related to Milvus are required to connect to the Milvus server if also running in Docker. If it's running elsewhere, you can replace the
MILVUS_URLwith the appropriate URL or configure it in the.envfile instead - Note 5: You can use
docker logs wiki-ragto check the logs of the running container (wr-serverlogs). - Note 6: When running the
wr-server, you still can execute any of the commands (wr-load,wr-index,wr-search) usingdocker exec -it wiki-rag <command>. - Note 7: To stop and remove the container, you can use
docker stop wiki-rag.
- Note 1: Don't forget to have the
-
Download the Milvus Docker Compose file:
wget https://github.com/milvus-io/milvus/releases/download/v2.5.5/milvus-standalone-docker-compose.yml -O milvus-standalone.yml
OR
curl https://github.com/milvus-io/milvus/releases/download/v2.5.5/milvus-standalone-docker-compose.yml -o milvus-standalone.yml
-
Run Wiki-RAG own Docker Compose file:
- Note 1: Don't forget to have the
.envfile in the same directory as the command below. - Note 2: The
datadirectory will be created in the current directory, and it will store all the data generated by thewr-loadcommand. If theLOADER_DUMP_PATHis set, you will have to change the volume mapping accordingly. - Note 3: The
volumesdirectory will be created in the current directory, ant it will store all the data required by the Milvus containers.
docker compose up -d
- Note 4: Some useful commands include:
- To stop all the containers, you can use
docker compose stop. - To start again all the containers, you can use
docker compose start. - To stop and remove all the containers, you can use
docker compose down.
- To stop all the containers, you can use
- Note 5: You can use
docker logs wiki-ragto check the logs of the running container (wr-serverlogs). - Note 6: When running, you still can execute any of the commands (
wr-load,wr-index,wr-search) usingdocker exec -it wiki-rag <command>.
- Note 1: Don't forget to have the
Coming soon...
Coming soon...
For more detailed information, please refer to the following files:
This project is licensed under the BSD 3-Clause License. See the LICENSE file for more information.
We welcome contributions! Please see our Contributing Guidelines for more details.
Please note that this project adheres to a Code of Conduct. By participating, you agree to uphold this code.
© 2025 Moodle Research Team