This repository demonstrates how to build a BlackLab corpus with Docker and Nginx.
flowchart LR
subgraph Internet
D[Client]
end
subgraph DOCKER [Docker]
D --> N{"Load Balancer <br/> (Nginx)"}
N <==> B["/corpus-frontend"]
N <==> C["/blacklab-server"]
C <==> I
B <==> C
subgraph Indexes
I[("<div style='padding: 0rem 0.5rem;'>Indexes <br/>(by Indexer)</div> ")]
end
end
Clone the repository and make sure you are in the project directory:
git clone git@github.com:PTT-Corpus/blacklab-demo.git && cd blacklab-demo
To index your data, add your XML files to the data folder (./indexer/data).
deployment\
|-- ...
indexer\
|-- formats\ # custom blacklab index format
|-- data\
| |-- dcard_mock_data.xml # dcard mock data
| |-- ptt_mock_data.xml # ptt mock data
|-- ...
server\
|-- ...
We assume here that you are familiar with the BlackLab indexing process; see indexing with BlackLab to learn more.
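Before building, it can help to confirm that the data folder actually contains XML files. The following is a minimal sketch; `count_xml_files` is a hypothetical helper that is not part of this repository, and it assumes the XML files sit directly inside the folder rather than in subfolders.

```shell
# Hypothetical helper (not part of this repository):
# count the XML files directly inside a data folder.
count_xml_files() {
    dir="$1"
    count=0
    for f in "$dir"/*.xml; do
        # If nothing matches, the unexpanded pattern is not a regular file
        # and is skipped, so an empty folder yields 0.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example usage (run from the project directory):
#   count_xml_files ./indexer/data
```

If the count is 0, there is likely nothing to index yet.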
To build the server for the first time:
docker compose up
Your index should now be accessible at http://localhost/corpus-frontend/.
Once the server builds successfully, a folder named blacklab-indexes is generated and used by the BlackLab server (i.e., its corresponding Docker container).
After that, if you want the BlackLab server to add new indexes, first stop it with:
docker compose down
Then add your new XML files to the folder ./indexer/data and run:
docker compose up
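The stop, add, restart cycle described above could be wrapped in a small script. This is only a sketch: `run` and `reindex` are hypothetical helper names, and the `DRY_RUN` variable is an assumption introduced here so the commands can be previewed without touching Docker.

```shell
# Hypothetical helpers (not part of this repository).

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the server, copy new XML files into the data folder, start it again.
# Usage: reindex DATA_DIR [FILE...]
reindex() {
    data_dir="$1"
    shift
    run docker compose down || return 1
    if [ "$#" -gt 0 ]; then
        run cp "$@" "$data_dir"/ || return 1
    fi
    run docker compose up
}

# Example usage (run from the project directory):
#   DRY_RUN=1; reindex ./indexer/data new_data.xml   # preview only
#   DRY_RUN=0; reindex ./indexer/data new_data.xml   # actually run
```

Previewing with `DRY_RUN=1` first makes it easy to check which files would be copied before the server is stopped.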
If you have any suggestions or questions, please do not hesitate to email me at philcoke35@gmail.com.