A simple question answering system built over a corpus of documents in different formats, using Haystack and Streamlit.
- Simply run the command pip install -r requirements.txt to install the dependencies.
If you run into any issues installing Haystack, please refer to the Haystack installation documentation.
- Clone this repository and install the dependencies as mentioned above.
- Place your documents in the docs folder, then run the following command to bulk-convert them to plain text:
python data.py
- We will convert the text documents into the format Haystack supports and apply the PreProcessor to clean each document and split it into sensible units. We will store these preprocessed texts in a SQL document store. Run the following command to perform the indexing:
python index_pipeline.py
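To make the clean, split, and store idea concrete, here is a minimal standard-library sketch. It is not the code in index_pipeline.py and does not use Haystack's PreProcessor or its SQL document store: the function names, the word-based splitting rule, the `document` table layout, and the in-memory SQLite database are all assumptions chosen for illustration.

```python
import re
import sqlite3


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including blank lines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_units(text: str, split_length: int = 100) -> list:
    """Split text into consecutive chunks of at most `split_length` words."""
    words = text.split()
    return [
        " ".join(words[i:i + split_length])
        for i in range(0, len(words), split_length)
    ]


def index_documents(conn, docs: dict, split_length: int = 100) -> int:
    """Clean and split each document, store the units, and return how many were stored.

    The table layout is hypothetical, not Haystack's schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document ("
        "id INTEGER PRIMARY KEY, name TEXT, content TEXT)"
    )
    count = 0
    for name, text in docs.items():
        for unit in split_into_units(clean_text(text), split_length):
            conn.execute(
                "INSERT INTO document (name, content) VALUES (?, ?)",
                (name, unit),
            )
            count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    # In-memory database and sample text, used only for demonstration.
    conn = sqlite3.connect(":memory:")
    sample = {"example.txt": "Haystack   is a framework.\n\nIt builds search pipelines."}
    stored = index_documents(conn, sample, split_length=4)
    print(f"Stored {stored} units")
```

Splitting into small units matters because the later search step works on these units rather than on whole files, so shorter units tend to give more focused matches.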
- We will download our reader (a transformer model pre-trained on the QA task) and initialize our retriever. For a given question, the retriever searches the document store for the top k relevant documents, and the reader predicts answers using only those k documents instead of searching the whole document store. To test this, you can run the following script:
python search_pipeline.py
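The retriever-then-reader flow described above can be sketched without any model downloads. This is a toy illustration only, not the code in search_pipeline.py: the word-overlap scoring stands in for the real retriever, the sentence picker stands in for the transformer reader, and every function name here is hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, documents: list, top_k: int = 2) -> list:
    """Return up to top_k documents that share the most word occurrences with the question."""
    q_tokens = set(tokenize(question))
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in q_tokens)
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Documents with no overlap at all are dropped.
    return [doc for score, doc in scored[:top_k] if score > 0]


def read(question: str, candidates: list):
    """Toy stand-in for the reader: pick the candidate sentence with the most question words."""
    q_tokens = set(tokenize(question))
    best_sentence, best_score = None, 0
    for doc in candidates:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = len(q_tokens & set(tokenize(sentence)))
            if score > best_score:
                best_sentence, best_score = sentence, score
    return best_sentence


def answer(question: str, documents: list, top_k: int = 2):
    """Run the reader only on the retriever's top_k documents, not the whole store."""
    return read(question, retrieve(question, documents, top_k))


if __name__ == "__main__":
    docs = [
        "Streamlit is a Python library for building web apps.",
        "Docker packages applications into containers. Containers run in isolation.",
        "Haystack is a framework for building search pipelines.",
    ]
    print(answer("Which tool packages applications into containers?", docs, top_k=1))
```

The point of the two stages is cost: the retriever is cheap and narrows the search, so the expensive reader only has to process k documents per question.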
Our web app is built with Streamlit, which works well with Haystack and is easy to use. You can run the app with:
streamlit run app.py
- Ensure Docker is installed and set up on your OS (Windows/Mac/Linux). For detailed instructions, please refer to the official Docker documentation.
- Navigate to the folder where you cloned this repository (where the Dockerfile is present).
- Build the Docker Image (don't forget the dot!! 😄 ):
docker build -f Dockerfile -t app:latest .
- Run the container:
docker run -p 8501:8501 app:latest
This will launch the dockerized app. Navigate to http://localhost:8501/ in your browser to have a look at your application. You can check the status of all running containers with:
docker ps