When a new policy file in MS Word format is inserted, the system splits the policy into smaller passages and creates a new BM25 index.
- Document Processing
  - Remove tables from the MS Word file
  - Convert the MS Word file to a text file
  - Create headers for passages
  - Create passages
  - Export passages
- Index Creation
  - Read passages
  - Create new index
- Add the new file to the data/policy/word folder
- Add the policy name and role to role.json
- Run document_processing.py
- Run create_index.py
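
The passage creation and index creation stages listed above can be sketched roughly as follows. This is a minimal illustration and not the code in document_processing.py or create_index.py. The names `split_into_passages` and `BM25Index`, the fixed word-count splitting, the "header: text" passage format, and the default values of `k1` and `b` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_into_passages(text, header, max_words=50):
    """Split plain text into passages of at most max_words words.

    Each passage is prefixed with the given header (hypothetical format)."""
    words = text.split()
    passages = []
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        passages.append(f"{header}: {chunk}")
    return passages


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Small in-memory BM25 index over a list of passages (illustrative only)."""

    def __init__(self, passages, k1=1.5, b=0.75):
        self.passages = passages
        self.k1 = k1
        self.b = b
        # Per-passage term frequencies and token counts.
        self.term_freqs = [Counter(tokenize(p)) for p in passages]
        self.lengths = [sum(tf.values()) for tf in self.term_freqs]
        total = sum(self.lengths)
        self.avg_len = total / len(self.lengths) if total else 1.0
        # Document frequency: number of passages containing each term.
        doc_freq = Counter()
        for tf in self.term_freqs:
            doc_freq.update(tf.keys())
        n = len(passages)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {
            term: math.log(1 + (n - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def score(self, query, i):
        """BM25 score of passage i for the query."""
        tf = self.term_freqs[i]
        norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=3):
        """Return up to top_k (passage, score) pairs, best match first."""
        scored = [(self.score(query, i), i) for i in range(len(self.passages))]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(self.passages[i], s) for s, i in scored[:top_k]]
```

With this sketch, passages from several policies can be concatenated into one list, passed to `BM25Index`, and queried with `search("annual leave")`. Splitting by a fixed word count is only one option; splitting on headings or paragraphs may suit policy documents better.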
In addition, there is a "test.ipynb" file for testing:
- Passage length
- Retrieval function
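
A passage length check of the kind mentioned above might look like the sketch below. It does not reproduce the contents of test.ipynb. The function name `passage_length_report` and the `max_words` threshold are hypothetical, and length is measured here in whitespace-separated words, which is an assumption.

```python
from statistics import mean


def passage_length_report(passages, max_words=200):
    """Summarize passage lengths (in words) and flag passages over max_words."""
    lengths = [len(p.split()) for p in passages]
    return {
        "count": len(lengths),
        "min": min(lengths, default=0),
        "max": max(lengths, default=0),
        "mean": mean(lengths) if lengths else 0.0,
        # Indices of passages that exceed the threshold.
        "too_long": [i for i, n in enumerate(lengths) if n > max_words],
    }
```

A retrieval check could follow the same pattern: run a few queries with known answers and confirm that the expected passage is ranked first.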