NTU CI6226 Information Retrieval Assignment
Cheng Hao, Guo Lanqing, Lan Tian, Li Ruibo, Yang Ze
Information Search
This project is an information retrieval system. It uses Django for the web application and Bootstrap for the front end. The system includes two corpora: 1) our Novels dataset and 2) the HillaryEmails dataset; and three different search methods: 1)
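The search methods are not spelled out above, and this README does not show the code in `Info_retrieval/components.py` or `main.py`. As a hedged illustration only, the sketch below shows the data structure that retrieval systems of this kind usually start from: an inverted index mapping each term to the documents that contain it, queried here with a Boolean AND. The function names and the regex tokenizer (used instead of nltk so the sketch has no dependencies) are assumptions, not the project's actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical sketch: not the code in Info_retrieval/, just an illustration.


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, query):
    """Return the sorted IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect postings: a document survives only if it has this term too.
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "The old man and the sea",
        2: "The sea wolf",
        3: "An old house by the river",
    }
    index = build_inverted_index(docs)
    print(boolean_and(index, "old sea"))  # [1]
```

Intersecting postings lists is what keeps multi-term queries cheap: only documents listed under every query term are ever considered.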
├── InforRetrieval
│ ├── __init__.py
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
├── manage.py --entry point
├── search_web --a Django app
│ ├── Info_retrieval --search algorithms
│ │ ├── components.py
│ │ └── main.py
│ ├── spider --spider for the novel website
│ │ ├── Conversion_encoding_to_utf_8.py
│ │ └── Renumber.py
│ ├── __init__.py
│ ├── admin.py
│ ├── apps.py
│ ├── migrations
│ │ └── __init__.py
│ ├── models.py
│ ├── tests.py
│ └── views.py --data request functions
├── static --static resources
│ ├── css
│ ├── img
│ └── js
│ ├── bootstrap
│ ├── font-awesome
│ ├── jquery
│ └── simple-line-icons
└── templates --html
├── content.html
└── index.html
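The tree lists two spider utilities, `Conversion_encoding_to_utf_8.py` and `Renumber.py`, without describing them. Judging only by their names, helpers like these might look roughly as follows. This minimal sketch assumes the crawled novels are `.txt` files in a legacy encoding such as GBK and that they should be renamed to sequential numbers. `convert_to_utf8` and `renumber_files` are hypothetical names, not the repository's code.

```python
from pathlib import Path

# Hypothetical helpers inferred from the file names in spider/; the GBK
# source encoding and the ".txt" suffix are assumptions.


def convert_to_utf8(path, source_encoding="gbk"):
    """Re-encode a text file in place from source_encoding to UTF-8.

    Undecodable bytes are replaced with U+FFFD instead of aborting.
    """
    path = Path(path)
    text = path.read_bytes().decode(source_encoding, errors="replace")
    path.write_text(text, encoding="utf-8")


def renumber_files(directory, suffix=".txt"):
    """Rename files with the given suffix to 1.txt, 2.txt, ... in sorted order.

    Sorting is lexicographic by path, so "10.txt" comes before "2.txt".
    Files are moved to temporary names first so that a new name never
    collides with a file that has not been renamed yet.
    """
    directory = Path(directory)
    files = sorted(p for p in directory.iterdir() if p.suffix == suffix)
    temp_paths = []
    for i, p in enumerate(files):
        tmp = p.with_name(f"__tmp_{i}{suffix}")
        p.rename(tmp)
        temp_paths.append(tmp)
    for i, tmp in enumerate(temp_paths, start=1):
        tmp.rename(directory / f"{i}{suffix}")
    return len(temp_paths)
```

Converting everything to UTF-8 up front means the indexer can open every file with a single, known encoding.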
- python 3.7
- nltk
- tqdm
- django
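One quick way to confirm the requirements above before starting the server is to check the interpreter version and whether each package can be found. This sketch treats "python 3.7" as a minimum version, which is an assumption. `missing_requirements` is a hypothetical helper, not part of the repository.

```python
import importlib.util
import sys

# Hypothetical check; treating 3.7 as a minimum (not an exact pin) is an assumption.
REQUIRED_PYTHON = (3, 7)
REQUIRED_PACKAGES = ("nltk", "tqdm", "django")


def missing_requirements(packages=REQUIRED_PACKAGES, python=REQUIRED_PYTHON):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < tuple(python):
        problems.append(
            f"Python {python[0]}.{python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for name in packages:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

If a package is reported missing, `pip install nltk tqdm django` installs all three.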
Our Novels dataset can be downloaded here.
- Clone this repository to a local path
- Start the development server:
python manage.py runserver
# runs the server on the default port 8000
- Access link: http://127.0.0.1:8000/index
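Once `runserver` is up, you can check the index page from a second terminal without opening a browser. The following is a minimal standard-library sketch that assumes the default address above. `check_index_page` is a hypothetical helper.

```python
import urllib.error
import urllib.request

# Hypothetical smoke check for the local development server.


def check_index_page(url="http://127.0.0.1:8000/index", timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    # An empty ProxyHandler sends the request straight to the local server
    # even if http_proxy is set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except OSError:
        # URLError (e.g. connection refused) and socket timeouts are both
        # subclasses of OSError: treat them as "server not reachable".
        return None
```

While the server is running, `print(check_index_page())` should print `200`. If the server is not running, it should print `None`.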
It takes about 40 minutes to build the indexes for the Novels and HillaryEmails corpora. Because this takes so long, we have already deployed the system on a server; feel free to access it via http://154.8.218.119:10101/index. Note that the server is physically located in China, so the connection may be slow. We appreciate your patience while it loads. Thank you.
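Because building the index takes so long, a common way to avoid paying that cost on every restart is to build the index once and cache it on disk. This is an assumption about how a local setup could be organized, not a description of how this project works. `load_or_build_index` and `build_fn` are hypothetical names, and `build_fn` stands for whatever function actually builds the index.

```python
import os
import pickle
import tempfile

# Hypothetical caching helper; not part of the repository.


def load_or_build_index(cache_path, build_fn):
    """Return the index cached at cache_path, building and caching it on a miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    index = build_fn()  # the slow step, e.g. tokenizing a whole corpus

    # Write to a temporary file in the same directory, then atomically move it
    # into place, so an interrupted run never leaves a truncated cache file.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, cache_path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return index
```

Only unpickle cache files you created yourself, because `pickle.load` can execute arbitrary code from a malicious file. Also delete the cache whenever the corpus changes, or the server will keep serving the stale index.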