A small information retrieval (IR) project
First, go into the folder SEARCH-RANKER:
cd search-ranker
Then use the python command to run it:
python main.py
crawler.py: crawls www.foxnews.com and saves 200+ HTML files as a local dataset.
iat_ws_python3.py: API for transcribing an .mp3 file into the text of the speech.
src: a folder containing the implementation of the search-ranker algorithm.
- Inverted Index:
    "word":{
        "doc_id":[pos_list]
    }
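The layout above (word, then doc_id, then a list of positions) can be sketched as follows. This is a minimal illustration, not the code in src: the `tokenize` and `build_inverted_index` helpers and the two toy documents are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Build {word: {doc_id: [positions]}} from a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    # Convert the nested defaultdicts to plain dicts for easy serialization.
    return {word: dict(postings) for word, postings in index.items()}


# Toy documents, made up for this example.
docs = {
    "0": "giant rock lobster",
    "1": "rock climbing on a giant rock",
}
index = build_inverted_index(docs)
print(index["rock"])  # {'0': [1], '1': [0, 5]}
```

Keeping the positions (rather than only a term count) costs more space, but it is what makes phrase queries possible later.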
- Link Info:
    "doc_id":{
        "topic":"great-outdoors"
        "headline":"Australian fisherman shows off giant rock lobster in TikTok video"
        "datePublished":"2021-05-13T04:00:23-04:00"
        "url":
    }
- Ranking algorithm:
  - VSM
  - wf-idf
  - to be continued
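As a rough sketch of how VSM ranking with wf-idf weights could work on the inverted index: the README only names the two techniques, so the exact formulas here are assumptions taken from the textbook form (wf = 1 + ln(tf), idf = ln(N / df), cosine similarity between query and document vectors). The `wf` and `rank` functions and the toy index are hypothetical.

```python
import math
from collections import Counter, defaultdict


def wf(tf):
    """Sublinear term-frequency scaling: 1 + ln(tf) for tf > 0, else 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def rank(query_terms, index, n_docs):
    """Return (doc_id, cosine score) pairs, best match first."""
    # Inverse document frequency of every indexed word.
    idf = {w: math.log(n_docs / len(p)) for w, p in index.items()}

    # Squared length of each document vector over all of its terms.
    sq_norms = defaultdict(float)
    for w, postings in index.items():
        for doc_id, positions in postings.items():
            sq_norms[doc_id] += (wf(len(positions)) * idf[w]) ** 2

    # Query vector, restricted to words that occur in the index.
    q_weights = {w: wf(tf) * idf[w]
                 for w, tf in Counter(query_terms).items() if w in index}
    q_norm = math.sqrt(sum(v * v for v in q_weights.values()))

    # Dot product of the query with each document that shares a term.
    dots = defaultdict(float)
    for w, qw in q_weights.items():
        for doc_id, positions in index[w].items():
            dots[doc_id] += qw * wf(len(positions)) * idf[w]

    results = []
    for doc_id, dot in dots.items():
        denom = q_norm * math.sqrt(sq_norms[doc_id])
        if denom > 0:  # skip vectors made only of zero-idf terms
            results.append((doc_id, dot / denom))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy index in the {word: {doc_id: [positions]}} layout, made up for this example.
index = {
    "lobster": {"0": [2, 7]},
    "rock": {"0": [1], "1": [0, 5]},
    "video": {"1": [3], "2": [0]},
    "fisherman": {"2": [4]},
}
results = rank(["rock", "lobster"], index, n_docs=3)
print([doc_id for doc_id, _ in results])  # ['0', '1']
```

The sublinear wf scaling keeps a document that repeats a word many times from dominating one that mentions it once or twice.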
- Query:
  - Single word
  - Multiple words
  - Phrase (unfinished)
  - to be continued
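One possible way to match these query types against a positional index is sketched below. Phrase search is marked unfinished in this project, so `match_phrase` is only an illustration of the usual positional approach, and treating a multi-word query as "all words must appear" (AND) is an assumption. Both function names and the toy index are hypothetical.

```python
def match_all(words, index):
    """Return the set of doc_ids that contain every query word (AND)."""
    postings = [set(index.get(w, {})) for w in words]
    return set.intersection(*postings) if postings else set()


def match_phrase(words, index):
    """Return doc_ids where the words occur at consecutive positions."""
    hits = set()
    for doc_id in match_all(words, index):
        # Try every occurrence of the first word as the phrase start.
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id]
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


# Toy index in the {word: {doc_id: [positions]}} layout, made up for this example.
index = {
    "giant": {"0": [0], "1": [4]},
    "rock": {"0": [1], "1": [0, 5]},
    "lobster": {"0": [2], "1": [7]},
}
print(sorted(match_all(["rock"], index)))                # ['0', '1']
print(sorted(match_all(["giant", "lobster"], index)))    # ['0', '1']
print(sorted(match_phrase(["rock", "lobster"], index)))  # ['0']
```

A single-word query is just the one-element case of `match_all`; the matching documents can then be passed to the ranking step.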
- Bonus feature: speech $\rightarrow$ text for query searching