
Conducts query-independent document specificity scoring.


License: MIT

Query-Independent Document Specificity Scoring

The project calculates a pointwise query-independent document specificity score for use with document ranking.

Standard pointwise learning-to-rank methods score a document using only the terms that appear in both the query and the document. The pointwise methods in this project instead score every term in the document, so the resulting score does not depend on the query.

Methods

This project provides the following models:

  • Normalized inverse document frequency (NIDF) based specificity score
  • Term entropy based specificity score

More details on the formulas used can be found in FORMULAS.md.
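The two models can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation, and FORMULAS.md remains the authority on the formulas actually used. NIDF is assumed to be log(N/df)/log(N). The entropy model is assumed to compute the entropy H(t) of a term's frequency distribution across documents and map it to 1 - H(t)/log(N), so that in both models a higher value means a more specific term. The document score is assumed to be the mean over the distinct terms in the document. All names (`Corpus`, `nidf`, `entropySpecificity`, `docSpecificity`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Doc = std::vector<std::string>;  // a document as a list of tokens
using Corpus = std::vector<Doc>;

// Counts the documents that contain term t at least once.
std::size_t docFreq(const Corpus& corpus, const std::string& t) {
    std::size_t df = 0;
    for (const Doc& d : corpus) {
        for (const std::string& w : d) {
            if (w == t) { ++df; break; }
        }
    }
    return df;
}

// Assumed NIDF: log(N / df) / log(N), which lies in [0, 1].
double nidf(const Corpus& corpus, const std::string& t) {
    const std::size_t df = docFreq(corpus, t);
    assert(corpus.size() > 1 && df > 0);
    const double n = static_cast<double>(corpus.size());
    return std::log(n / static_cast<double>(df)) / std::log(n);
}

// Assumed entropy model: H(t) = -sum_d p_d log p_d with p_d = tf(t, d) / cf(t),
// mapped to 1 - H(t) / log(N) so that, as with NIDF, higher means more specific.
double entropySpecificity(const Corpus& corpus, const std::string& t) {
    assert(corpus.size() > 1);
    std::vector<double> tf;
    double cf = 0.0;  // collection frequency of t
    for (const Doc& d : corpus) {
        double count = 0.0;
        for (const std::string& w : d) {
            if (w == t) count += 1.0;
        }
        tf.push_back(count);
        cf += count;
    }
    assert(cf > 0.0);
    double h = 0.0;
    for (double count : tf) {
        if (count > 0.0) {
            const double p = count / cf;
            h -= p * std::log(p);
        }
    }
    return 1.0 - h / std::log(static_cast<double>(corpus.size()));
}

// Query-independent document score: mean term specificity over every
// distinct term in the document (averaging is an assumption).
template <typename TermScore>
double docSpecificity(const Corpus& corpus, const Doc& doc, TermScore score) {
    const std::set<std::string> terms(doc.begin(), doc.end());
    if (terms.empty()) return 0.0;
    double sum = 0.0;
    for (const std::string& t : terms) sum += score(corpus, t);
    return sum / static_cast<double>(terms.size());
}
```

For a three-document corpus {a b}, {a c}, {a d d}, the term a occurs everywhere and scores 0 under both models, while b and d each occur in a single document and score 1. The first document therefore scores 0.5 under either model, e.g. `docSpecificity(corpus, corpus[0], nidf)`.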

Getting Started

Adding to your project

The recommended way to add this library to your project is to add the following to your CMakeLists.txt:

cmake_minimum_required(VERSION 3.13)
project(myProject)

include_directories("path/to/static-doc-specificity/include")
add_subdirectory("path/to/static-doc-specificity")

# ${myProject_SOURCES} is assumed to be a variable listing your source files
add_executable(myProject ${myProject_SOURCES})
# or `add_library(myProject ${myProject_SOURCES})`

target_link_libraries(myProject staticspecrank)

Usage

The library can be included in your source files with the following:

#include <staticSpecRank/Term.h>
#include <staticSpecRank/calcSpecificityScore.h>

The score for a given document is calculated by calling specScore::calcSpecificityScore(scoreBase, numDocsInCorpus, docSize, docTermVector), where scoreBase is 0 for NIDF or 1 for term entropy.

The docTermVector must be of type std::vector<Term>. See the file include/staticSpecRank/Term.h for details on constructing the term vector.
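Passing a bare 0 or 1 as scoreBase is easy to get wrong. One option is to wrap the two documented values in a named enum on the caller's side. The sketch below assumes that scoreBase is an integer parameter. `ScoreBase`, `toScoreBaseArg`, and `parseScoreBase` are hypothetical helpers, not part of staticSpecRank.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical named values for the scoreBase argument:
// 0 selects NIDF and 1 selects term entropy.
enum class ScoreBase : int { NIDF = 0, TermEntropy = 1 };

// Converts the enum back to the plain integer passed as scoreBase.
constexpr int toScoreBaseArg(ScoreBase base) { return static_cast<int>(base); }

static_assert(toScoreBaseArg(ScoreBase::NIDF) == 0, "NIDF must map to 0");
static_assert(toScoreBaseArg(ScoreBase::TermEntropy) == 1, "term entropy must map to 1");

// Maps a user-facing option (e.g. a command-line value) to a score base.
ScoreBase parseScoreBase(const std::string& name) {
    if (name == "nidf") return ScoreBase::NIDF;
    if (name == "entropy") return ScoreBase::TermEntropy;
    throw std::invalid_argument("unknown score base: " + name);
}
```

A call would then read specScore::calcSpecificityScore(toScoreBaseArg(ScoreBase::NIDF), numDocsInCorpus, docSize, docTermVector), with the remaining arguments prepared as described above.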

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgements

This project is based on the following paper:

  • Zheng L., Cox I.J. (2009) Re-ranking Documents Based on Query-Independent Document Specificity. In: Andreasen T., Yager R.R., Bulskov H., Christiansen H., Larsen H.L. (eds) Flexible Query Answering Systems. FQAS 2009. Lecture Notes in Computer Science, vol 5822. Springer, Berlin, Heidelberg