Deep Neural Network for Learning to Rank Query-Text Pairs

Question

Deep Neural Network for Learning to Rank Query-Text Pairs

chullhwan-song opened this issue 5 years ago · 1 comments

chullhwan-song commented 5 years ago

https://arxiv.org/pdf/1802.08988.pdf

Answer 1 · 2019-12-02T02:53:18.000Z

Abstract

Addresses the document ranking problem in information retrieval using learning to rank.
To this end, the paper proposes ConvRankNet, which combines the following two components:
- Siamese Convolutional Neural Network encoder
- RankNet ranking
Uses the OHSUMED dataset.

Introduction

"Traditionally, people used to hand-tune ranking models such as TF-IDF or Okapi BM25 [Manning et al., 2008] which is not only time-inefficient but also tedious." Heh, this sentence is a bit too aggressive..~
The main goal of learning to rank is to fit a ranking model automatically.
These models are usually "feature based".
- The input, a query-document pair (q, d), is represented as a feature vector v that the ranking model consumes.
- In practice this approach is not very practical.
  - For example, even ES cannot produce IDF in real time, so a feature-based approach is harder still.
So this work seems to aim at something that can be applied in real time.
- ConvRankNet
  - Consists of two ConvNet encoders (Siamese Convolutional Neural Network (CNN) encoder)
    - q vs d_i & q vs d_j
  - RankNet
    - Three-layer network over the features (q vs d_i vs d_j) > uses a pairwise ranking loss
      - three-layer neural network-based pairwise ranking model

ConvRankNet

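First, the paper defines a query set and a document set.
The important point: for a query q, the candidate documents are those that match at least one word of the query (= could be the set of documents sharing at least one token with q).
So, a preference between d_i and d_j means that d_i is more relevant to query q than d_j.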
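The candidate set and the preference pairs described above can be sketched as follows. This is a minimal illustration, not the paper's code: whitespace tokenization, the helper names `candidate_docs` and `preference_pairs`, and the toy documents and labels are all assumptions made for the example.

```python
from itertools import combinations

def candidate_docs(query, docs):
    """Return ids of documents sharing at least one token with the query."""
    q_tokens = set(query.lower().split())  # assumed: simple whitespace tokens
    return [doc_id for doc_id, text in docs.items()
            if q_tokens & set(text.lower().split())]

def preference_pairs(doc_ids, relevance):
    """Return (d_i, d_j) pairs where d_i has a strictly higher label than d_j."""
    pairs = []
    for a, b in combinations(doc_ids, 2):
        if relevance[a] > relevance[b]:
            pairs.append((a, b))
        elif relevance[b] > relevance[a]:
            pairs.append((b, a))
    return pairs

# Hypothetical toy data, not taken from the paper or from OHSUMED
docs = {
    "d1": "liver cancer treatment outcomes",
    "d2": "cancer screening guidelines",
    "d3": "weather forecast models",
}
relevance = {"d1": 2, "d2": 1, "d3": 0}  # graded labels for this one query

cands = candidate_docs("liver cancer", docs)
pairs = preference_pairs(cands, relevance)
print(cands)
print(pairs)
```

Documents with equal labels produce no pair, since neither is preferred over the other.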
Query-doc pair:
Architecture:

Siamese CNN Encoder

Extracts the feature vectors.
The CNN encoders in Fig. 1 share their weights.
Broadly, it consists of a sentence matrix, convolution feature maps, activation units, a pooling layer, and a similarity measure.

Sentence Matrix

A sentence consists of n words.
Each word has an embedding vector (e.g. w2v) of dimension d.
Sentence matrix S = an n × d matrix
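A minimal sketch of building the n × d sentence matrix, assuming a tiny hand-made embedding table in place of real word2vec vectors. The `EMBEDDINGS` table, the zero vector for unknown words, and the function name `sentence_matrix` are hypothetical choices for illustration only.

```python
# Hypothetical 3-dimensional embeddings (d = 3); a real model would load e.g. word2vec
EMBEDDINGS = {
    "deep": [0.1, 0.3, -0.2],
    "neural": [0.0, 0.5, 0.4],
    "network": [-0.3, 0.2, 0.1],
}
UNK = [0.0, 0.0, 0.0]  # assumed fallback for out-of-vocabulary words

def sentence_matrix(sentence, embeddings=EMBEDDINGS, unk=UNK):
    """Stack one embedding row per word, giving an n x d matrix (list of rows)."""
    return [list(embeddings.get(w, unk)) for w in sentence.lower().split()]

S = sentence_matrix("Deep neural network ranking")
n, d = len(S), len(S[0])
print(n, d)
```

Row i of S is the embedding of the i-th word, so the word order of the sentence is preserved for the convolution that follows.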

Convolution Feature Maps, Activation and Pooling

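A 2D filter of size m × d > slides a window over the sentence matrix S
- Does this mean the conv 2D filter has size m × d??
conv
- The part in red is the conv operation (wx), with b added to it
- It looks like it works on single-word units only:
  - from i to n
- Computed per surrounding context (per window) > from i to (i + sliding window of m)
A non-linear activation unit is applied = ReLU
- Applied to each of the conv outputs (v_i...v_n) above
A max pooling layer is applied to the features (v_i..v_n) generated (by conv) from each of the filters f_i..f_n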

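import tensorflow as tf
# Assumes W has shape [filter_size, embedding_dim, 1, num_filters], so each filter spans whole word vectors
conv = tf.nn.conv2d(self.embedded_chars_left, W, strides=[1, 1, 1, 1], padding="VALID", name="conv")
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
# Max-pool over all window positions, leaving one value per filter
pooled = tf.nn.max_pool(h, ksize=[1, max_len_left - filter_size + 1, 1, 1], strides=[1, 1, 1, 1], padding="VALID", name="pool")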
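To make the window arithmetic concrete, here is a framework-free sketch of the same three steps (convolution with an m × d filter, ReLU, max pooling over positions). It is only an illustration under toy assumptions: the function name `conv_relu_maxpool` and the input values are made up, and it requires n ≥ m.

```python
def conv_relu_maxpool(S, filters, biases):
    """S: n x d sentence matrix; filters: list of m x d weight matrices.

    For each filter, slide an m-row window over S, take the element-wise
    product sum plus bias, apply ReLU, then take the max over all window
    positions. Returns one pooled value per filter.
    """
    n = len(S)
    pooled = []
    for F, b in zip(filters, biases):
        m = len(F)
        feature_map = []
        for i in range(n - m + 1):           # "VALID" padding: n - m + 1 positions
            window = S[i:i + m]
            v = sum(w * x
                    for f_row, s_row in zip(F, window)
                    for w, x in zip(f_row, s_row)) + b
            feature_map.append(max(0.0, v))  # ReLU
        pooled.append(max(feature_map))      # max pooling over positions
    return pooled

# Toy input: n = 4 words, d = 2 dimensions, one filter with m = 2
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]]]
biases = [-0.5]
result = conv_relu_maxpool(S, filters, biases)
print(result)
```

Because the filter covers the full embedding width d, it only slides along the word axis, which is why the pooling window in the TensorFlow snippet has height `max_len_left - filter_size + 1` and width 1.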

Similarity Measure

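Given the following > three vectors
The similarity matrix M introduced in "Learning to rank short text pairs with convolutional deep neural networks." is adopted.
Using it, the similarity is written as:
- How is M obtained..?? (I should probably read that paper..)
Since this is hard to fit in DL, it is replaced with the following~
pairwise ranking model's input
- The features from the two networks and the similarity are concatenated as features... having the similarity go in alongside them is a bit unusual.
  - Is that a good thing from RankNet's point of view??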
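For orientation, in the cited Severyn and Moschitti paper the similarity is a bilinear form, sim(x_q, x_d) = x_q^T M x_d, where M is a parameter matrix learned together with the rest of the network. The sketch below shows that form and a concatenated input vector. It does not show the replacement used in ConvRankNet, which is not reproduced in these notes, and the function names and values are hypothetical.

```python
def bilinear_similarity(x_q, M, x_d):
    """Compute x_q^T M x_d for vectors x_q, x_d and a matrix M (list of rows)."""
    Mx_d = [sum(m_ij * x for m_ij, x in zip(row, x_d)) for row in M]
    return sum(q * v for q, v in zip(x_q, Mx_d))

def join_vector(x_q, x_d, sim):
    """Concatenate query features, the similarity score, and document features."""
    return list(x_q) + [sim] + list(x_d)

# Toy values, assumed for illustration
x_q = [1.0, 2.0]
x_d = [0.5, -1.0]
M = [[1.0, 0.0], [0.0, 1.0]]  # identity: reduces to a plain dot product
sim = bilinear_similarity(x_q, M, x_d)
joined = join_vector(x_q, x_d, sim)
print(sim)
print(joined)
```

With M fixed to the identity the score is just a dot product; a learned M lets the model weight interactions between different feature dimensions of the query and the document.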

RankNet

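2005: proposed in "Learning to Rank using Gradient Descent" (the follow-up "Learning to rank with nonsmooth cost functions" introduced LambdaRank).
- In effect, this paper takes the work of "Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks" and inserts this part (RankNet) into it.
  - For reference, that paper uses softmax.
loss function
- target probability:
  - Label information > can be thought of as the ground-truth ranking information~
  - See Eq. (1) (identical)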
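For reference, the standard RankNet formulation can be written as below. The symbols are assumptions chosen for this note (s_i and s_j are the model scores for d_i and d_j, and S_ij is +1, 0, or -1 depending on whether d_i is labelled more, equally, or less relevant than d_j); the paper's own notation may differ.

```latex
P_{ij} = \frac{1}{1 + e^{-(s_i - s_j)}}, \qquad
\bar{P}_{ij} = \tfrac{1}{2}\,(1 + S_{ij}), \quad S_{ij} \in \{-1, 0, +1\}

C_{ij} = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log (1 - P_{ij})
```

When d_i is labelled more relevant than d_j, the target is 1 and the cost reduces to log(1 + e^{-(s_i - s_j)}), which shrinks as the score gap grows.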
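more detailed
- Pairwise loss
- Loss approaches used across neural ranking methods
  - Pointwise approach
    - On a single document > softmax
  - Pairwise approach
    - On a pair of documents > contrastive loss (Siamese)
    - Used in the well-known RankNet, LambdaRank, and LambdaMART.
  - Listwise approach
    - On a list of documents >
      - Optimizes IR measures such as NDCG
      - A loss that tries to exploit properties specific to ranking
It is written in a complicated way above, but it comes down to something close to a hinge loss. Strictly speaking, RankNet uses a pairwise cross-entropy (logistic) loss, whereas the code below uses a margin-based hinge loss.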

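# Score (query, more relevant doc) and (query, less relevant doc) with the same conv_model
self.pos_scores = self.conv_model(self.query_emb, self.pos_doc_emb)
self.neg_scores = self.conv_model(self.query_emb, self.neg_doc_emb)
# Pairwise hinge loss with margin 1: zero once pos_scores exceeds neg_scores by at least 1
self.loss = tf.reduce_mean(tf.maximum(0.0, 1 - self.pos_scores + self.neg_scores))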
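To see how the hinge loss above relates to RankNet's logistic loss, the following sketch evaluates both on a few score pairs. It is a plain-Python illustration with made-up scores. `logistic_pair_loss` assumes the target probability is 1 (the positive document should rank first), and both function names are hypothetical.

```python
import math

def hinge_pair_loss(pos, neg, margin=1.0):
    """Margin-based pairwise loss: max(0, margin - pos + neg)."""
    return max(0.0, margin - pos + neg)

def logistic_pair_loss(pos, neg):
    """RankNet cross-entropy for target probability 1: log(1 + exp(-(pos - neg)))."""
    diff = pos - neg
    # Numerically stable form of log(1 + exp(-diff))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

# Toy score pairs (pos_score, neg_score), assumed for illustration
for pos, neg in [(2.0, 0.0), (0.5, 0.0), (0.0, 1.0)]:
    print(pos, neg, hinge_pair_loss(pos, neg), round(logistic_pair_loss(pos, neg), 4))
```

Both losses grow roughly linearly when the pair is ranked the wrong way round. The difference is that the hinge loss is exactly zero once the margin is met, while the logistic loss only approaches zero.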
