DANCER (Part 1)
achen353 opened this issue · 3 comments
achen353 commented
DANCER (PART 1): Implement the ROUGE score calculation between "each sentence in the ground truth abstractive summary" and "each section". Map each sentence in the former to a section.
DUE: 11/20 Saturday 11:59pm
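For reference, a minimal sketch of the mapping step described above. It assumes ROUGE-L F1 computed from a token-level longest common subsequence, with a crude regex tokenizer; the function names (`rouge_l_f1`, `map_sentences_to_sections`) and the choice of F1 over precision or recall are illustrative assumptions, not the repo's actual implementation.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings, computed from the LCS of their tokens."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def map_sentences_to_sections(summary_sentences, sections):
    """For each summary sentence, return the index of the best-scoring section."""
    mapping = []
    for sentence in summary_sentences:
        scores = [rouge_l_f1(section, sentence) for section in sections]
        # Ties resolve to the earliest section.
        mapping.append(max(range(len(sections)), key=scores.__getitem__))
    return mapping


# Made-up example data, only to show the call shape.
sections = [
    "This Act may be cited as the Clean Water Act.",
    "The Administrator shall award grants to States for water treatment.",
]
summary_sentences = [
    "The bill is cited as the Clean Water Act.",
    "It awards grants to States for water treatment.",
]
print(map_sentences_to_sections(summary_sentences, sections))
```

Since real sections are much longer than a single summary sentence, F1 will be dominated by precision; scoring against each sentence inside a section and taking the best one may be worth comparing.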
achen353 commented
This is not done yet. We still need to tune the parameters (e.g., the min/max number of sentences/tokens for the text to be summarized, and whether to use Combination or Greedy labeling).
achen353 commented
Our data preprocessing has a few parameters that affect the preprocessing result:
- [greedy vs combination]: two labeling functions built into TransformerSum, each based on a different paper
TransformerSum/src/convert_to_extractive.py
Line 509 in a13dce1
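- [whether no_preprocess is set to True]: when the parameter is False, text is not discarded during processing for being too long or too short; when it is True, text is filtered according to the given arguments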
TransformerSum/src/convert_to_extractive.py
Line 539 in a13dce1
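To make the greedy vs combination difference concrete, here is a generic sketch of the two labeling strategies. It is not TransformerSum's code: the scoring function is a set-based unigram F1 used as a cheap stand-in for ROUGE, and the names `greedy_oracle`, `combination_oracle`, and `max_sentences` are hypothetical.

```python
from itertools import combinations


def unigram_f1(selected_sentences, summary):
    """Set-based unigram F1, used here as a cheap stand-in for ROUGE."""
    cand = set(" ".join(selected_sentences).lower().split())
    ref = set(summary.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(sentences, summary, max_sentences=3):
    """Add one sentence at a time, keeping the pick that raises the score most."""
    selected, best = [], 0.0
    while len(selected) < max_sentences:
        candidates = [
            (unigram_f1([sentences[j] for j in sorted(selected + [i])], summary), i)
            for i in range(len(sentences))
            if i not in selected
        ]
        if not candidates:
            break
        score, idx = max(candidates)
        if score <= best:  # stop as soon as no sentence improves the score
            break
        selected.append(idx)
        best = score
    return sorted(selected)


def combination_oracle(sentences, summary, max_sentences=3):
    """Score every subset of up to max_sentences sentences and keep the best."""
    best_score, best_combo = 0.0, ()
    for size in range(1, max_sentences + 1):
        for combo in combinations(range(len(sentences)), size):
            score = unigram_f1([sentences[i] for i in combo], summary)
            if score > best_score:
                best_score, best_combo = score, combo
    return list(best_combo)


# Toy input where the two strategies disagree.
sentences = ["a b c x", "a b", "c d"]
summary = "a b c d"
print(greedy_oracle(sentences, summary))       # [0, 2]
print(combination_oracle(sentences, summary))  # [1, 2]
```

In this toy case greedy commits to the first sentence because it scores best on its own, and so misses the pair that covers the summary exactly. Combination finds that pair but has to score every subset, which grows quickly with document length.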
@andywang268 could you quickly try these for me (use branch issue-17-test-params and put the results in #23):
- Default: oracle_mode="greedy" and no_preprocess=False/None
  python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test
- oracle_mode="greedy" and no_preprocess=True
  python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --no_preprocess
- oracle_mode="combination" and no_preprocess=False/None
  python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination
- oracle_mode="combination" and no_preprocess=True
  python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination --no_preprocess
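If it helps, the four runs above can be generated from the 2x2 parameter grid instead of typed by hand. This is only a sketch: the helper names (`build_command`, `all_commands`) are made up, it uses only the flags and paths already listed in this comment, and it prints the commands rather than running them.

```python
from itertools import product

# Arguments shared by all four runs, copied from the commands above.
BASE_COMMAND = [
    "python", "convert_to_extractive.py", "../datasets/billsum_extractive",
    "--split_names", "test", "--add_target_to", "test",
]


def build_command(oracle_mode, no_preprocess):
    """Build the argument list for one (oracle_mode, no_preprocess) setting."""
    command = list(BASE_COMMAND)
    if oracle_mode != "greedy":  # "greedy" is the default, so the flag is omitted
        command += ["--oracle_mode", oracle_mode]
    if no_preprocess:
        command.append("--no_preprocess")
    return command


def all_commands():
    """Return the commands for every combination of the two parameters."""
    grid = product(["greedy", "combination"], [False, True])
    return [build_command(mode, flag) for mode, flag in grid]


if __name__ == "__main__":
    for command in all_commands():
        print(" ".join(command))
```

To actually execute them, each list could be passed to `subprocess.run(command, check=True)` from inside `TransformerSum/src`.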