DANCER (Part 1)

DANCER (PART 1): Implement the ROUGE score calculation between "each sentence in the ground truth abstractive summary" and "each section". Map each sentence in the former to a section.
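A minimal sketch of the sentence-to-section mapping described above. The issue does not say which ROUGE variant to use, so this assumes an LCS-based ROUGE-L precision computed on whitespace tokens; `lcs_length`, `rouge_l_precision`, and `map_sentences_to_sections` are hypothetical names, not functions from TransformerSum.

```python
from typing import Dict, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_precision(sentence: str, section: str) -> float:
    """LCS length divided by the summary sentence length (assumed variant)."""
    sent_tokens = sentence.lower().split()
    sec_tokens = section.lower().split()
    if not sent_tokens:
        return 0.0
    return lcs_length(sent_tokens, sec_tokens) / len(sent_tokens)


def map_sentences_to_sections(
    summary_sentences: List[str], sections: Dict[str, str]
) -> Dict[str, List[str]]:
    """Assign each summary sentence to its highest-scoring section."""
    assignment: Dict[str, List[str]] = {name: [] for name in sections}
    for sent in summary_sentences:
        # Ties go to the section that appears first in the dict.
        best = max(sections, key=lambda name: rouge_l_precision(sent, sections[name]))
        assignment[best].append(sent)
    return assignment
```

Dividing by the sentence length rather than the section length keeps long sections from being penalized just for their size; a real run would likely swap in a proper ROUGE package and tokenizer.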

DUE: 11/20 Saturday 11:59pm

This is not done yet. We still need to tune the parameters (e.g. the min/max number of sentences/tokens for the text to be summarized, and whether to use Combination or Greedy labeling).
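To make the min/max thresholds concrete, here is a simplified sketch of a length filter. The function name, parameter names, and default values are all hypothetical, and the exact behavior of the real preprocessing step may differ (for example, it may truncate instead of dropping).

```python
from typing import List, Optional


def filter_example(
    sentences: List[List[str]],
    min_sentence_tokens: int = 5,
    max_sentence_tokens: int = 200,
    min_sentences: int = 3,
    max_sentences: int = 100,
) -> Optional[List[List[str]]]:
    """Drop out-of-range sentences, then discard the example if too few/many remain."""
    kept = [
        sent
        for sent in sentences
        if min_sentence_tokens <= len(sent) <= max_sentence_tokens
    ]
    if not (min_sentences <= len(kept) <= max_sentences):
        return None  # the whole example is discarded
    return kept
```

Each sentence is a list of tokens, and a return value of `None` stands for a discarded example.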

Our data preprocessing has several parameters that affect the preprocessing result:

[greedy vs combination]: two labeling functions built into TransformerSum, each based on a different paper

TransformerSum/src/convert_to_extractive.py

Line 509 in a13dce1

if oracle_mode == "greedy":
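Whether no_preprocess is set to True: when it is True, the preprocessing step is skipped, so text is not discarded for being too long or too short; when it is False, text is filtered according to the given arguments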

TransformerSum/src/convert_to_extractive.py

Line 539 in a13dce1

else:
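A rough sketch of how the two labeling strategies differ. It assumes a set-based unigram F1 as a stand-in for ROUGE, and all function names are hypothetical; it is not the code in `convert_to_extractive.py`.

```python
from itertools import combinations
from typing import List, Sequence, Set


def overlap_f1(selected_tokens: Set[str], summary_tokens: Set[str]) -> float:
    """Set-based unigram F1, used here only as a cheap ROUGE stand-in."""
    common = len(selected_tokens & summary_tokens)
    if common == 0:
        return 0.0
    precision = common / len(selected_tokens)
    recall = common / len(summary_tokens)
    return 2 * precision * recall / (precision + recall)


def score(doc: List[List[str]], idxs: Sequence[int], summary: Set[str]) -> float:
    """Score the union of the tokens of the selected sentences."""
    tokens: Set[str] = set().union(*(doc[i] for i in idxs))
    return overlap_f1(tokens, summary)


def greedy_selection(doc: List[List[str]], summary: Set[str], max_sentences: int) -> List[int]:
    """Add one sentence at a time, keeping it only if the score improves."""
    selected: List[int] = []
    best = 0.0
    for _ in range(max_sentences):
        best_idx = None
        for i in range(len(doc)):
            if i in selected:
                continue
            s = score(doc, selected + [i], summary)
            if s > best:
                best, best_idx = s, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return sorted(selected)


def combination_selection(doc: List[List[str]], summary: Set[str], max_sentences: int) -> List[int]:
    """Exhaustively score every subset of up to max_sentences sentences."""
    best, best_combo = 0.0, ()
    for size in range(1, max_sentences + 1):
        for combo in combinations(range(len(doc)), size):
            s = score(doc, combo, summary)
            if s > best:
                best, best_combo = s, combo
    return list(best_combo)
```

With `doc = [["a", "b", "c", "x"], ["a", "b"], ["c", "d"]]`, `summary = {"a", "b", "c", "d"}`, and `max_sentences=2`, the greedy version returns `[0, 2]` because it commits to the best single sentence first, while the combination version returns `[1, 2]`, which covers the summary exactly. The exhaustive search costs more as documents grow.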

@andywang268 could you quickly try these for me (use branch issue-17-test-params and put the results in #23):

Default: oracle_mode="greedy" and no_preprocess=False/None

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test

oracle_mode="greedy" and no_preprocess=True

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --no_preprocess

oracle_mode="combination" and no_preprocess=False/None

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination

oracle_mode="combination" and no_preprocess=True

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination --no_preprocess
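The four runs above could also be generated by a small helper script. This is only a convenience sketch: `build_commands` is a hypothetical name, the script assumes it is run from the same directory as `convert_to_extractive.py`, and it uses only the flags that appear in the commands above.

```python
from itertools import product
from typing import List

# Arguments shared by all four runs, copied from the commands in this issue.
BASE = [
    "python", "convert_to_extractive.py", "../datasets/billsum_extractive",
    "--split_names", "test", "--add_target_to", "test",
]


def build_commands() -> List[List[str]]:
    """Build one command per (oracle_mode, no_preprocess) combination."""
    commands = []
    for oracle_mode, no_preprocess in product(["greedy", "combination"], [False, True]):
        cmd = list(BASE)
        if oracle_mode != "greedy":  # greedy is the default, so the flag is omitted
            cmd += ["--oracle_mode", oracle_mode]
        if no_preprocess:
            cmd.append("--no_preprocess")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
        # To actually execute a run: import subprocess; subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check them against the list above before running anything.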

Solved with #22 and #25