Merging documents
Do we have a block that merges two documents? I have two files, which should have the same number of bundles and only a single zone in each bundle (though the problem could be generalized to multiple zones). I want to read both documents into one, so that each bundle has two zones: one from original document 1 and the other from original document 2.
I cannot solve this by putting two Read blocks in one scenario (with different zone selectors) because only one reader per scenario is allowed.
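To make the desired result concrete, here is a minimal sketch of the bundle-wise merge in Python. It is not Treex code: a document is modelled as a plain list of bundles, each holding a single zone, and `merge_documents`, `selector1` and `selector2` are hypothetical names chosen for illustration only.

```python
def merge_documents(doc1, doc2, selector1="1", selector2="2"):
    """Pair bundles by position; each merged bundle holds one zone per source.

    doc1 and doc2 are lists of zones (one zone per bundle). This is a toy
    model, not the Treex document API.
    """
    if len(doc1) != len(doc2):
        raise ValueError(f"bundle counts differ: {len(doc1)} vs {len(doc2)}")
    merged = []
    for zone1, zone2 in zip(doc1, doc2):
        # Each merged bundle maps a zone selector to the zone it came with.
        merged.append({selector1: zone1, selector2: zone2})
    return merged
```

The explicit length check matters because the two files only "should" have the same number of bundles; failing loudly is safer than silently truncating to the shorter document.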
As far as I know, we don't... I once also needed an AlignedReader for Treex files, but I had no time to implement it (and no desire to delve into the depths of Treex::PML).
If the two documents contain only a-trees, it is possible to export them to CoNLL format and then use Read::AlignedCoNLL, e.g. with parameters en_selector1='!en1*.conll' en_selector2='!en2*.conll'. The problem with merging two treex files is that they may contain duplicate node IDs, so the aligned treex reader (or rather block AddTreexFile) would need to handle this (rename all IDs, including all links).
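The renaming step could look roughly like the sketch below. It is written in Python over plain dictionaries rather than real Treex nodes; the field names `id`, `parent` and `alignment` and the function `prefix_ids` are assumptions made for illustration, and a real implementation would have to cover every ID-valued attribute in the Treex schema.

```python
def prefix_ids(nodes, prefix, link_fields=("parent", "alignment")):
    """Return copies of nodes with every ID and every ID-valued link prefixed.

    nodes is a list of dicts with an "id" key; link_fields names the
    (hypothetical) attributes that store the ID of another node.
    """
    renamed = []
    for node in nodes:
        new_node = dict(node)  # shallow copy, the input is left untouched
        new_node["id"] = prefix + node["id"]
        for field in link_fields:
            target = node.get(field)
            if target is not None:
                # Links must be rewritten with the same prefix, otherwise
                # they would point at nodes of the other document.
                new_node[field] = prefix + target
        renamed.append(new_node)
    return renamed
```

Applying a different prefix to the nodes of each source document before merging keeps the IDs unique, provided neither prefix already occurs at the start of an existing ID.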
Thanks, Read::AlignedCoNLL works for me (after adjusting it to handle the 2006 flavor of the format). Closing the issue.
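For readers unfamiliar with that flavor: assuming "2006" refers to the CoNLL-X shared task format, each token line has ten tab-separated columns and sentences are separated by blank lines. The Python sketch below only illustrates that layout; it is unrelated to the actual change made to Read::AlignedCoNLL, and `parse_conllx` is a hypothetical helper.

```python
# Column names of the CoNLL-X (2006) format, in order.
CONLLX_COLUMNS = (
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
)

def parse_conllx(text):
    """Split CoNLL-X text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        current.append(dict(zip(CONLLX_COLUMNS, fields)))
    if current:
        sentences.append(current)
    return sentences
```

All values are kept as strings, so `head` would still need converting to an integer before building a tree.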
Just FYI, I just stumbled upon the block Misc::AddZonesFromFile, which is basically what we need... but I don't know if it handles duplicate IDs properly.