datasets annotation
Hello,
Is there any information on how you annotated these datasets?
From which patent databases did you extract them?
Is the full text of each document available, or only the title?
Thank you
I have the same question.
How was this dataset curated? In many instances the titles are cut short, do not make sense on their own, or do not read like titles at all. For example: 'Can be used for such as quantum computing which is used for solving the problem that the system and method for'.
Furthermore, many of the ground-truth 'positive' and 'negative' labels assigned by the expert who wrote the paper do not seem to make sense. For example, 'A modular array of vertically integrated superconducting qubit - units for scalable quanta data processing' is labeled 'negative' even though it seems highly relevant to quantum qubit hardware, while 'SOLID STATE MATERIAL' and 'SINGLE CRYSTAL CVD DIAMOND AND DEVICES' are labeled 'positive' even though they seemingly have nothing to do with quantum qubit generation.