declare-lab/RelationPrompt

Creating own data splits

jbrry opened this issue · 1 comments

jbrry commented

Hi, thanks for the great resource!

I would like to try RelationPrompt on Wiki-ZSL and FewRel with different sizes of m.

To generate the new train/dev/test files, we can run the write_data_splits function, which calls the load_fewrel and load_wiki methods.

I have two questions. First, could you share the contents of data/wiki_properties.csv, or explain how to generate this file? Second, for the path_in parameter, is it safe to assume you used the FewRel and Wiki-ZSL files linked in the ZS-BERT README?

jbrry commented

I found the resource property_list.html in the ZS-BERT repo. You can convert it to data/wiki_properties.csv using the code below:

import pandas as pd

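# read_html needs an HTML parser installed (lxml, or bs4 with html5lib)
dataframes = pd.read_html('property_list.html')  # download from ZS-BERT
df = dataframes[0]  # assumes the property table is the first table in the file

# inspect the original headers before renaming
print(list(df.columns))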

# rename columns to column headers used in RelationPrompt
# p: str
# pType: str
# pLabel: str
# pDescription: str
# pAltLabel: str

dfr = df.rename(columns={
    "ID": "p",
    "label": "pLabel",
    "description": "pDescription",
    "aliases": "pAltLabel",
    "Data type": "pType"
    })

dfr.to_csv("data/wiki_properties.csv", index=False)
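
To sanity-check the generated file, something like the sketch below could be used. It only looks at the CSV header and reports which of the column names listed in the comment above are missing. The helper name missing_columns and the sample row are made up for illustration, and whether RelationPrompt actually needs all five columns is an assumption on my part.

```python
import csv
import io

# Column names taken from the comment in the snippet above; whether
# RelationPrompt requires every one of them is an assumption.
EXPECTED_COLUMNS = ["p", "pType", "pLabel", "pDescription", "pAltLabel"]


def missing_columns(csv_file):
    """Return the expected column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Tiny in-memory stand-in for data/wiki_properties.csv (made-up row).
sample = io.StringIO(
    "p,pLabel,pDescription,pAltLabel,pType,Count\n"
    "P0,example label,example description,example alias,example type,1\n"
)
print(missing_columns(sample))  # an empty list means every expected column is present
```

For the real file, open it with open("data/wiki_properties.csv", newline="", encoding="utf-8") and pass the file object to missing_columns. If anything is reported as missing, the headers in property_list.html probably differ from the keys used in the rename mapping.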