This repository contains the code for the paper "JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction", which will appear in the Proceedings of the 2023 IEEE International Conference on Big Data.
- Python >= 3.6
- torch >= 0.4.1
- numpy >= 1.17.4
- transformers
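
The requirements above can be checked before training with a short script. This is a minimal sketch rather than part of the released code: the helper names `version_tuple`, `meets_minimum` and `check_environment` are hypothetical, and the minimum versions are simply copied from the list above.

```python
import importlib
import re
import sys

# Minimum versions copied from the requirements list (transformers has no pinned minimum).
MINIMUMS = {"torch": "0.4.1", "numpy": "1.17.4"}


def version_tuple(version):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1), dropping any suffix."""
    release = re.match(r"\d+(\.\d+)*", str(version))
    if release is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return tuple(int(part) for part in release.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    for name in ("torch", "numpy", "transformers"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not installed" % name)
            continue
        minimum = MINIMUMS.get(name)
        if minimum and not meets_minimum(module.__version__, minimum):
            problems.append("%s >= %s is required" % (name, minimum))
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Only the leading numeric part of each version string is compared, so local build suffixes such as `+cu117` are ignored.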
- Please get the entire MEPAVE dataset here.
- Use data.data_process.py to preprocess the MEPAVE dataset. This produces train.json, valid.json and test.json for model training and testing, together with a "tagmaster.json" file that stores every attribute in the dataset and its corresponding values.
- Use data.generate_mepave_attribute_value_embeddings.py to generate the attribute and value embeddings with a pre-trained BERT model (we use "bert-base-chinese" from Hugging Face).
- Move the generated "tagmaster.json", "mepave_attribute_embeddings.json" and "mepave_value_embeddings.json" to the root of this project.
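
Before training, it may help to confirm that the three generated files are actually in the project root. The following is a small sketch under that assumption only: the helper `missing_files` is hypothetical and not part of the repository, and it checks for the files' presence without inspecting their contents.

```python
import os

# File names taken from the preparation steps above.
REQUIRED_FILES = (
    "tagmaster.json",
    "mepave_attribute_embeddings.json",
    "mepave_value_embeddings.json",
)


def missing_files(root="."):
    """Return the required file names that are not present directly under `root`."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(root, name))
    ]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing from project root: " + ", ".join(missing))
    else:
        print("All generated files are in place.")
```

Run it from the project root; any file name it prints still needs to be generated or moved.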
Run the train.py file to train the model as follows:
python train.py