- Concatenate the data/alics_data_* files into one file.
- preprocess_al.py: preprocess the Alibaba dataset.
- calw2c_tfidf, calw2s_tfidf: calculate the TF-IDF for sentences and conversations.
- data_loader.py: load the preprocessed data to construct the heterogeneous graph.
- train.py: train and evaluate the model.
- predict.py: predict labels with the trained model.
- utils/config.py: configuration.
- utils/log.py: record the log.
- models/HiGraph.py: the heterogeneous graph model.
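
The concatenation step in the first item can be done in several ways; the following is a minimal sketch, not the repository's own script. The function name `concat_files` and the output path are hypothetical, and the sketch assumes every part file ends with a newline so that samples from different parts do not run together.

```python
import glob
import os
import shutil


def concat_files(pattern: str, out_path: str) -> int:
    """Append every file matching `pattern` to `out_path` in sorted name order.

    Returns the number of part files that were concatenated.
    """
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it also matches the pattern.
    paths = [p for p in sorted(glob.glob(pattern))
             if os.path.abspath(p) != out_abs]
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                # Copy bytes unchanged so the '\001' separators are preserved.
                shutil.copyfileobj(src, out)
    return len(paths)
```

A call such as `concat_files("data/alics_data_*", "data/alics_data_all")` would then produce the single input file; the output name here is only a placeholder.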
- Each line contains a dialogue sample.
- The columns are separated with '\001'.
- The first column is a unique key for the dialogue.
- The second column is the sequence of JSON ids in the dialogue.
- The third column is a JSON list whose objects have the keys "id" (the id of the JSON object, which corresponds to the sequence in the second column), "text" (the content of the sentence), and "member type" (1 for customer, 2 for customer service, 3 for automatic AI customer service).
- The fourth and the last columns give the coarse-level and fine-level categories, respectively.
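
The format described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than the code in preprocess_al.py: it assumes exactly five columns per line (key, id sequence, JSON list, coarse category, fine category) and that the third column is valid JSON. The names `Dialogue` and `parse_line` are hypothetical, and the second column is kept as a raw string because its internal format is not specified here.

```python
import json
from typing import NamedTuple

SEP = "\x01"  # the '\001' column separator


class Dialogue(NamedTuple):
    key: str            # unique key of the dialogue
    id_sequence: str    # raw second column (sequence of JSON ids)
    sentences: list     # parsed JSON list of sentence objects
    coarse_label: str   # coarse-level category
    fine_label: str     # fine-level category


def parse_line(line: str) -> Dialogue:
    """Split one dialogue sample into its columns and decode the JSON list."""
    cols = line.rstrip("\n").split(SEP)
    if len(cols) != 5:
        # Assumption: five columns; adjust if the data has a different layout.
        raise ValueError(f"expected 5 columns, got {len(cols)}")
    key, id_sequence, sentences_json, coarse, fine = cols
    return Dialogue(key, id_sequence, json.loads(sentences_json), coarse, fine)


def customer_texts(dialogue: Dialogue) -> list:
    """Return the text of every sentence whose "member type" is 1 (customer)."""
    return [s["text"] for s in dialogue.sentences if s["member type"] == 1]
```

Iterating over the concatenated file and calling `parse_line` on each line yields one `Dialogue` per sample; `customer_texts` shows how the "member type" field can be used to filter speakers.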