This repository uses product rating reviews to build a Hugging Face dataset for a text classification task.
- Install the required packages:
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
- Set up the Hugging Face CLI and log in with your WRITE token:
    pip install -U "huggingface_hub[cli]"
    huggingface-cli login
- Set the config values in run.sh, then run the script:
    bash run.sh
- The label is derived from the rating value. Ratings range from 1 to 5, while labels range from 0 to 4, so a rating of 1 becomes label 0 and a rating of 5 becomes label 4.
- For the test dataset, the label is -1.
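The labeling rule above can be illustrated with a small shell function. This is a hedged sketch, not code from this repository: the function name `rating_to_label` and its split/rating arguments are hypothetical, and it assumes each label is obtained by subtracting 1 from the rating.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the labeling rule; not taken from run.sh.
# Usage: rating_to_label SPLIT RATING
rating_to_label() {
  local split="$1" rating="$2"
  # Test examples always get the placeholder label -1.
  if [[ "$split" == "test" ]]; then
    echo -1
    return 0
  fi
  # Accept only integer ratings 1..5, then shift down by one
  # (assumption: label = rating - 1).
  if [[ "$rating" =~ ^[1-5]$ ]]; then
    echo $(( rating - 1 ))
  else
    echo "invalid rating: $rating" >&2
    return 1
  fi
}

rating_to_label train 1   # prints 0
rating_to_label train 5   # prints 4
rating_to_label test 3    # prints -1
```

Shifting to a 0-based range matches what most classification pipelines expect, where class indices run from 0 to the number of classes minus one.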
- To show the distribution of the dataset and the tokenized dataset, use the analysis script:
  - Modify the analyze.sh file.
  - Run the following command:
      bash analyze.sh
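As a rough illustration of what a label distribution is, the counting step can be sketched with standard Unix tools. This is not how analyze.sh works (its contents are not shown here); it is a minimal sketch that assumes the labels have already been extracted one per line, and the function name `count_labels` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one label per line on stdin and
# prints "label count" pairs, sorted numerically by label.
count_labels() {
  sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Toy input for illustration only (not real dataset statistics).
printf '%s\n' 0 4 4 2 -1 | count_labels
# prints:
# -1 1
# 0 1
# 2 1
# 4 2
```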