lixin4ever/BERT-E2E-ABSA

How to use the trained model


Hi!

I was wondering: after training the model, can I use it on new data to do aspect extraction and also find the aspect sentiment polarity?

If yes, how can I do it?

Thanks!

This feature is mentioned here; please check it out.

Thanks for the reply! What format does the dataset need to be in for the model to work?

The same format as the given data files. For example, you can follow the format of rest16.train and randomly assign a tag to each input word (the tags just satisfy the input format and are not used during inference).
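
For reference, an input line in that format might look like this (a made-up sentence following the `sentence####word=tag` layout of the provided data files; the tags are arbitrary valid placeholders):

```
The staff was very friendly .####The=O staff=O was=O very=O friendly=O .=O
```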

When I create something like that, it says ValueError: not enough values to unpack (expected 2, got 1):
test.txt

Can you show me the log?

Hello, sorry for the late reply.

Here are my text document and my log file:

test.txt

output.txt

My work.sh looks like this. I created a folder named test inside ./data and put the test.txt file in that folder:

```bash
#!/usr/bin/env bash
TASK_NAME="test"
ABSA_HOME="./bert-linear-laptop14-finetune"
CUDA_VISIBLE_DEVICES=0 python work.py --absa_home ${ABSA_HOME} \
    --ckpt ${ABSA_HOME}/checkpoint-1400 \
    --model_type bert \
    --data_dir ./data/${TASK_NAME} \
    --task_name ${TASK_NAME} \
    --model_name_or_path bert-base-uncased \
    --cache_dir ./cache \
    --max_seq_length 128 \
    --tagging_schema BIEOS
```

> The same format as the given data files. For example, you can follow the format of rest16.train and randomly assign a tag to each input word (the tags just satisfy the input format and are not used during inference).

Sorry for not saying this clearly. You should randomly assign a valid tag from the tag set {T-POS, T-NEG, T-NEU, O} to each word, rather than 0 (as in your case).
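
As a minimal sketch, dummy-tagged input lines could be generated from raw sentences like this (the file names raw_sentences.txt and test.txt are placeholders; the `####` separator follows the repo's data files):

```python
# Convert raw sentences into the "sentence####word=tag ..." format
# expected by work.py. All tags are dummy "O" tags: they only satisfy
# the input format and are ignored during inference.
with open("raw_sentences.txt") as fin, open("test.txt", "w") as fout:
    for line in fin:
        sentence = line.strip()
        if not sentence:
            continue
        tagged = " ".join(f"{word}=O" for word in sentence.split())
        fout.write(f"{sentence}####{tagged}\n")
```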

Thank you for the reply! I changed the tags and now I am getting an AssertionError at line 152 of work.py:

assert len(words) == len(pred_labels)

Should I remove the stop words from the tagging part?

Well, you may need to print them out and see what happened.

I printed them out. words actually contains the words I added after ####, and pred_labels is just a list of 17 zeros that I couldn't link to anything. The sentence has 10 words in total.

10 is not equal to 17, so that's why there is an AssertionError, but I don't understand what 17 means in this case.

The length of words should be equal to the length of pred_labels. In your case, 10 is the length of words, and 17 should be the length of the sub-word sequence. Normally, the sub-word sequence is longer than the original word sequence because sub-words are more fine-grained semantic units.
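
To see the word/sub-word mismatch concretely, here is a small sketch using the transformers tokenizer (the example sentence is made up):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "The touchpad is completely unresponsive"
words = sentence.split()
# WordPiece splits out-of-vocabulary words into sub-word pieces
# (continuation pieces are prefixed with "##"), so the sub-word
# sequence is usually longer than the word sequence.
subwords = tokenizer.tokenize(sentence)

print(len(words), words)
print(len(subwords), subwords)
```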

Is there anything wrong with your input?

Another way to check whether the code is problematic is to apply it to the data files provided in this repo, e.g., laptop14/test.txt.

I checked with other data files and the problem seems to be this:

The code seems to expect 800 reviews: during the evaluation process it shows 0/800 at the end, and when I type in only, say, 10 reviews, the AssertionError occurs at 10/800.

Normally, when I use laptop14/test.txt, everything runs smoothly, but when I delete some reviews from test.txt, I encounter an error again. So that might be the problem: maybe the code expects a certain number of reviews to go through.

output2.txt

I don't think the error comes from the input data.

Have you checked the values of "idx" and "total words"?

Yes. For example, when I add 10 reviews to test.txt, "idx" only goes up to 9 and then there is the IndexError; total words is always the length of the review.

When I type the text file in myself, even if I write the same review into "test.txt", I get an AssertionError. This is really interesting; what might cause this?

OK, I will check this issue, please stay tuned.

Hello, I trained a model on laptop14 and ran inference on test20/test.txt (the running scripts fast_run.py and work.sh are also updated accordingly). Everything works normally.

Here is the output log: output.log.

I have no idea about your issue. I suggest you re-clone this repo, upgrade transformers to 4.1.1, and see what happens.
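
For example, with pip:

```bash
pip install transformers==4.1.1
```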

When I add a sentence myself to test20/test.txt, it doesn't recognize the new sentence and just applies the model to the original 20 sentences. When I delete a sentence, I get an error.
This might be an absurd question, but is there a special way to add or delete a sentence from test20/test.txt?

This is NOT an absurd question. PLEASE read the code carefully.

The program caches the feature file the first time you perform inference. If you want to change the content of test20/test.txt, you should also delete the cached feature file (at ./data/test20/); otherwise, the previously cached features will be loaded as the model input in subsequent runs.
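
For example (assuming the cache files follow the usual cached_* naming; check ./data/test20/ for the actual file name):

```bash
# Delete the cached features so work.py re-tokenizes the edited test file.
rm ./data/test20/cached_*
```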

This was the problem the whole time... Thanks for all the help!

Is there a way to disable the tagging part of the input text file? That is, can the code be modified so that the input data doesn't require the part after '####'? (For the inference part.)