fhamborg/NewsMTSC

TextTooLongException not thrown when text is too long in specific case

bendavidsteel opened this issue · 3 comments

When attempting to classify sentiment with the following input, I get a misleading exception. Generally, TextTooLongException seems to be thrown appropriately, but this seems to be the one exception to that :)

To reproduce:

```python
from NewsSentiment import TargetSentimentClassifier

left = "market eye-sgx nifty flat; bharti infratel debuts. * nifty futures on the singapore exchange unchanged. the msci-asia pacific index, excluding japan gains 0.43 percent. * bharti infratel ltd debuts after raising about $760 million in india's biggest ipo in two years. traders expect shares to come under pressure due to concerns about the outlook for mobile tower operators. * foreign investors sold 1.33 billion rupees of stocks, while domestic institutions bought 1.33 billion rupees of stocks on thursday, when india's bse index fell 0.48 percent. * traders expect market to trade in a narrow range, with select "
centre = "oil"
right = " marketing companies shares to be traced on talks of hike in diesel prices."
tsc = TargetSentimentClassifier()
sentiment = tsc.infer_from_text(left, centre, right)
```

I get the exception:

```
TargetNotFoundException
no target found: [market, eye, -, sgx, nifty, flat, ;, bharti, infratel, debuts, ., *, nifty, futures, on, the, singapore, exchange, unchanged, ., the, msci, -, asia, pacific, index, ,, excluding, japan, gains, 0.43, percent, ., *, bharti, infratel, ltd, debuts, after, raising, about, $, 760, million, in, india, 's, biggest, ipo, in, two, years, ., traders, expect, shares, to, come, under, pressure, due, to, concerns, about, the, outlook, for, mobile, tower, operators, ., *, foreign, investors, sold, 1.33, billion, rupees, of, stocks, ,, while, domestic, institutions, bought, 1.33, billion, rupees, of, stocks, on, thursday, ,, when, india, 's, bse, index, fell, 0.48, percent, ., *, traders, expect, market, to, trade, in, a, narrow, range, ,, with], 615, oil
```

With package version 1.1.21

Thanks, love the library!

I think you should put everything in one string. See the examples provided here: https://pypi.org/project/NewsSentiment/

I have the same problem. What do you mean by "put all in one string"? The examples provided have 3 strings, and the function requires 3 inputs.

The root cause of this issue is that the text input is too long for the LM, so truncation is applied. In the example you provided, the first text component (text_left) is quite long, so after truncation the target (and also text_right) is no longer part of the sequence. While one could technically still run sentiment classification on this, it wouldn't make sense, because the target is missing after truncation. Hence the exception.
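The truncation effect described above can be illustrated with a toy sketch. It assumes whitespace tokenization as a stand-in for the model's subword tokenizer and a hypothetical `MAX_TOKENS` budget; neither reflects NewsSentiment's actual implementation. `target_survives` and `trim_left_context` are hypothetical helpers: the first checks whether the target still lies inside a sequence truncated at the end, and the second drops the earliest left-context tokens so that the target and right context fit. A caller could apply that kind of trimming before calling `infer_from_text`.

```python
from typing import List, Tuple

# Hypothetical budget for illustration only; the real limit depends on the
# language model's tokenizer and maximum sequence length.
MAX_TOKENS = 6


def tokenize(text: str) -> List[str]:
    """Whitespace split: a stand-in for the model's subword tokenizer."""
    return text.split()


def target_survives(left: str, target: str, max_tokens: int) -> bool:
    """True if the target still lies within the first max_tokens tokens,
    i.e. it would survive truncation that cuts the sequence at the end."""
    end = len(tokenize(left)) + len(tokenize(target))
    return end <= max_tokens


def trim_left_context(
    left: str, target: str, right: str, max_tokens: int
) -> Tuple[str, str, str]:
    """Drop the earliest left-context tokens so left + target + right fit."""
    left_toks = tokenize(left)
    budget = max_tokens - len(tokenize(target)) - len(tokenize(right))
    if budget < 0:
        raise ValueError("target and right context alone exceed max_tokens")
    # Keep only the `budget` tokens closest to the target (max() avoids a
    # negative slice start when the left context is already short enough).
    kept = left_toks[max(0, len(left_toks) - budget):]
    new_left = " ".join(kept) + " " if kept else ""
    return new_left, target, right


if __name__ == "__main__":
    left, target, right = "a b c d e f g h ", "oil", " marketing companies"
    print(target_survives(left, target, MAX_TOKENS))  # False: 8 + 1 > 6
    new_left, target, right = trim_left_context(left, target, right, MAX_TOKENS)
    print(repr(new_left))  # 'f g h '
    print(target_survives(new_left, target, MAX_TOKENS))  # True
```

Trimming from the start of the left context keeps the words nearest the target. This assumes those words matter most for the target's sentiment, which is a judgment call rather than something the library guarantees.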