adamwynne/twitter-api

Response chunks don't map to streaming responses

Closed this issue · 11 comments

_EDITED TO ADD_*:
I don't think this is only confined to high-volume streams. Seeing the same issue with an extremely low-volume user stream :(

When following a high-volume stream, I'm seeing a lot of responses being chunked incorrectly. Here's a reproduction case:

(defn bodypart-print-2
  [response baos]
  (println (str "#### (" (.size baos) ") " (.toString baos))))

(def ^:dynamic
     *custom-streaming-callback*
     (SyncStreamingCallback. bodypart-print-2
                    (comp println response-return-everything)
                  exception-print))

  (statuses-filter :params {:track "bieber", :delimited "length"}
         :oauth-creds my-creds
         :callbacks *custom-streaming-callback*)

I turned on the delimited param to compare the length of the returned chunk to the expected length of the response. That seems like a route to fixing it but I'm reallll new to Clojure so I'm not totally sure where to start.

this is what i ended up doing, library handling this would be nicer:
https://gist.github.com/kelseyq/51dce2e997b393f469b5

can you clarify what you mean by chunked incorrectly? are you saying that
the broadcast length on the stream is not the size of the baos?

On Sun, Apr 27, 2014 at 11:30 PM, Kelsey Gilmore-Innis <
notifications@github.com> wrote:

When following a high-volume, stream, I'm seeing a lot of responses being
chunked incorrectly. Here's a reproduction case:

(defn bodypart-print-2
[response baos](println %28str "#### %28" %28.size baos) ") " (.toString baos))))

(def ^:dynamic
custom-streaming-callback
(SyncStreamingCallback. bodypart-print-2
(comp println response-return-everything)
exception-print))

(statuses-filter :params {:track "bieber", :delimited "length"}
:oauth-creds my-creds
:callbacks custom-streaming-callback)

I turned on the delimited paramhttps://dev.twitter.com/docs/streaming-apis/parameters#delimitedto compare the length of the returned chunk to the expected length of the
response. That seems like a route to fixing it but I'm reallll new to
Clojure so I'm not totally sure where to start.


Reply to this email directly or view it on GitHubhttps://github.com//issues/47
.

Hi Kelsey

Thanks for the info, but I'm still not quite connecting the dots how this
is unexpected behaviour and what the library should do about it?

Regards
Adam

On Mon, Apr 28, 2014 at 5:23 PM, Kelsey Gilmore-Innis <
notifications@github.com> wrote:

Tweet messages often span more than one baos chunk. This is noted by
Twitter in the streaming API documentationhttps://dev.twitter.com/docs/streaming-apis/processing#Transfer-Encoding:_chunked:
"Be aware that Tweets and other streamed messages will not necessarily fall
on HTTP chunk boundaries."


Reply to this email directly or view it on GitHubhttps://github.com//issues/47#issuecomment-41579575
.

It means you can't reliably do anything with the chunks returned; they aren't even valid JSON. I'd expect a library that only handles Twitter to be able to take a callback that processes individual tweets. I can't imagine a use case for the streaming API that wouldn't require some kind of workaround like the one I've pasted above; am I missing something?

At the very least, I'd hope it'd be mentioned in the documentation--I spent quite a while debugging this.

As for what to do about it, I'm not sure there's a FANTASTIC solution given that you have to keep some state around between chunks...something similar to what I did manually could be moved back into the library, though. As mentioned I'm really new to Clojure, but if you could point me to a starting point or a general idea I'd be willing to take a crack at it.

I haven't used the streaming endpoints in a while, but I think instead of using the delimited=length, it might be simpler to omit it, and then just keep concatenating until you break on \r\n's (which will give you valid JSON).

Also, I see in your code you're using .toString which will stop on a \0 termination char. This is dangerous as there are often null chars in tweets (unicode being one example)

I guess the idiomatic way to handle it would be a lazy sequence that populated tweets in an async callback parsing for \r\n's. I agree you'd have to keep some state, but that can be minimised (perhaps a 'buffer' or something).

This is higher level than everything else in the library, but if you want to have a crack at it, all efforts gratefully received.

Great, thank you for the advice! I also found this library, built on top of twitter-api:
https://github.com/mccraigmccraig/twitter-streaming-client
Do you think that'd be a better spot for it? (I tried it and it has the same issue.)

yeah, that seems perfect for it - it claims to solve these issues ;)

i will take my whining over there then! ;)

I've created a pull request with a fix in mccraigmccraig's library: mccraigmccraig/twitter-streaming-client#2

Looks good

On Saturday, June 21, 2014, Kelsey Gilmore-Innis notifications@github.com
wrote:

I've created a pull request with a fix in mccraigmccraig's library:
mccraigmccraig/twitter-streaming-client#2
mccraigmccraig/twitter-streaming-client#2


Reply to this email directly or view it on GitHub
#47 (comment)
.