fennb/phirehose

fputs bug in ghetto collection?

Closed this issue · 7 comments

I ran into an issue yesterday where I could no longer process the txt files from ghetto-queue-collect with ghetto-queue-consume. In my case, for whatever reason, each status was being appended to the file without a trailing newline. As a result, line 91 of the consume script (while ($rawStatus = fgets($fp, 8192))) was pulling chunks of data that contained multiple statuses, which could not be decoded as JSON and caused the script to break.

The short-term fix I made was to change the collector to append PHP_EOL to each status when saving (line 70):
fputs($this->getStream(), $status . PHP_EOL);
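
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the real collector or consumer: the two statuses are made up, and writeAndReadBack() is a hypothetical helper that writes to a temp file and reads it back with the same fgets() call as the consume script.

```php
<?php
// Illustration only: made-up statuses, not real Twitter payloads.
$statuses = ['{"id":1,"text":"first"}', '{"id":2,"text":"second"}'];

// Hypothetical helper: write each status followed by $separator, then read
// the file back line by line, as the consume script's loop does.
function writeAndReadBack(array $statuses, string $separator): array
{
    $fp = tmpfile();
    foreach ($statuses as $status) {
        fputs($fp, $status . $separator);
    }
    rewind($fp);

    $decoded = [];
    while (($rawStatus = fgets($fp, 8192)) !== false) {
        // json_decode() returns null when the chunk is not valid JSON.
        $decoded[] = json_decode($rawStatus, true);
    }
    fclose($fp);
    return $decoded;
}

// Without a separator both statuses come back fused in one fgets() chunk.
$broken = writeAndReadBack($statuses, '');
// With PHP_EOL each fgets() call returns exactly one status.
$fixed = writeAndReadBack($statuses, PHP_EOL);

var_dump($broken, $fixed);
```

With no separator the single chunk fails to decode; with PHP_EOL both statuses decode individually.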

What I'm wondering is, first, whether anyone else has had this problem and, second, whether it would be better to move this into Phirehose.php, where enqueueStatus is called?

I encountered the very same problem. It started when I upgraded the Phirehose library yesterday, which I did because I was getting a 403 Forbidden exception from Twitter.

Thank you for the fix, it works.

I haven't worked with the ghetto collection to .txt, but I will test this and see if we can put it at enqueue :thumbs:

@mithunb, I had a similar change. I thought it might be my code and additions so I tried it with a fresh pull from the repo and the examples were having the same issues.

Many thanks to @DarrenCook for pointing me here; sorry I didn't see this thread before posting #55.

I wound up arriving at the same solution. Do we think this is the "right" way to do it? Or edit lib/Phirehose.php to stop stripping out the line breaks?

They are not being stripped out deliberately, so it must be a regression I
introduced (in such a way that it didn't affect me or the other early
testers)! A couple of blocks of core code were completely replaced, so
it would be different from a simple typo that forgot a "\n" somewhere.
(E.g. I stopped using "delimited=length", so the format of the data being
received from Twitter is actually different now.)

The other main bug being reported (#51) is that of getting two LFs in a
row when the feed goes quiet. This has already been patched.

If you can add an LF in, without breaking that patch, it might be the
best solution?
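
One way to keep an added LF from clashing with the double-LF case is to make the reading side tolerant of blank lines. The following is only a sketch under that assumption: readStatuses() is a hypothetical helper, not the actual ghetto-queue-consume code, and the sample data is made up.

```php
<?php
// Hypothetical helper, not part of Phirehose: read newline-delimited
// statuses from an open stream, ignoring blank lines.
function readStatuses($fp): array
{
    $statuses = [];
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // blank line (e.g. two LFs in a row): nothing to decode
        }
        $status = json_decode($line, true);
        if (is_array($status)) {
            $statuses[] = $status; // keep only lines that decoded cleanly
        }
    }
    return $statuses;
}

// Made-up input with two LFs in a row between the statuses.
$fp = tmpfile();
fputs($fp, "{\"id\":1}\n\n{\"id\":2}\n");
rewind($fp);
$result = readStatuses($fp);
fclose($fp);

var_dump($result);
```

Both statuses are returned and the empty line between them is skipped.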

Darren

From memory, the old/original Phirehose code used to presume that statuses were newline delimited, hence ghetto-collect was able to write them raw to the file (to subsequently be consumed using fgets()).

It seems reasonable that raw statuses don't have newlines at the end (it's certainly not required) but we kind of need to decide if that's the case or not. To me, it seems reasonable that $status has no newline, and if you want it to be newline separated in the file, it's added at write time (as suggested above).
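
A sketch of the "add it at write time" approach, written so it works whichever way the newline question is decided. writeStatusLine() is a hypothetical name, not a Phirehose method; it strips any trailing line break from $status before appending exactly one PHP_EOL.

```php
<?php
// Hypothetical helper: append one status per line, whether or not $status
// already ends in a line break.
function writeStatusLine($fp, string $status): void
{
    fputs($fp, rtrim($status, "\r\n") . PHP_EOL);
}

$fp = tmpfile();
writeStatusLine($fp, '{"id":1}');        // no trailing newline
writeStatusLine($fp, "{\"id\":2}\r\n");  // already newline-terminated
rewind($fp);

// Read the file back to confirm there is exactly one status per line.
$lines = [];
while (($line = fgets($fp)) !== false) {
    $lines[] = rtrim($line, "\r\n");
}
fclose($fp);

var_dump($lines);
```

Normalizing on write means the file format stays one status per line even if the library later starts or stops passing the trailing newline through.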

fennb commented

Fixed in 197b7da