ingydotnet/git-hub

Some "git hub org-repos" trigger json parse errors for unpaged output

andreas-kupries opened this issue · 8 comments

Running the command

   % git hub org-repos hpcloud|cat

results in

  Repos for 'hpcloud':
  , or } expected while parsing object/hash, at character offset 258051 (before ":"https://api.github...") at /usr/lib/git-core/git-hub.d/json.pl line 21.

Removing the |cat and paging through the output interactively, on the other hand, succeeds.
The same dichotomy happens when using --raw.

This indicates that the issue is in code common to both regular and raw output, and is specific to output through a pipe, or rather to output that does not go to a terminal: file redirection triggers it too.
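For example, redirecting to a file instead of piping shows the same parse error (the target file here is just an illustration):

   % git hub org-repos hpcloud > /tmp/repos.txt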

Also important: yesterday the same command worked, although I used sort instead of cat in the pipe.

More datapoints:

  • The org cloudfoundry triggered a similar bug this morning (a json parse error, but not quite the same as above); however, when I tested again while writing this issue, it was gone.
  • The org cloudfoundry-incubator did not trigger the issue at all, neither yesterday nor today.

I couldn't reproduce this.
If you can reproduce it, can you post the contents of ~/.git-hub/cache/{out,err,head} so we can see what json comes back?

edit: when trying to reproduce, use the --no-cache option, otherwise ~/.git-hub/cache/out won't get updated
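For example, the reproduction command from the top of this issue with that option added:

   % git hub org-repos hpcloud --no-cache | cat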

Back from a short vacation ...
I am unable to reproduce the issue myself at the moment.
If/when it happens again I will make certain to get and attach these files here.

Now that I know about these files I got to wonder a bit ...
Is git hub re-entrant?
IOW, am I allowed to run multiple git hub commands in parallel?
Especially multiple git hub org-repos $org --raw?

Because I am running about 3 of these in parallel at the moment, one per org I am looking at.
When I tried to reproduce I ran only one, to ensure that multiple processes were not all writing into the same ~/.git-hub/cache/{out,err,head} files.

Interesting.
There is no file locking, so multiple processes write to the same {out,err,head} files.
Also, if there are over 100 results, the API responds with a Link header, git-hub fetches the URL for the next page, and the output is again written to the same files.
I can imagine that something goes wrong if one process is writing a file while a different one is reading and parsing it.

The easiest solution would be to create an output dir per process (id), which would have to be removed afterwards. Let's wait for a comment from @ingydotnet.
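A minimal sketch of what that could look like on the bash side (the GIT_HUB_CACHE variable name is hypothetical, not an actual git-hub internal):

# Hypothetical sketch: one cache directory per git-hub invocation
GIT_HUB_CACHE="$HOME/.git-hub/cache/$$"      # $$ = PID of this process
mkdir -p "$GIT_HUB_CACHE"
trap 'rm -rf "$GIT_HUB_CACHE"' EXIT          # remove it again when the command exits
# API responses would then go to "$GIT_HUB_CACHE"/{out,err,head} instead of shared files.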

Ok. Just ran my code in 2 modes.
First sequential, completing one org before doing the next.
Then the orgs in parallel.

Sequential was ok. Parallel immediately produced the issue at hand.

Reducing the actual script gave a small demo script, inlined below. (I was unable to attach it; github said I had no write permission.) It could be reduced further by removing comments, helper variables, the function, etc. The relevant part is the git hub org-repos call in UP, and the fact that UP is run in parallel for each named org.

#!/usr/bin/bash
##
# # ## ### ##### ######## ############# #######################

orgs=""
orgs="$orgs hpcloud"
orgs="$orgs cloudfoundry"
orgs="$orgs cloudfoundry-incubator"

# # ## ### ##### ######## ############# #######################

function UP ()
{
    org="$1"
    repos="$(git hub org-repos $org --no-cache --raw | sort)"
    echo $org = $repos
    # In the full script this then goes and clones/updates the repos
    # of the org to the local disk. IOW the full script performs local
    # backup of orgs of interest. The orgs are done in parallel fill
    # all the available bandwidth (cpu, net, disk) with useful work.
}

for o in $orgs
do
#   ( UP $o )
    ( UP $o ) &
done
exit

Now that I know this limitation I should be able to restructure my code to run only the org-repos commands sequentially, and then do the actual backup in parallel. ... Although, never mind: I also use git hub clone for when a repo does not yet exist locally, and that could hit me with the issue again, if much more rarely, I guess. ... I should be able to replace that with a regular git clone, though; I just have to construct the URL myself. Then this part would be safe too.
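Roughly, the restructured script could look like the sketch below, reusing the orgs list from the demo script above. The URL construction and the handling of the --raw output (I assume one repo entry per line, with or without the org prefix) are assumptions on my part, not something git-hub guarantees.

declare -A repos

# Phase 1: fetch the repo lists sequentially, so only one git hub process
# touches ~/.git-hub/cache/{out,err,head} at a time.
for o in $orgs
do
    repos[$o]="$(git hub org-repos "$o" --no-cache --raw | sort)"
done

# Phase 2: do the actual backup in parallel with plain git, no git hub involved.
for o in $orgs
do
    (
        for r in ${repos[$o]}
        do
            d="${r##*/}"    # strip a leading "org/" if --raw prints full names
            if [ -d "$d" ]
            then
                ( cd "$d" && git pull --ff-only )
            else
                # Assumed URL layout; adjust if the repos live elsewhere.
                git clone "https://github.com/$o/$d" "$d"
            fi
        done
    ) &
done
wait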

@andreas-kupries A possible fix is pushed to branch issue/188. Please test.

@ingydotnet I think this should be merged anyway.