
Issue parsing very large JSON file

Altonymous opened this issue · 7 comments

I have a very large JSON file and yajl-ruby appears to have problems parsing it.

Here's a gist of the file:

You'll need to make sure you view the raw version. It's all on one line, not pretty-printed.

I've tried 3 different ruby parsers with no luck.

However, and python both are able to parse it.

Also of note, it was Python's JSON library that exported the file to begin with.

I tried piping:

cat 2014-06-24.json | ruby -ryajl -e "puts Yajl::Parser.parse(STDIN).inspect"

I tried parsing it in irb:

require 'yajl'
file_path = "/temp/2014-06-24.json"
parser = Yajl::Parser.new
json = File.new(file_path, 'r')
hash = parser.parse(json)

And the error I get is:

Yajl::ParseError: lexical error: invalid char in json text.
                                       {"2269": {"recommended_pps":
                     (right here) ------^

    from (irb):7:in `parse'
    from (irb):7
    from /Users/nunya/.rvm/rubies/ruby-2.1.0/bin/irb:11:in `<main>'

This works for me:

curl -L | ruby -ryajl -e "puts Yajl::Parser.parse(STDIN).inspect"

that's with yajl-ruby 1.2.1 on Mac.

What ruby version are you using?

I'm using 2.1.0-p0

I just tried it via the curl method and it worked for me as well. So I'm now wondering if there is a hidden character in my actual file that didn't transfer over.
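If a hidden character is the suspect, one way to check is to scan the raw bytes of the local file for anything outside printable ASCII. The following is only a minimal sketch: `suspicious_bytes` is a hypothetical helper name (not part of yajl-ruby), and the leading byte order mark in the sample is just an illustration, not a claim about what was actually in the failing file.

```ruby
# Hypothetical helper: return [offset, hex] pairs for bytes that are not
# printable ASCII. Tab, newline and carriage return are treated as harmless.
def suspicious_bytes(data, limit = 20)
  hits = []
  data.each_byte.with_index do |byte, offset|
    next if byte == 0x09 || byte == 0x0A || byte == 0x0D
    next if byte >= 0x20 && byte <= 0x7E
    hits << [offset, format('0x%02X', byte)]
    break if hits.size >= limit # stop early on very large files
  end
  hits
end

# Illustrative input only: JSON text preceded by a UTF-8 byte order mark.
sample = "\xEF\xBB\xBF{\"2269\": {}}".b
p suspicious_bytes(sample)
```

Against a real file this would be called as `suspicious_bytes(File.binread(path))`, where `path` points at the file that fails to parse. An empty result means every byte is plain ASCII, which would rule out this theory.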

I tried 'export LC_ALL=C' to turn off UTF-8 and it still works for me, so there are no non-ASCII (UTF-8) characters in what you pasted. You might have a UTF-8 character in the original file and have been using a non-UTF-8 locale, and it blew up because of that.
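The encoding theory can also be tested from Ruby without involving the locale. This is a rough sketch using only core `String` methods; `encoding_report` is a made-up name for illustration, and it only reports on the bytes it is given, so it says nothing about how any particular parser would react to them.

```ruby
# Hypothetical helper: summarize what the raw bytes look like when
# interpreted as UTF-8.
def encoding_report(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  {
    ascii_only: text.ascii_only?,     # true if every byte is 7-bit ASCII
    valid_utf8: text.valid_encoding?, # false if any byte sequence is malformed
    utf8_bom: bytes.b.start_with?("\xEF\xBB\xBF".b) # leading byte order mark
  }
end

# Illustrative inputs only.
p encoding_report('{"2269": {}}')
p encoding_report("\xFF{}".b)
```

For the real file, `encoding_report(File.binread(path))` would show whether it is pure ASCII, valid UTF-8, or neither.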

Also tried on a few different builds of 1.9.3 and 2.0.0 that I have kicking around in rvm and can't reproduce it...

Closing this issue out. I downloaded the file again and it worked. So I'm not sure what's going on. Seems very odd. I had someone pairing with me when it kept blowing up over and over. So not sure what the deal is. I copy-pasted from the file that wouldn't parse to create the gist.

Unfortunately, I overwrote the file when downloading it again so I don't have the file that was causing issues.

I appreciate you taking a look. If it happens again I'll save the file and figure out a way to host it so others can use the file directly.