Can't use guess_sample_buffer_bytes?
Closed this issue · 3 comments
kieaiaarh commented
- Embulk v0.8.31
Gemfile
source 'https://rubygems.org'
# for input json
gem 'embulk-parser-jsonpath', '~> 0.2.0'
I tried the following config with guess, but it did not work.
exec:
  guess_sample_buffer_bytes: 136192
in:
  type: file
  path_prefix: tmp/
out:
  type: stdout
$ embulk guess -g jsonpath config.yml.liquid -o guess.yml
2017-09-06 23:15:36.677 +0900: Embulk v0.8.31
2017-09-06 23:16:00.086 +0900 [INFO] (0001:guess): Listing local files at directory 'tmp' filtering filename by prefix ''
2017-09-06 23:16:00.087 +0900 [INFO] (0001:guess): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2017-09-06 23:16:00.092 +0900 [INFO] (0001:guess): Loading files [tmp/test.json]
2017-09-06 23:16:00.099 +0900 [INFO] (0001:guess): Try to read 136,192 bytes from input source
2017-09-06 23:16:00.145 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/gzip from a load path
2017-09-06 23:16:00.157 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/bzip2 from a load path
2017-09-06 23:16:00.169 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/json from a load path
2017-09-06 23:16:00.174 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/csv from a load path
2017-09-06 23:16:00.229 +0900 [INFO] (0001:guess): Loaded plugin embulk-parser-jsonpath (0.2.0)
org.jruby.exceptions.RaiseException: (null) unexpected token at '{
"date": "2017-08-06",
"clicks": 0,
"ctr": 0,
"impressions": 2,
"keyword": "hogehoge",
'
at RUBY.load(/Users/kieaiaarh/.embulk/jruby/2.3.0/gems/multi_json-1.12.2/lib/multi_json.rb:124)
at RUBY.process_object(/Users/kieaiaarh/.embulk/jruby/2.3.0/gems/jsonpath-0.5.8/lib/jsonpath.rb:87)
at RUBY.enum_on(/Users/kieaiaarh/.embulk/jruby/2.3.0/gems/jsonpath-0.5.8/lib/jsonpath.rb:73)
at RUBY.on(/Users/kieaiaarh/.embulk/jruby/2.3.0/gems/jsonpath-0.5.8/lib/jsonpath.rb:65)
at RUBY.guess_text(/Users/kieaiaarh/.embulk/jruby/2.3.0/gems/embulk-parser-jsonpath-0.2.0/lib/embulk/guess/jsonpath.rb:12)
at RUBY.guess(uri:classloader:/embulk/guess_plugin.rb:78)
at RUBY.guess(uri:classloader:/embulk/guess_plugin.rb:24)
:
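One possible reading of the trace above, offered as an assumption rather than something confirmed in this thread, is that the parser was handed a sample that stopped partway through the JSON document. The minimal Ruby sketch below shows only that general effect: cutting a JSON text at a byte limit makes it unparseable. The record layout and the `sample_parses?` helper are hypothetical stand-ins, not code from Embulk or embulk-parser-jsonpath.

```ruby
require 'json'

# Hypothetical stand-in for tmp/test.json: a JSON array of small records.
records = Array.new(1000) do |i|
  { 'date' => '2017-08-06', 'clicks' => 0, 'ctr' => 0,
    'impressions' => i, 'keyword' => 'hogehoge' }
end
full_text = JSON.pretty_generate(records)

# Hypothetical helper: true when the first `limit` bytes of `text`
# still form a complete JSON document, false when parsing fails.
def sample_parses?(text, limit)
  JSON.parse(text.byteslice(0, limit))
  true
rescue JSON::ParserError
  false
end

# A sample shorter than the document ends mid-array and fails to parse.
puts sample_parses?(full_text, 1024)               # => false
# A sample that covers the whole document parses.
puts sample_parses?(full_text, full_text.bytesize) # => true
```

If that reading applies, it would fit the observation that a smaller input file can be guessed while a larger one cannot, but the actual cause is the one tracked in the linked embulk-core issue.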
Then, because the input JSON data was too large, I cut it down to 32K:
$ ls -la tmp
-rw-r--r-- 1 kieaiaarh staff 32K 9 6 23:18 tmp/test.json
and it worked!
$ embulk guess -g jsonpath config.yml.liquid -o guess.yml
2017-09-06 23:20:23.739 +0900: Embulk v0.8.31
2017-09-06 23:20:46.922 +0900 [INFO] (0001:guess): Listing local files at directory 'tmp' filtering filename by prefix ''
2017-09-06 23:20:46.923 +0900 [INFO] (0001:guess): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2017-09-06 23:20:46.928 +0900 [INFO] (0001:guess): Loading files [tmp/test.json]
2017-09-06 23:20:46.935 +0900 [INFO] (0001:guess): Try to read 136,192 bytes from input source
2017-09-06 23:20:46.981 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/gzip from a load path
2017-09-06 23:20:46.991 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/bzip2 from a load path
2017-09-06 23:20:47.003 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/json from a load path
2017-09-06 23:20:47.008 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/csv from a load path
2017-09-06 23:20:47.061 +0900 [INFO] (0001:guess): Loaded plugin embulk-parser-jsonpath (0.2.0)
exec: {guess_sample_buffer_bytes: 136192}
in:
  type: file
  path_prefix: tmp/
  parser:
    charset: UTF-8
    newline: LF
    type: jsonpath
    delimiter: ','
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 2
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: date, type: timestamp, format: '%Y-%m-%d'}
    - {name: clicks, type: long}
    - {name: ctr, type: double}
    - {name: impressions, type: long}
    - {name: keyword, type: string}
    - {name: position, type: double}
out: {type: stdout}
Created 'guess.yml' file.
But I can't understand why the larger file fails.
According to embulk/embulk#609,
it seems that guess_sample_buffer_bytes can be configured in the guess config (yml)...
Thanks for your help.
hiroyuki-sato commented
@kieaiaarh Thank you for reporting this issue.
I'm investigating this issue.
hiroyuki-sato commented
@kieaiaarh
It seems that the cause is in embulk-core.
I reported it as embulk/embulk#788.
I'll let you know when the issue is fixed.
hiroyuki-sato commented
embulk/embulk#788 fixed this issue.