json example on drill site doesn't work
mbeckerle opened this issue · 4 comments
Using drill 1.21
The example on page https://drill.apache.org/docs/json-data-model/ of querying the citylots.json does not work:
SELECT features[0].properties.MAPBLKLOT, FROM dfs.`/home/mbeckerle/Downloads/citylots.json.gz`;
Error: PARSE ERROR: Encountered "." at line 1, column 19.
SQL Query: SELECT features[0].properties.MAPBLKLOT, FROM dfs.`/home/mbeckerle/Downloads/citylots.json.gz`
^
[Error Id: bfb6226d-273f-4a9f-a58b-81c9cd32c18c ] (state=,code=0)
In addition, I've tried numerous variants of this query to see if I can find out what is wrong, but I can't get anything to work.
This much works:
SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`;
+----------------------------------------------------------------------------------+
| feature |
+----------------------------------------------------------------------------------+
| {"type":"Feature","properties":{"MAPBLKLOT":"0001001","BLKLOT":"0001001","BLOCK_NUM":"0001","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"},"geometry":{"type":"Polygon","coordinates":[[[-122.42200352825247,37.80848009696725,0.0],[-122.42207601332528,37.808835019815085,0.0],[-122.42110217434863,37.808803534992904,0.0],[-122.42106256906727,37.80860105681815,0.0],[-122.42200352825247,37.80848009696725,0.0]]]}} |
| {"type":"Feature","properties":{"MAPBLKLOT":"0002001","BLKLOT":"0002001","BLOCK_NUM":"0002","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"},"geometry":{"type":"Polygon","coordinates":[[[-122.42082593937107,37.80863147414603,0.0],[-122.4208580496797,37.80879564136959,0.0],[-122.4198119587043,37.80876180971401,0.0],[-122.42082593937107,37.80863147414603,0.0]]]}} |
| {"type":"Feature","properties":{"MAPBLKLOT":"0004002","BLKLOT":"0004002","BLOCK_NUM":"0004","LOT_NUM":"002","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"},"geometry":{"type":"Polygon","coordinates":[[[-122.41570120460688,37.80832725267146,0.0],[-122.4157607435932,37.808630700240904,0.0],[-122.4137878913324,37.80856680131984,0.0],[-122.41570120460688,37.80832725267146,0.0]]]}} |
+----------------------------------------------------------------------------------+
But if I then try this in a sub-select to try to dig out the MAPBLKLOT....
select t.properties.MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
select t.properties.MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
Error: PARSE ERROR: Encountered "." at line 1, column 9.
SQL Query: select t.properties.MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t
^
[Error Id: 49670771-e7d5-41c7-9ede-9fa42e3ff261 ] (state=,code=0)
I found this works, but I don't know why the "['properties']" notation is needed vs. just "."
select t.feature['properties']['MAPBLKLOT'] as MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
select t.feature['properties']['MAPBLKLOT'] as MAPBLKLOT from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
+-----------+
| MAPBLKLOT |
+-----------+
| 0001001 |
| 0002001 |
| 0004002 |
+-----------+
3 rows selected (0.128 seconds)
apache drill>
It appears 'properties' is a keyword. This works, note the backticks around properties
select t.feature.`properties` as property from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
select t.feature.`properties` as property from (SELECT flatten(features) as feature FROM dfs.`/tmp/citydata.json`) t;
+----------------------------------------------------------------------------------+
| property |
+----------------------------------------------------------------------------------+
| {"MAPBLKLOT":"0001001","BLKLOT":"0001001","BLOCK_NUM":"0001","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"} |
| {"MAPBLKLOT":"0002001","BLKLOT":"0002001","BLOCK_NUM":"0002","LOT_NUM":"001","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"} |
| {"MAPBLKLOT":"0004002","BLKLOT":"0004002","BLOCK_NUM":"0004","LOT_NUM":"002","FROM_ST":"0","TO_ST":"0","STREET":"UNKNOWN","ODD_EVEN":"E"} |
+----------------------------------------------------------------------------------+
3 rows selected (0.157 seconds)
apache drill (dfs.tmp)>
Sure enough 'PROPERTIES' is in the keywords list here: https://drill.apache.org/docs/reserved-keywords/
So the original JSON example query can be expressed as:
SELECT features[0].`properties`.MAPBLKLOT FROM `/tmp/citydata.json`;
Fixed in drill-site commit de30f12f4b37f73caac35660d0cae8f8d88f9b7a
@mbeckerle thanks for this!