Increase information in the dataset description

Question

Closed this issue 7 years ago · 3 comments

It would be useful to add information to the dataset description about how to access the dataset from Jupyter.

For example, we can add the following information:

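# Assumes a SparkSession named `spark` is already available in the notebook.
path_dataset = "/daf/opendata/alsia_o_atti_d_di_d_concessione1_0"
# Parquet files store their own schema, so the CSV-only options
# (inferSchema, header, sep) have no effect here and are omitted.
df = spark.read.format("parquet").load(path_dataset)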

giux78 commented 7 years ago

Done

Answer 1 · 2017-10-12T14:55:55.000Z

If you want to read the Impala table:

from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
hive_context.sql("use opendata")
incidenti = hive_context.table('alsia_o_atti_d_di_d_concessione1_0')
incidenti

Answer 2 · 2017-10-12T15:00:27.000Z

Better:
spark.sql("SELECT * FROM opendata.alsia_o_atti_d_di_d_concessione1_0").show()
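If the description for each dataset is to carry this kind of access information, the text could be generated from the dataset's path and table name. The following is only a minimal sketch of that idea: the function name `jupyter_access_snippet` and its parameters are hypothetical and not part of the project. It only builds the text to paste into a description, so it runs without Spark.

```python
def jupyter_access_snippet(path_dataset, database, table):
    """Build description text showing two ways to load a dataset in Jupyter.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    lines = [
        "# Read the Parquet files directly",
        f'df = spark.read.parquet("{path_dataset}")',
        "",
        "# Or query the table through Spark SQL",
        f'spark.sql("SELECT * FROM {database}.{table}").show()',
    ]
    return "\n".join(lines)


# Example using the dataset discussed in this issue.
snippet = jupyter_access_snippet(
    "/daf/opendata/alsia_o_atti_d_di_d_concessione1_0",
    "opendata",
    "alsia_o_atti_d_di_d_concessione1_0",
)
print(snippet)
```

The generated text assumes, as the snippets above do, that a `spark` session already exists in the notebook where it is pasted.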