italia/daf-dataportal

Increase information in the dataset description

Issue closed · 3 comments

It would be useful to add information about how to access the dataset from Jupyter.

For example, we can add the following information:

# Path of the dataset files
path_dataset = "/daf/opendata/alsia_o_atti_d_di_d_concessione1_0"
# Parquet files carry their own schema, so the CSV-only options
# (inferSchema, header, sep) are ignored by this reader and are dropped here
df = spark.read.format("parquet").load(path_dataset)

If you want to read the Impala table:

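from pyspark.sql import HiveContext
# sc is assumed to be the SparkContext already available in the notebook
hive_context = HiveContext(sc)
# Select the database, then load the table as a DataFrame
hive_context.sql("use opendata")
incidenti = hive_context.table('alsia_o_atti_d_di_d_concessione1_0')
incidenti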

Better, query the table directly through the Spark session:
spark.sql("SELECT * FROM opendata.alsia_o_atti_d_di_d_concessione1_0").show()
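
Since the proposal is to show these snippets in each dataset's description, they could be generated from the dataset's path, database and table name. The following is only a minimal sketch of that idea: `jupyter_snippets` and its parameters are hypothetical names, not part of daf-dataportal, and it assumes the files are stored as Parquet, as in the example above.

```python
def jupyter_snippets(path_dataset, database, table):
    """Build example PySpark snippets for one dataset.

    Hypothetical helper for illustration only; it just formats strings
    and does not need a running Spark session.
    """
    # Snippet that reads the dataset files directly (assumes Parquet storage)
    read_files = (
        f'path_dataset = "{path_dataset}"\n'
        f'df = spark.read.format("parquet").load(path_dataset)'
    )
    # Snippet that queries the corresponding table through the Spark session
    read_table = f'spark.sql("SELECT * FROM {database}.{table}").show()'
    return {"files": read_files, "table": read_table}


snippets = jupyter_snippets(
    "/daf/opendata/alsia_o_atti_d_di_d_concessione1_0",
    "opendata",
    "alsia_o_atti_d_di_d_concessione1_0",
)
print(snippets["files"])
print(snippets["table"])
```

The two returned strings can then be shown in the dataset description as ready-to-paste notebook cells.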

Done