Increase information in the dataset description
Closed this issue · 3 comments
fabiana001 commented
It could be useful to add information about how to access the dataset from Jupyter.
For example, we can add the following information:
path_dataset = "/daf/opendata/alsia_o_atti_d_di_d_concessione1_0"
# Parquet files store their own schema, so the CSV-only options
# (inferSchema, header, sep) are not needed when reading them.
df = spark.read.format("parquet").load(path_dataset)
fabiana001 commented
If you want to read the Impala table:
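from pyspark.sql import HiveContext
# sc is assumed to be the SparkContext already defined in the notebook
hive_context = HiveContext(sc)
hive_context.sql("use opendata")
incidenti = hive_context.table('alsia_o_atti_d_di_d_concessione1_0')
incidenti  # last expression in the cell, so the notebook displays it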
fabiana001 commented
Better:
spark.sql("SELECT * FROM opendata.alsia_o_atti_d_di_d_concessione1_0").show()
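In the examples above, the storage path and the table name appear to share a pattern: the path "/daf/opendata/<name>" corresponds to the table "opendata.<name>". As a minimal sketch, assuming that every dataset follows this /daf/<database>/<table> layout (an inference from this thread, not a documented rule), the query could be derived from the path. The helper names table_from_path and select_all_query are hypothetical.

```python
from pathlib import PurePosixPath


def table_from_path(path_dataset):
    """Return (database, table) for a path shaped like /daf/<database>/<table>.

    The layout is an assumption inferred from the examples in this thread.
    """
    parts = PurePosixPath(path_dataset).parts
    # For a matching path, parts is ('/', 'daf', '<database>', '<table>')
    if len(parts) != 4 or parts[0] != "/" or parts[1] != "daf":
        raise ValueError("unexpected dataset path: " + path_dataset)
    return parts[2], parts[3]


def select_all_query(path_dataset):
    """Build the SELECT statement for the table behind a dataset path."""
    database, table = table_from_path(path_dataset)
    return "SELECT * FROM {}.{}".format(database, table)


query = select_all_query("/daf/opendata/alsia_o_atti_d_di_d_concessione1_0")
print(query)
```

In the notebook, the resulting string could then be passed to spark.sql(query).show(), as in the one-liner above.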
giux78 commented
Done