Add performance data source
EnricoMi commented
Add a data source that does not read the actual data but provides performance metrics instead. Each partition sends a query to the Dgraph cluster and, besides the data, also retrieves these metrics:
"extensions": {
  "server_latency": {
    "parsing_ns": 78501,
    "processing_ns": 881611,
    "encoding_ns": 110785,
    "total_ns": 1145597
  },
  "txn": {
    "start_ts": 10007
  },
  "metrics": {
    "num_uids": {
      "dgraph.graphql.schema": 10,
      "dgraph.type": 10,
      "director": 10,
      "name": 10,
      "release_date": 10,
      "revenue": 10,
      "running_time": 10,
      "starring": 10,
      "uid": 16
    }
  }
}
The performance data source can encode this information (together with information from TaskContext and the individual partitions), rather than the actual query result, into the DataFrame. This lets benchmarking tools measure per-partition timings and cardinality information and write them to disk via Spark.
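As a rough illustration of the idea, the sketch below flattens the `extensions` object of one response into a single per-partition row. It is not the connector's implementation: the names `PartitionPerf` and `perf_row` are hypothetical, the sample response reuses a subset of the values shown above, and the partition id is passed in by hand where a real data source would presumably take it from Spark's TaskContext.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class PartitionPerf:
    """Hypothetical row layout: one row per partition query."""
    partition_id: int
    parsing_ns: int
    processing_ns: int
    encoding_ns: int
    total_ns: int
    start_ts: int
    num_uids: Dict[str, int]


def perf_row(partition_id: int, response_json: str) -> PartitionPerf:
    """Extract the metrics from a Dgraph JSON response and ignore the data.

    Missing fields default to 0 (or an empty dict) so that a response
    without "extensions" still yields a row.
    """
    ext = json.loads(response_json).get("extensions", {})
    latency = ext.get("server_latency", {})
    return PartitionPerf(
        partition_id=partition_id,
        parsing_ns=latency.get("parsing_ns", 0),
        processing_ns=latency.get("processing_ns", 0),
        encoding_ns=latency.get("encoding_ns", 0),
        total_ns=latency.get("total_ns", 0),
        start_ts=ext.get("txn", {}).get("start_ts", 0),
        num_uids=dict(ext.get("metrics", {}).get("num_uids", {})),
    )


# Sample response built from a subset of the values quoted in this issue.
SAMPLE = json.dumps({
    "data": {},
    "extensions": {
        "server_latency": {
            "parsing_ns": 78501,
            "processing_ns": 881611,
            "encoding_ns": 110785,
            "total_ns": 1145597,
        },
        "txn": {"start_ts": 10007},
        "metrics": {"num_uids": {"name": 10, "uid": 16}},
    },
})

row = perf_row(0, SAMPLE)
print(row)
```

Keeping `num_uids` as a map rather than one column per predicate is an assumption made here so that the row schema stays fixed regardless of which predicates a partition touches.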