influxdata/flux

How can a JOIN statement implement automatic null filling, similar to the functionality in SQL?

lzl82891314 opened this issue · 2 comments

I currently have a requirement to combine the results of two queries using a JOIN statement.

import "join"

left = from(bucket: "aaa") 
 |> range(start: 1970-01-01T00:00:00Z) 
 |> filter(fn: (r) => r["_measurement"] == "app") 
 |> filter(fn: (r) => r["_field"] == "value") 
 |> group(columns: ["_time", "_value", "_field"], mode: "except") 
 
right1 = from(bucket: "aaa") 
 |> range(start: 1970-01-01T00:00:00Z) 
 |> filter(fn: (r) => r["_measurement"] == "app") 
 |> filter(fn: (r) => r["_field"] == "Duration") 
 |> group(columns: ["_time", "_value", "_field"], mode: "except") 
 
join.time(method: "left", left: left, right: right1, as: (l, r) => ({l with Duration: r._value}))

However, I've encountered an issue where if either the left or right side is empty, the entire query returns an exception [error preparing right side of join: cannot join on an empty table].

In a relational database, using a left join statement would automatically fill the right side with null values instead of throwing an error. How can I achieve this functionality in Flux?

@lzl82891314 There is a related issue here: #5307

But looking at your query, I don't think you actually need to join. I think you can query both fields and use pivot() to pivot them into columns:

from(bucket: "aaa") 
    |> range(start: 0) 
    |> filter(fn: (r) => r._measurement == "app") 
    |> filter(fn: (r) => r._field == "value" or r._field == "Duration") 
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

This should give you what you're looking for, is way more performant than a join, and will gracefully handle null values.

@sanderson Thank you very much for your response. I have successfully resolved my issue using the pivot().