opensearch-project/opensearch-spark

[FEATURE] Add `fillnull` command to PPL

Closed this issue · 2 comments

Description:
We propose adding a fillnull command to OpenSearch's Piped Processing Language (PPL) to provide a convenient way to handle null or missing values in query results. This feature would be similar to the fillnull command in Splunk's SPL, enhancing PPL's data cleaning and preparation capabilities.

Proposed Functionality:

  1. The fillnull command should allow users to replace null values with a specified value.
  2. It should support filling nulls for specific fields or all fields.
  3. The command should allow different fill values for different fields.
  4. It should support conditional filling based on other field values or expressions.

Example Usage:

... | fillnull value=0

This would replace all null values in all fields with 0.

... | fillnull value="N/A" field1, field2

This would replace null values in field1 and field2 with "N/A".

... | fillnull field1=0 field2="Unknown" field3=false

This would fill nulls in field1 with 0, in field2 with "Unknown", and in field3 with false, so each field gets its own fill value.
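To make the intended semantics of the three forms above concrete, here is a minimal Python sketch that models a result set as a list of dicts, with `None` standing in for null. The function `fillnull` and its parameters `value`, `fields`, and `per_field` are hypothetical names chosen to mirror the proposed syntax; this is not an existing OpenSearch or Spark API.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]  # one result row; None stands in for null


def fillnull(
    rows: Iterable[Row],
    value: Any = None,
    fields: Optional[List[str]] = None,
    per_field: Optional[Dict[str, Any]] = None,
) -> List[Row]:
    """Hypothetical model of the proposed fillnull forms.

    fillnull value=X          -> fillnull(rows, value=X)
    fillnull value=X f1, f2   -> fillnull(rows, value=X, fields=["f1", "f2"])
    fillnull f1=A f2=B        -> fillnull(rows, per_field={"f1": A, "f2": B})
    """
    if (value is None) == (per_field is None):
        raise ValueError("pass exactly one of value= or per_field=")
    if per_field is not None and fields is not None:
        raise ValueError("fields= only applies together with value=")

    result: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if per_field is not None:
            targets = per_field
        elif fields is not None:
            targets = {f: value for f in fields}
        else:
            targets = {f: value for f in new_row}  # every column in this row
        for field, fill in targets.items():
            # A missing key is treated like a null, so it is added with the fill value.
            if new_row.get(field) is None:
                new_row[field] = fill
        result.append(new_row)
    return result


rows = [
    {"field1": None, "field2": "x", "field3": None},
    {"field1": 5, "field2": None, "field3": True},
]
fillnull(rows, value=0)  # every None in every field becomes 0
fillnull(rows, value="N/A", fields=["field1", "field2"])  # field3 stays None
fillnull(rows, per_field={"field1": 0, "field2": "Unknown", "field3": False})
```

The sketch rejects mixing `value=` with per-field assignments; whether PPL should allow that combination (for example, a default plus per-field overrides) is an open design question rather than something this issue specifies.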

... | eval new_field = if(field1 == "category1", field2, null) | fillnull value=0 new_field

This example uses eval to create a new field (or overwrite an existing one) based on a condition, and then uses fillnull to replace the resulting null values with 0.

...
| eval field1 = if(field1 == "category1", field1, null), field2 = if(field2 == "category2", field2, null)
| fillnull field1=0 field2="Unknown"

This example uses a single eval with multiple assignments to apply a different condition to each field, and then uses fillnull to give each field its own default value.
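The two conditional examples above combine eval and fillnull in a pipeline. The following Python sketch models that pipeline over a list of dict rows, assuming `None` represents null and that eval assignments are applied in order. The helpers `eval_if` and `fill_fields` are hypothetical stand-ins for `eval target = if(cond, field, null)` and `fillnull f1=v1 f2=v2`, not part of any existing API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one result row; None stands in for null


def eval_if(rows: List[Row], target: str, cond: Callable[[Row], bool], then_field: str) -> List[Row]:
    """Model of `eval target = if(cond, then_field, null)`: copy then_field where cond holds, else None."""
    return [{**row, target: row.get(then_field) if cond(row) else None} for row in rows]


def fill_fields(rows: List[Row], fills: Dict[str, Any]) -> List[Row]:
    """Model of `fillnull f1=v1 f2=v2`: replace None in each named field with that field's own value."""
    return [
        {**row, **{f: v for f, v in fills.items() if row.get(f) is None}}
        for row in rows
    ]


rows = [
    {"field1": "category1", "field2": "category2"},
    {"field1": "other", "field2": "misc"},
]

# eval new_field = if(field1 == "category1", field2, null) | fillnull value=0 new_field
with_new = fill_fields(
    eval_if(rows, "new_field", lambda r: r["field1"] == "category1", "field2"),
    {"new_field": 0},
)  # new_field is "category2" in the first row and 0 in the second

# eval field1 = if(...), field2 = if(...) | fillnull field1=0 field2="Unknown"
step1 = eval_if(rows, "field1", lambda r: r["field1"] == "category1", "field1")
step2 = eval_if(step1, "field2", lambda r: r["field2"] == "category2", "field2")
result = fill_fields(step2, {"field1": 0, "field2": "Unknown"})
# first row unchanged; second row becomes field1=0, field2="Unknown"
```

Because each step returns new rows, the sketch keeps eval and fillnull as independent, composable stages, which is the property the PPL pipe syntax relies on.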


Implementation Considerations:

  1. Ensure compatibility with existing PPL commands and syntax.
  2. Optimize performance for large datasets with many null values.
  3. Provide clear documentation and examples for users.
  4. Consider type-checking or type conversion of fill values (for example, rejecting a string fill value for a numeric field).
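The type-checking consideration leaves the policy open. One possible policy, sketched here in Python purely as an assumption, is to validate each fill value against its target field's type once, before any rows are processed: accept an exact type match, allow a lossless int-to-float widening, and reject everything else. The function `coerce_fill_value` and the `schema` mapping are hypothetical names.

```python
from typing import Any, Dict


def coerce_fill_value(field: str, fill: Any, schema: Dict[str, type]) -> Any:
    """Check (and, where lossless, convert) a fill value against the field's declared type.

    Hypothetical policy: exact type match passes; int is widened to float;
    anything else raises instead of being silently cast.
    """
    expected = schema[field]  # KeyError signals an unknown field
    # `type(...) is` rather than isinstance(): bool subclasses int in Python,
    # so isinstance(True, int) would wrongly accept True for an int field.
    if type(fill) is expected:
        return fill
    if expected is float and type(fill) is int:
        return float(fill)
    raise TypeError(
        f"fill value {fill!r} ({type(fill).__name__}) is incompatible "
        f"with field {field!r} of type {expected.__name__}"
    )


schema = {"field1": int, "field2": str, "field3": bool}
fills = {"field1": 0, "field2": "Unknown", "field3": False}
# Validate all fill values up front; any mismatch fails before rows are scanned.
checked = {f: coerce_fill_value(f, v, schema) for f, v in fills.items()}
```

Under this policy, `fillnull field1="N/A"` against an integer `field1` would fail with a `TypeError` instead of producing a mixed-type column; a more permissive policy (casting to string, say) is equally plausible and would be a design decision for the maintainers.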


@salyh @YANG-DB, can we close this issue?